Datasets¶

A dataset is a tabular data file that can be opened, explored, and analyzed within Phonometrica. Datasets are typically CSV files that have been imported into the project, but they can also be created from concordance results. The dataset view provides tools for inspecting, filtering, transforming, and preparing your data for statistical analysis.

Datasets are stored as plain .csv (or .tsv) files on disk. They are listed in the project tree under a Datasets folder, and any metadata associated with a dataset (column aliases, filter rules, properties, etc.) is kept in the project file alongside the dataset.

Opening a dataset¶

To import a CSV file as a dataset, use File > Add files to project… and select the file. Phonometrica then asks for the column separator (tab, comma, or semicolon) and automatically detects the type of each column (numeric or text).
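To give a rough idea of what type detection involves, here is a minimal Python sketch. It is not Phonometrica's own code: the function name `detect_column_types` and the rule it applies (a column is numeric if every non-empty cell parses as a number) are assumptions made for illustration only.

```python
import csv
import io

def detect_column_types(text, delimiter=","):
    """Classify each column as 'numeric' or 'text' (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # Assume every column is numeric until a cell proves otherwise.
    types = {name: "numeric" for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            if cell == "":
                continue  # assumption: empty cells are missing values
            try:
                float(cell)
            except ValueError:
                types[name] = "text"
    return types

# A small tab-separated sample with two text columns and one numeric column.
sample = "speaker\tvowel\tF1\nS01\ti\t280\nS02\ta\t710.5\n"
detected = detect_column_types(sample, delimiter="\t")
print(detected)
```

Checking the detected types after import is worthwhile, since a single stray label in a numeric column would turn the whole column into text under a rule like this one.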

You can also create a dataset from a concordance view, either by exporting the concordance to CSV and importing the resulting file, or by using the scripting engine.

To open a dataset, double-click on it in the file manager or right-click and choose View file.

Structure of the dataset view¶

Filtering¶

The dataset view provides the same filtering capabilities as the concordance view:

Filter: show or hide the filter bar.
Clear filters: remove all filter rules.
Subset: create a new dataset from the visible (filtered) rows.
Metric column: compute a distance metric (z-score, modified z-score, etc.) on a numeric column for outlier detection.
Normalize vowels: apply vowel normalization to formant columns (see Vowel normalization).

For details on filter rules, see Concordances.
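The two distance metrics named above can be sketched in Python as follows. This uses the textbook definitions and is only an illustration: whether Phonometrica uses the sample or the population standard deviation, and which cutoff it applies, is not stated here, so the 0.6745 constant, the 3.5 threshold, and the function names are assumptions.

```python
import statistics

def z_scores(values):
    """Classical z-score: distance from the mean in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in values]

def modified_z_scores(values):
    """Modified z-score based on the median and the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # Note: this divides by zero if more than half the values are identical.
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical F1 measurements (Hz) containing one obvious measurement error.
f1 = [300, 310, 305, 295, 900]
scores = modified_z_scores(f1)
outliers = [v for v, s in zip(f1, scores) if abs(s) > 3.5]
print(outliers)
```

The modified z-score relies on the median rather than the mean, so a single extreme value distorts it much less than it distorts the classical z-score.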

Analyze¶

The Analyze button opens an analysis view for the dataset, where you can fit statistical models to your data.

Set operations¶

Like concordances, datasets support set operations on rows:

Union (A ∪ B): combine the rows of two datasets.
Intersection (A ∩ B): keep only the rows that appear in both datasets.
Complement (A ∖ B): keep the rows of A that do not appear in B.
Merge: horizontal merge that adds columns from another dataset, matching by row position.
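The following Python sketch illustrates what these four operations mean for tabular rows. It rests on assumptions that may not match Phonometrica's behavior: rows are compared as whole tuples, the union drops duplicate rows, and the merge stops at the shorter of the two datasets. All function names are hypothetical.

```python
def union(a, b):
    """Rows of a followed by rows of b, without duplicates (assumption)."""
    seen, out = set(), []
    for row in a + b:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def intersection(a, b):
    """Rows of a that also appear in b."""
    in_b = set(b)
    return [row for row in a if row in in_b]

def complement(a, b):
    """Rows of a that do not appear in b."""
    in_b = set(b)
    return [row for row in a if row not in in_b]

def merge(a, b):
    """Append the columns of b to a, pairing rows by position."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]

A = [("S01", "i"), ("S02", "a"), ("S03", "u")]
B = [("S02", "a"), ("S04", "e")]
durations = [(95,), (120,), (88,)]

print(union(A, B))
print(intersection(A, B))
print(complement(A, B))
print(merge(A, durations))
```

Because a positional merge pairs the first row with the first row and so on, both datasets need to be in the same row order before merging.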

Column operations¶

Right-click on a column header to access column-level operations:

Sort ascending / Sort descending: sort the dataset by this column.
Rename: give the column a new name.
Recode… (text columns): create a new column with remapped category labels.
Transform… (numeric columns): create a new column by applying a mathematical formula (see Column transformations).
Duplicate: create a copy of the column.
Move left / Move right: change the column’s position.
Delete: remove the column.
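Conceptually, recoding is a label-to-label mapping that produces a new column and leaves the original untouched. The Python sketch below shows the idea. The `recode` function, the column names, and the choice to keep unmapped labels unchanged are illustrative assumptions, not a description of the Recode… dialog.

```python
def recode(rows, source, target, mapping):
    """Return new rows with a `target` column derived from `source`.

    Labels missing from `mapping` are copied unchanged (assumption).
    """
    return [{**row, target: mapping.get(row[source], row[source])} for row in rows]

# Hypothetical data: three phonetic variants of the same phonological category.
rows = [
    {"token": "butter", "variant": "ɾ"},
    {"token": "city", "variant": "t"},
    {"token": "water", "variant": "ʔ"},
]
mapping = {"ɾ": "/t/", "ʔ": "/t/", "t": "/t/"}

recoded = recode(rows, "variant", "phoneme", mapping)
print([row["phoneme"] for row in recoded])
```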

Tips¶

Use Filter + Subset to extract subsets of your data based on specific criteria before running analyses.
Use Recode to simplify categorical variables (e.g. merging several phonetic variants into a single phonological category).
Use Transform to convert acoustic values to perceptual scales (e.g. bark(x), erb(x), st(x)).
Use Metric column followed by filtering to identify and exclude outliers before modeling.
Click Analyze to open the analysis view directly from your prepared dataset.
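For reference, the perceptual scales mentioned in the tips can be sketched in Python using widely cited conversion formulas. These are assumptions for illustration: the formulas and the semitone reference frequency behind Phonometrica's bark(x), erb(x), and st(x) may differ, so consult Column transformations for the definitions actually used.

```python
import math

def bark(f):
    """Hz to Bark, using Traunmüller's (1990) approximation."""
    return 26.81 * f / (1960.0 + f) - 0.53

def erb(f):
    """Hz to ERB-rate, using Glasberg and Moore's (1990) formula."""
    return 21.4 * math.log10(0.00437 * f + 1.0)

def st(f, ref=100.0):
    """Hz to semitones relative to `ref` Hz (the 100 Hz default is an assumption)."""
    return 12.0 * math.log2(f / ref)

for hz in (200.0, 1000.0):
    print(hz, round(bark(hz), 3), round(erb(hz), 3), round(st(hz), 3))
```

All three scales compress high frequencies relative to low ones, so equal steps on the transformed scale correspond more closely to equal perceptual steps than equal steps in hertz do.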
