A dataset containing the the first and second formants, speech rate, gender, and year of birth for 100 random speakers from the ONZE corpus. 50 speakers are sampled with birth years before 1900 and 50 sampled with birth years on or after 1900 to ensure a full span of the time period. Data is present for the following NZE monophthongs, represented by Wells lexical sets: DRESS, FLEECE, GOOSE, KIT, LOT, NURSE, START, STRUT, THOUGHT, TRAP. Data for FOOT is excluded due to low token counts.
Format
A dataframe with 101572 rows and 8 variables:
- speaker
Anonymised speaker code (factor).
- vowel
Variable with Wells lexical sets for 10 NZE monophthongs. Levels: DRESS, FLEECE, GOOSE, KIT, LOT, NURSE, START, STRUT, THOUGHT, TRAP (factor).
- F1_50
First formant, extracted from vowel mid-point using LaBB-CAT interface with Praat.
- F2_50
Second formant, extracted from vowel mid-point using LaBB-CAT interface with Praat.
- speech_rate
Average speaker speech rate for whole recording.
- gender
Gender of speaker, two levels: "M", "F" (factor).
- yob
Year of birth of speaker.
- word
Anonymised word code (factor).