search-labels-alignments-praat-script.Rmd
LaBB-CAT is a browser-based linguistic annotation store that holds audio or video recordings, text transcripts, and other annotations. The nzilbb.labbcat R package provides access to linguistic data stored in LaBB-CAT servers, allowing tokens and their annotations to be identified and extracted, along with media data and acoustic measurements.
This worked example shows how to:

- identify sets of phone tokens using regular-expression searches,
- extract annotation labels, alignments, and speaker metadata for each token, and
- use a custom Praat script, executed by LaBB-CAT, to take acoustic measurements of each token.
In particular, we are interested in the pronunciation of the phoneme /s/ in specific contexts, to see whether the pronunciation is sometimes more like [ʃ] than [s]. We might expect the /s/ in “seat” to be pronounced like [s], whereas the /s/ in “street” might be pronounced more like [ʃ].
In order to do this, we’re going to identify /s/ tokens in the following contexts:

- /s/ followed by /p/, /t/, or /k/, followed by a vowel
- /s/ followed by /p/, /t/, or /k/, followed by /r/
- /s/ followed by /t/ and /j/
For comparison purposes, we also want /s/ and /ʃ/ tokens that we take to have the ‘canonical’ pronunciation. For this we will find:

- word-initial /s/ or /ʃ/ followed by the FLEECE, THOUGHT, or START vowel
Each of these contexts will be identified by a different regular expression, assuming that the phonemes are encoded using the CELEX ‘DISC’ encoding, which uses exactly one ASCII character per phoneme:
sptkV <- ".*s[ptk][cCEFHiIPqQuUV0123456789~#{$@].*"
sptkr <- ".*s[ptk]r.*"
stj <- ".*stj.*"
sSV <- "[sS][i$#].*"
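Before running any searches, the patterns can be sanity-checked locally against a few hand-picked DISC transcriptions. The example strings below are ours, not drawn from the corpus (in DISC, "street" is `strit` and "stood" is `stUd`):

```r
# Patterns repeated here so the snippet runs on its own.
sptkV <- ".*s[ptk][cCEFHiIPqQuUV0123456789~#{$@].*"
sptkr <- ".*s[ptk]r.*"
sSV   <- "[sS][i$#].*"

grepl(sptkr, "strit") # "street": /s/ + /t/ + /r/ -> TRUE
grepl(sptkV, "stUd")  # "stood": /s/ + /t/ + vowel -> TRUE
grepl(sSV,   "sit")   # "seat": word-initial /s/ + FLEECE -> TRUE
grepl(sSV,   "strit") # "street" is not a canonical reference token -> FALSE
```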
To measure the pronunciation of the /s/ tokens, we will use the spectral Centre of Gravity (CoG) of the fricative, which we will determine using a custom Praat script which will be executed by LaBB-CAT for each token.
For analysis after extracting the data, we may also want some other information, e.g. the word’s phonemic transcription, the token’s syllable and following segment, and the speaker’s gender and age category.
In order to extract the data, we need to:

- search for /s/ tokens in each of the contexts above,
- get the phonemic transcriptions, demographics, and alignments for each token, and
- process the tokens with a custom Praat script to measure the Centre of Gravity.
First, the nzilbb.labbcat package is loaded, and the LaBB-CAT server URL and credentials are specified:
require(nzilbb.labbcat)
labbcat.url <- "https://labbcat.canterbury.ac.nz/demo/"
labbcat.url <- Sys.getenv('TEST_READ_LABBCAT_URL') # override with details from .Renviron file
credentialError <- labbcatCredentials(
labbcat.url, Sys.getenv('TEST_READ_LABBCAT_USERNAME'), Sys.getenv('TEST_READ_LABBCAT_PASSWORD'))
We conduct a search for each of the contexts we’re interested in: the pattern is matched on the syllables layer, and we specify which segment within the matching syllable we’re targeting for analysis, i.e. /s/.
sptkV.matches <- getMatches(labbcat.url, list(syllables = sptkV, segment = "s"))
sptkr.matches <- getMatches(labbcat.url, list(syllables = sptkr, segment = "s"))
stj.matches <- getMatches(labbcat.url, list(syllables = stj, segment = "s"))
c(paste("There are", nrow(sptkr.matches), "tokens of ...s[ptk]r..."),
paste("There are", nrow(sptkV.matches), "tokens of ...s[ptk]V..."),
paste("There are", nrow(stj.matches), "tokens of ...stj..."))
## [1] "There are 118 tokens of ...s[ptk]r..."
## [2] "There are 486 tokens of ...s[ptk]V..."
## [3] "There are 9 tokens of ...stj..."
In order to compare pronunciations with the ‘standard’ pronunciation of /s/ or /ʃ/, we also identify some ‘reference’ phones; i.e. /s/ or /ʃ/ at the beginning of a word, followed by the FLEECE, THOUGHT, or START vowel (the ‘phonemes’ layer contains the phonemic transcription of the whole word, so using that layer allows us to anchor the pattern to the start of the word).
sSV.matches <- getMatches(labbcat.url, list(phonemes = sSV, segment = "[sS]"))
paste("There are", nrow(sSV.matches), "reference tokens of [sS]V...")
## [1] "There are 279 reference tokens of [sS]V..."
We’ll combine all the data frames into one for convenience; we can use matches$SearchName to distinguish them if necessary:
matches <- rbind(sptkV.matches, sptkr.matches, stj.matches, sSV.matches)
paste("Total tokens:", nrow(matches))
## [1] "Total tokens: 892"
This gives us a data frame with one row per token, including each token’s start/end times:
## Text Target.segment Target.segment.start Target.segment.end
## 1 experiences s 12.92 12.99
## 2 stood s 52.93 53.03
## 3 stay . s 60.45 60.62
## 4 escape s 74.29 74.44
## 5 substantial s 136.57 136.66
## 6 twisted s 144.25 144.29
For all the tokens, we also want the word’s phonemic transcription, and the speaker’s gender and age:
participant.demographics <- getMatchLabels(
labbcat.url, matches$MatchId, c("phonemes", "participant_gender", "participant_age_category"))
matches <- cbind(matches, participant.demographics)
head(matches)[, c(
"Text", "phonemes", "participant_age_category", "participant_gender")]
## Text phonemes participant_age_category participant_gender
## 1 experiences Iksp7r7nsIz 36-45 M
## 2 stood stUd 36-45 M
## 3 stay . st1 36-45 M
## 4 escape Isk1p 36-45 M
## 5 substantial s@bst{nSP 36-45 M
## 6 twisted twIstId 36-45 M
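For later analysis it is convenient to treat the demographic columns as categorical. The sketch below converts them to factors; the values echo the sample output above plus one invented row, and the resulting levels are only an assumption about the full range in the corpus:

```r
# Mini stand-in for the demographic columns of `matches`; values echo the
# sample output above plus one invented row.
demo <- data.frame(
  participant_gender = c("M", "M", "F"),
  participant_age_category = c("36-45", "36-45", "26-35"),
  stringsAsFactors = FALSE)

# Convert to factors so that models and plots treat them as categorical.
demo$participant_gender <- factor(demo$participant_gender)
demo$participant_age_category <- factor(demo$participant_age_category)
levels(demo$participant_age_category) # "26-35" "36-45"
```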
We also want start/end times and phonemic transcription labels for the syllable of the /s/ or /ʃ/ token:
syllable <- getMatchAlignments(labbcat.url, matches$MatchId, c("syllables"))
matches <- cbind(matches, syllable)
head(matches)[, c(
"Text", "Target.segment", "syllables", "syllables.start", "syllables.end")]
## Text Target.segment syllables syllables.start syllables.end
## 1 experiences s 'sp7 12.92 13.11
## 2 stood s 'stUd 52.93 53.24
## 3 stay . s 'st1 60.45 61.21
## 4 escape s 'sk1p 74.29 74.70
## 5 substantial s 'st{n 136.57 136.94
## 6 twisted s stId 144.25 144.44
And the start/end times for the segment that follows the token:
following.segment <- getMatchAlignments(labbcat.url, matches$MatchId, c("segment"), target.offset = 1)
matches <- cbind(matches, following.segment)
head(matches)[, c(
"Token.plus.1.segment", "Token.plus.1.segment.start", "Token.plus.1.segment.end")]
## Token.plus.1.segment Token.plus.1.segment.start Token.plus.1.segment.end
## 1 p 12.99 13.06
## 2 t 53.03 53.11
## 3 t 60.62 60.72
## 4 k 74.44 74.53
## 5 t 136.66 136.73
## 6 t 144.29 144.38
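If the duration of the following segment is needed (e.g. for normalisation), it can be derived directly from the start/end columns returned above; the two-row data frame below simply mirrors the first rows of the output for illustration:

```r
# Mirror of the first two rows of `following.segment` above.
following <- data.frame(
  Token.plus.1.segment.start = c(12.99, 53.03),
  Token.plus.1.segment.end   = c(13.06, 53.11))

# Duration = end - start, in seconds.
following$Token.plus.1.segment.duration <-
  following$Token.plus.1.segment.end - following$Token.plus.1.segment.start
round(following$Token.plus.1.segment.duration, 2) # 0.07 0.08
```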
Now we want to calculate the spectral Centre of Gravity (CoG) for each target segment. To do this, we use a custom Praat script called CoGFinder.praat, which (among other measures) provides the Centre of Gravity at three points during the fricative.
We give it the MatchId and the start/end time of each token, and ensure that Praat extracts 0.5 s of acoustic context before and after the token.
script <- readLines("CoGFinder.praat")
cog <- processWithPraat(
labbcat.url,
matches$MatchId, matches$Target.segment.start, matches$Target.segment.end,
script, window.offset=0.5)
matches <- cbind(matches, cog)
head(matches)[, c("Text", "cog1", "cog2", "cog3")]
## Text cog1 cog2 cog3
## 1 experiences 6072.962 6266.089 6961.379
## 2 stood 4543.998 5214.732 5404.704
## 3 stay . 5118.318 4298.182 4407.484
## 4 escape 4784.069 4667.481 4478.938
## 5 substantial 4185.975 4278.835 4386.030
## 6 twisted 4424.939 4193.710 3932.203
The dataset now includes sufficient information to study the pronunciation of /s/ and how it relates to phonological context and speaker.
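As a first-pass sketch of such an analysis, the mean mid-fricative CoG (cog2) could be compared across search contexts. The values in this toy data frame are invented for illustration; with the real data, the same aggregate() call would be run on `matches`:

```r
# Toy stand-in for `matches`: invented cog2 values, grouped by SearchName.
toy <- data.frame(
  SearchName = c("sptkV", "sptkV", "sptkr", "sptkr", "sSV"),
  cog2       = c(6266, 5215, 4298, 4667, 6100))

# Mean mid-fricative Centre of Gravity per search context.
aggregate(cog2 ~ SearchName, data = toy, FUN = mean)
```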