search-labels-alignments-praat-script.Rmd
LaBB-CAT is a browser-based linguistic annotation store that holds audio or video recordings, text transcripts, and other annotations. The nzilbb.labbcat R package provides access to linguistic data stored in LaBB-CAT servers, allowing tokens and their annotations to be identified and extracted, along with media data and acoustic measurements.
This worked example shows how to:

- identify sets of phone tokens using regular-expression searches,
- extract annotation labels, alignments, and speaker metadata for each token, and
- use a custom Praat script, executed by LaBB-CAT, to take acoustic measurements of each token.
In particular, we are interested in the pronunciation of the phoneme /s/ in specific contexts, to see whether the pronunciation is sometimes more like [ʃ] than [s]. We might expect the /s/ in “seat” to be pronounced like [s], whereas the /s/ in “street” might be pronounced more like [ʃ].
In order to do this, we’re going to identify /s/ tokens in the following contexts:

- /s/ followed by /p/, /t/, or /k/, followed by a vowel
- /s/ followed by /p/, /t/, or /k/, followed by /r/
- /s/ followed by /t/ and /j/
For comparison purposes, we also want /s/ and /ʃ/ tokens that we take to have the ‘canonical’ pronunciation. For this we will find:

- word-initial /s/ or /ʃ/ followed by the FLEECE, THOUGHT, or START vowel
Each of these contexts will be identified by a different regular expression, assuming that the phonemes are encoded using the CELEX ‘DISC’ encoding, which uses exactly one ASCII character per phoneme:
sptkV <- ".*s[ptk][cCEFHiIPqQuUV0123456789~#{$@].*"
sptkr <- ".*s[ptk]r.*"
stj <- ".*stj.*"
sSV <- "[sS][i$#].*"
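Before running any searches, the patterns can be sanity-checked locally against a few hand-picked DISC transcriptions. The example strings below are ours, not drawn from the corpus (in DISC, "street" is `strit` and "stood" is `stUd`):

```r
# Patterns repeated here so the snippet runs on its own.
sptkV <- ".*s[ptk][cCEFHiIPqQuUV0123456789~#{$@].*"
sptkr <- ".*s[ptk]r.*"
sSV   <- "[sS][i$#].*"

grepl(sptkr, "strit") # "street": /s/ + /t/ + /r/ -> TRUE
grepl(sptkV, "stUd")  # "stood": /s/ + /t/ + vowel -> TRUE
grepl(sSV,   "sit")   # "seat": word-initial /s/ + FLEECE -> TRUE
grepl(sSV,   "strit") # "street" is not a canonical reference token -> FALSE
```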
To measure the pronunciation of the /s/ tokens, we will use the spectral Centre of Gravity (CoG) of the fricative, which we will determine using a custom Praat script which will be executed by LaBB-CAT for each token.
For analysis after extracting the data, we may also want some other information, e.g. the word’s phonemic transcription, the token’s syllable and following segment, and the speaker’s gender and age category.
In order to extract the data, we need to:

- search for /s/ tokens in each of the contexts above,
- get the phonemic transcriptions, demographics, and alignments for each token, and
- process the tokens with a custom Praat script to measure the Centre of Gravity.
First, the nzilbb.labbcat package is loaded, and the LaBB-CAT server URL and credentials are specified:
require(nzilbb.labbcat)
labbcat.url <- "https://labbcat.canterbury.ac.nz/demo/"
labbcat.url <- Sys.getenv('TEST_READ_LABBCAT_URL') # override with details from .Renviron file
credentialError <- labbcatCredentials(
labbcat.url, Sys.getenv('TEST_READ_LABBCAT_USERNAME'), Sys.getenv('TEST_READ_LABBCAT_PASSWORD'))
We conduct a search for each of the contexts we’re interested in: the pattern is matched on the syllables layer, and we specify which segment within the matching syllable we’re targeting for analysis, i.e. /s/.
sptkV.matches <- getMatches(labbcat.url, list(syllables = sptkV, segment = "s"))
sptkr.matches <- getMatches(labbcat.url, list(syllables = sptkr, segment = "s"))
stj.matches <- getMatches(labbcat.url, list(syllables = stj, segment = "s"))
c(paste("There are", nrow(sptkr.matches), "tokens of ...s[ptk]r..."),
paste("There are", nrow(sptkV.matches), "tokens of ...s[ptk]V..."),
paste("There are", nrow(stj.matches), "tokens of ...stj..."))
## [1] "There are 118 tokens of ...s[ptk]r..."
## [2] "There are 486 tokens of ...s[ptk]V..."
## [3] "There are 9 tokens of ...stj..."
In order to compare pronunciations with the ‘standard’ pronunciation of /s/ or /ʃ/, we also identify some ‘reference’ phones; i.e. /s/ or /ʃ/ at the beginning of a word, followed by the FLEECE, THOUGHT, or START vowel (the ‘phonemes’ layer contains the phonemic transcription of the whole word, so using that layer allows us to anchor the pattern to the start of the word).
sSV.matches <- getMatches(labbcat.url, list(phonemes = sSV, segment = "[sS]"))
paste("There are", nrow(sSV.matches), "reference tokens of [sS]V...")
## [1] "There are 279 reference tokens of [sS]V..."
We’ll combine all the data frames into one for convenience; we can use matches$SearchName to distinguish them if necessary:
matches <- rbind(sptkV.matches, sptkr.matches, stj.matches, sSV.matches)
paste("Total tokens:", nrow(matches))
## [1] "Total tokens: 892"
This gives us a data frame with one row per token, including each token’s start/end times:
## Text Target.segment Target.segment.start Target.segment.end
## 1 experiences s 12.92 12.99
## 2 stood s 52.93 53.03
## 3 stay . s 60.45 60.62
## 4 escape s 74.29 74.44
## 5 substantial s 136.57 136.66
## 6 twisted s 144.25 144.29
For all the tokens, we also want the word’s phonemic transcription, and the speaker’s gender and age:
participant.demographics <- getMatchLabels(
labbcat.url, matches$MatchId, c("phonemes", "participant_gender", "participant_age_category"))
matches <- cbind(matches, participant.demographics)
head(matches)[, c(
"Text", "phonemes", "participant_age_category", "participant_gender")]
## Text phonemes participant_age_category participant_gender
## 1 experiences Iksp7r7nsIz 36-45 M
## 2 stood stUd 36-45 M
## 3 stay . st1 36-45 M
## 4 escape Isk1p 36-45 M
## 5 substantial s@bst{nSP 36-45 M
## 6 twisted twIstId 36-45 M
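For later analysis it is convenient to treat the demographic columns as categorical. The sketch below converts them to factors; the values echo the sample output above plus one invented row, and the resulting levels are only an assumption about the full range in the corpus:

```r
# Mini stand-in for the demographic columns of `matches`; values echo the
# sample output above plus one invented row.
demo <- data.frame(
  participant_gender = c("M", "M", "F"),
  participant_age_category = c("36-45", "36-45", "26-35"),
  stringsAsFactors = FALSE)

# Convert to factors so that models and plots treat them as categorical.
demo$participant_gender <- factor(demo$participant_gender)
demo$participant_age_category <- factor(demo$participant_age_category)
levels(demo$participant_age_category) # "26-35" "36-45"
```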
We also want start/end times and phonemic transcription labels for the syllable of the /s/ or /ʃ/ token:
syllable <- getMatchAlignments(labbcat.url, matches$MatchId, c("syllables"))
matches <- cbind(matches, syllable)
head(matches)[, c(
"Text", "Target.segment", "syllables", "syllables.start", "syllables.end")]
## Text Target.segment syllables syllables.start syllables.end
## 1 experiences s 'sp7 12.92 13.11
## 2 stood s 'stUd 52.93 53.24
## 3 stay . s 'st1 60.45 61.21
## 4 escape s 'sk1p 74.29 74.70
## 5 substantial s 'st{n 136.57 136.94
## 6 twisted s stId 144.25 144.44
And the start/end times for the segment that follows the token:
following.segment <- getMatchAlignments(labbcat.url, matches$MatchId, c("segment"), target.offset = 1)
matches <- cbind(matches, following.segment)
head(matches)[, c(
"Token.plus.1.segment", "Token.plus.1.segment.start", "Token.plus.1.segment.end")]
## Token.plus.1.segment Token.plus.1.segment.start Token.plus.1.segment.end
## 1 p 12.99 13.06
## 2 t 53.03 53.11
## 3 t 60.62 60.72
## 4 k 74.44 74.53
## 5 t 136.66 136.73
## 6 t 144.29 144.38
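If the duration of the following segment is needed (e.g. for normalisation), it can be derived directly from the start/end columns returned above; the two-row data frame below simply mirrors the first rows of the output for illustration:

```r
# Mirror of the first two rows of `following.segment` above.
following <- data.frame(
  Token.plus.1.segment.start = c(12.99, 53.03),
  Token.plus.1.segment.end   = c(13.06, 53.11))

# Duration = end - start, in seconds.
following$Token.plus.1.segment.duration <-
  following$Token.plus.1.segment.end - following$Token.plus.1.segment.start
round(following$Token.plus.1.segment.duration, 2) # 0.07 0.08
```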
Now we want to calculate the spectral Centre of Gravity (CoG) for each target segment. To do this, we use a custom Praat script called CoGFinder.praat, which (among other measures) provides the Centre of Gravity at three points during the fricative.
We give it the MatchId and the start/end time of each token, and ensure that Praat extracts 0.5 s of acoustic context before and after the token.
script <- readLines("CoGFinder.praat")
cog <- processWithPraat(
labbcat.url,
matches$MatchId, matches$Target.segment.start, matches$Target.segment.end,
script, window.offset=0.5)
matches <- cbind(matches, cog)
head(matches)[, c("Text", "cog1", "cog2", "cog3")]
## Text cog1 cog2 cog3
## 1 experiences 6072.962 6266.089 6961.379
## 2 stood 4543.998 5214.732 5404.704
## 3 stay . 5118.318 4298.182 4407.484
## 4 escape 4784.069 4667.481 4478.938
## 5 substantial 4185.975 4278.835 4386.030
## 6 twisted 4424.939 4193.710 3932.203
The dataset now includes sufficient information to study the pronunciation of /s/ and how it relates to phonological context and speaker.
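As a first-pass sketch of such an analysis, the mean mid-fricative CoG (cog2) could be compared across search contexts. The values in this toy data frame are invented for illustration; with the real data, the same aggregate() call would be run on `matches`:

```r
# Toy stand-in for `matches`: invented cog2 values, grouped by SearchName.
toy <- data.frame(
  SearchName = c("sptkV", "sptkV", "sptkr", "sptkr", "sSV"),
  cog2       = c(6266, 5215, 4298, 4667, 6100))

# Mean mid-fricative Centre of Gravity per search context.
aggregate(cog2 ~ SearchName, data = toy, FUN = mean)
```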