LaBB-CAT is a browser-based linguistic annotation store for audio or video recordings, text transcripts, and other annotations. The nzilbb.labbcat R package provides access to linguistic data stored in LaBB-CAT servers, allowing tokens and their annotations to be identified and extracted, along with media data and acoustic measurements.

This worked example illustrates how to:

  1. identify a set of phone tokens in a specific context,
  2. extract annotation labels for the word tokens and their speakers,
  3. download audio files for all tokens,
  4. execute a custom R script to extract acoustic measurements from the audio files.

Example: the effects of morphology and demographics on the acoustic properties of /s/

In particular, we might be interested in whether the /s/ of mis in a prefixed word like mistimes is pronounced differently from the mis in an unprefixed word like mistakes, and whether the speaker's demographics make any difference.

First the nzilbb.labbcat package must be loaded, and the LaBB-CAT corpus is specified:

library(nzilbb.labbcat)
labbcat.url <- Sys.getenv('TEST_READ_LABBCAT_URL') # load details from .Renviron file
credentialError <- labbcatCredentials(
  labbcat.url, Sys.getenv('TEST_READ_LABBCAT_USERNAME'), Sys.getenv('TEST_READ_LABBCAT_PASSWORD'))
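Since labbcatCredentials returns NULL when the connection and authentication succeed (and an error message otherwise), a defensive check can halt the script early if the server is unreachable. A minimal sketch, using a stand-in variable in place of the real credentialError:

```r
# halt early if authentication failed; NULL means success
# (credentialError.example stands in for the real credentialError)
credentialError.example <- NULL
if (!is.null(credentialError.example)) stop(credentialError.example)
```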

Now we search our LaBB-CAT corpus for force-aligned words whose orthography starts with "dis" or "mis".

matches <- getMatches(labbcat.url, list(orthography = "[dm]is.+"), anchor.confidence.min = 50)

# show the first few matches
head(matches)[, c("Transcript", "Target.word", "Target.word.start")]
##              Transcript Target.word Target.word.start
## 1     AP2505_Nelson.eaf    distance           305.480
## 2     AP2505_Nelson.eaf    distance           345.300
## 3 AP2516_JasonEager.eaf        miss           257.580
## 4 AP2516_JasonEager.eaf        miss           259.740
## 5 AP2516_JasonEager.eaf      miss .           262.180
## 6 AP2516_JasonEager.eaf        miss           266.759
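The pattern "[dm]is.+" is a regular expression matched against the whole orthography label, so it can be previewed locally with plain R before querying the server (the word list here is purely illustrative):

```r
# words whose orthography starts with "dis" or "mis" followed by at least
# one more character, mirroring the pattern passed to getMatches
words <- c("distance", "miss", "mistimes", "kiss", "dis")
grepl("^[dm]is.+$", words)
# TRUE TRUE TRUE FALSE FALSE
```

Note that "dis" alone does not match, because `.+` requires at least one further character.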

To analyse the /s/ segments, we need each segment's start and end time, so we can extract a matching sound sample. We identify the first three segment annotations of each token; the last of these will be the /s/ segment we're interested in.

segments <- getMatchAlignments(
  labbcat.url, matches$MatchId, c("segment"), annotations.per.layer=3)

# combine the segment data with the matches
matches <- cbind(matches, segments)

# show the first few alignments
head(matches)[, c(
  "Text", "segment.3", "segment.3.start", "segment.3.end")]
##       Text segment.3 segment.3.start segment.3.end
## 1 distance         s         305.640       305.690
## 2 distance         s         345.420       345.480
## 3     miss         s         257.750       257.930
## 4     miss         s         259.880       260.030
## 5   miss .         s         262.490       262.990
## 6     miss         s         266.899       267.039
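Not every token is guaranteed to have a usable third segment (some may lack an alignment, or the third segment may not be /s/ at all), so it can be worth filtering before going further. A minimal sketch, using a toy data frame with the same columns standing in for the real matches:

```r
# keep only tokens whose third segment is /s/ and has a valid alignment
# (toy data frame for illustration)
matches.example <- data.frame(
  Text = c("distance", "miss", "mister"),
  segment.3 = c("s", "s", NA),
  segment.3.start = c(305.640, 257.750, NA),
  segment.3.end = c(305.690, 257.930, NA))
s.tokens <- subset(matches.example, !is.na(segment.3) & segment.3 == "s")
nrow(s.tokens)
# 2
```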

We want to check whether the acoustic properties of /s/ vary depending on whether it’s in a morphological prefix or not, so we need the morphological parses of each word token. We also extract some demographic information about the speakers.

morphology.demographics <- getMatchLabels(
  labbcat.url, matches$MatchId, c("morphology", "participant_age_category", "participant_gender"))

# combine the annotation data with the matches
matches <- cbind(matches, morphology.demographics)

# show the first few annotations
head(matches)[, c(
  "Text", "morphology", "participant_age_category", "participant_gender")]
##       Text   morphology participant_age_category participant_gender
## 1 distance distant+ance                    36-45                  M
## 2 distance distant+ance                    36-45                  M
## 3     miss         Miss                    26-35                  M
## 4     miss         Miss                    26-35                  M
## 5   miss .         Miss                    26-35                  M
## 6     miss         Miss                    26-35                  M
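For the analysis we will want a categorical predictor distinguishing tokens where dis/mis is a prefix from those where it is not. The morphology labels appear to use "+" as a morpheme boundary (e.g. "distant+ance"), so assuming that convention, such a factor can be derived from the parse; the parses below are hypothetical, for illustration only:

```r
# classify tokens by whether the parse begins with a dis/mis prefix morpheme
# (hypothetical parse labels, assuming "+" marks morpheme boundaries)
morphology <- c("mis+time+s", "mistake+s", "distant+ance", "Miss")
prefixed <- grepl("^[dm]is\\+", morphology, ignore.case = TRUE)
prefixed
# TRUE FALSE FALSE FALSE
```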

We want to perform acoustic analysis, so we need the wav sample for each match, which can be extracted from LaBB-CAT as mono 22kHz files using getSoundFragments.

# define subdirectory to save the files in
subdir <- "s-tokens"

# get segment sound files from LaBB-CAT
wav.files <- getSoundFragments(
  labbcat.url, matches$Transcript, matches$segment.3.start, matches$segment.3.end,
  22050, subdir)

# show the first few file names
head(wav.files)
## [1] "s-tokens/AP2505_Nelson__305.640-305.690.wav"    
## [2] "s-tokens/AP2505_Nelson__345.420-345.480.wav"    
## [3] "s-tokens/AP2516_JasonEager__257.750-257.930.wav"
## [4] "s-tokens/AP2516_JasonEager__259.880-260.030.wav"
## [5] "s-tokens/AP2516_JasonEager__262.490-262.990.wav"
## [6] "s-tokens/AP2516_JasonEager__266.899-267.039.wav"

Now that we have tokens of /s/, we want to use R to perform acoustic analysis of the sound of each token, and save the resulting acoustic measurements in the data frame with the results.

By way of example, Koenig et al. (2013) propose an 8-factor multitaper spectrum calculated over 25 ms portions of the segment, i.e. 8 independent estimates of the spectrum in a single time window are calculated and averaged. We will use a custom R script (adapted from a script written by Patrick Reidy, who has kindly given permission for its use in this example) to provide a function called process.wav which takes a .wav file for a given token and returns the Koenig et al. (2013) measures.

This function is defined in hayhawkins-multitaper.R, which must be loaded:

source("hayhawkins-multitaper.R")

The process.wav function processes a single sound file, so we apply it to all our /s/ tokens, and add the resulting acoustic measures to our matches data frame.

s.measures.lists <- lapply(wav.files, process.wav)

# combine the acoustic measures with the matches
matches <- cbind(matches, do.call(rbind, s.measures.lists))

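If each call to process.wav returns a one-row data frame of measures, the list returned by lapply can be combined row-wise with do.call(rbind, ...) before binding to matches. A self-contained sketch with a toy list standing in for s.measures.lists:

```r
# combine a list of one-row data frames into a single data frame,
# one row per token (toy values for illustration)
measures.list <- list(
  data.frame(freqM = 4402.0, freqH = 7003.2, CoG = 6128.4),
  data.frame(freqM = 3754.9, freqH = 7030.4, CoG = 5779.5))
s.measures <- do.call(rbind, measures.list)
nrow(s.measures)
# 2
```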
Now that the sound files we downloaded have been processed, we don’t need them any more, so we tidily delete them to save disk space.

unlink(subdir, recursive = TRUE)

Our process.wav function produces 12 acoustic measures, three of which are printed below by way of example:

  • freqM - peak frequency in the ‘medium’ range (3kHz to 7kHz)
  • freqH - peak frequency in the ‘high’ range (7kHz to the Nyquist frequency)
  • CoG - the centre of gravity of the whole spectrum
head(matches)[, c("Number", "Text", "segment.3.start", "freqM", "freqH", "CoG")]
##   Number     Text segment.3.start    freqM    freqH      CoG
## 1      1 distance         305.640 4401.996 7003.176 6128.388
## 2      2 distance         345.420 3754.891 7030.435 5779.537
## 3      3     miss         257.750 6311.413 7030.435 6149.090
## 4      4     miss         259.880 4673.641 7310.054 6399.511
## 5      5   miss .         262.490 5712.228 7549.728 7242.374
## 6      6     miss         266.899 6990.489 7030.435 6858.675
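With the measures in place, a first-pass summary, say mean centre of gravity by gender, takes one line with base R's aggregate. A sketch with a toy data frame standing in for the real matches:

```r
# mean CoG per gender group (toy values for illustration)
matches.example <- data.frame(
  CoG = c(6128, 5779, 6149, 6399),
  participant_gender = c("M", "M", "F", "F"))
cog.by.gender <- aggregate(CoG ~ participant_gender, data = matches.example,
                           FUN = mean)
cog.by.gender
```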

The dataset now includes everything needed to study the acoustic properties of /s/ in relation to morphology and speaker demographics:

  • tokens of /s/ in dis/mis prefixed words,
  • demographic information about the speaker,
  • the morphological parse of the word,
  • various acoustic measures computed from the audio using an R script.