search-labels-alignments-r-script.Rmd
LaBB-CAT is a browser-based linguistic annotation store that stores audio or video recordings, text transcripts, and other annotations. The nzilbb.labbcat R package provides access to linguistic data stored in LaBB-CAT servers, allowing tokens and their annotations to be identified and extracted, along with media data, and acoustic measurements.
This worked example illustrates how to:

- identify a set of word tokens in a LaBB-CAT corpus,
- extract alignments, annotation labels, and sound samples for those tokens, and
- process the extracted audio with a custom R script to obtain acoustic measurements.
In particular, we might be interested in whether the /s/ of mis in a prefixed word like mistimes is pronounced differently from that in an unprefixed word like mistakes, and whether the speaker’s demographics make any difference.
First, the nzilbb.labbcat package must be loaded, and the URL and credentials of the LaBB-CAT server are specified:
require(nzilbb.labbcat)
labbcat.url <- Sys.getenv('TEST_READ_LABBCAT_URL') # load details from .Renviron file
credentialError <- labbcatCredentials(
  labbcat.url, Sys.getenv('TEST_READ_LABBCAT_USERNAME'), Sys.getenv('TEST_READ_LABBCAT_PASSWORD'))
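If the connection or credentials are wrong, credentialError should contain an error message. As an additional sanity check (not part of the original example), the server can also be asked for its ID:

# optional connection check: request the ID of the LaBB-CAT instance
getId(labbcat.url)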
Now we search our LaBB-CAT corpus for force-aligned words with orthography that starts with “dis” or “mis”.
matches <- getMatches(labbcat.url, list(orthography = "[dm]is.+"), anchor.confidence.min = 50)
# show the first few matches
head(matches)[, c("Transcript", "Target.word", "Target.word.start")]
##              Transcript Target.word Target.word.start
## 1     AP2505_Nelson.eaf    distance           305.480
## 2     AP2505_Nelson.eaf    distance           345.300
## 3 AP2516_JasonEager.eaf        miss           257.580
## 4 AP2516_JasonEager.eaf        miss           259.740
## 5 AP2516_JasonEager.eaf      miss .           262.180
## 6 AP2516_JasonEager.eaf        miss           266.759
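Before going further, it can be worth checking how many tokens matched and which word types they cover; this quick inspection (not part of the original example) uses only base R, and its output will depend on the corpus:

# how many tokens matched the pattern?
nrow(matches)
# which word types do they represent?
head(sort(table(matches$Target.word), decreasing = TRUE))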
In order to analyse the /s/ segments, we need their start and end times, so that we can extract matching sound samples. We identify the first three segment annotations of each token; the last of these will be the /s/ segment we’re interested in.
segments <- getMatchAlignments(
  labbcat.url, matches$MatchId, c("segment"), annotations.per.layer=3)
# combine the segment data with the matches
matches <- cbind(matches, segments)
# show the first few alignments
head(matches)[, c(
  "Text", "segment.3", "segment.3.start", "segment.3.end")]
##       Text segment.3 segment.3.start segment.3.end
## 1 distance         s         305.640       305.690
## 2 distance         s         345.420       345.480
## 3     miss         s         257.750       257.930
## 4     miss         s         259.880       260.030
## 5   miss .         s         262.490       262.990
## 6     miss         s         266.899       267.039
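Depending on the corpus, some matching words may not have /s/ as their third segment (a word like “dishes” matches the orthographic pattern but has /ʃ/ there), or the alignment may be missing. The check below, and the optional filtering step, are hedged suggestions rather than part of the original example:

# inspect the third-segment labels, including any missing alignments
table(matches$segment.3, useNA = "ifany")
# optionally, keep only the tokens whose third segment really is /s/
# matches <- subset(matches, segment.3 == "s")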
We want to check whether the acoustic properties of /s/ vary depending on whether it’s in a morphological prefix or not, so we need the morphological parses of each word token. We also extract some demographic information about the speakers.
morphology.demographics <- getMatchLabels(
  labbcat.url, matches$MatchId, c("morphology", "participant_age_category", "participant_gender"))
# combine the annotation data with the matches
matches <- cbind(matches, morphology.demographics)
# show the first few annotations
head(matches)[, c(
  "Text", "morphology", "participant_age_category", "participant_gender")]
##       Text   morphology participant_age_category participant_gender
## 1 distance distant+ance                    36-45                  M
## 2 distance distant+ance                    36-45                  M
## 3     miss         Miss                    26-35                  M
## 4     miss         Miss                    26-35                  M
## 5   miss .         Miss                    26-35                  M
## 6     miss         Miss                    26-35                  M
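Since the research question contrasts prefixed and unprefixed tokens, the morphology labels can be used to derive a grouping factor. The sketch below assumes that the morphology layer marks morpheme boundaries with “+” (as in distant+ance above), so that prefixed tokens start with “dis+” or “mis+”; adjust the pattern to the conventions of your corpus:

# hypothetical grouping factor: TRUE if the parse begins with a dis-/mis- prefix
matches$prefixed <- grepl("^[dm]is\\+", matches$morphology, ignore.case = TRUE)
table(matches$prefixed)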
We want to perform acoustic analysis, so we need the wav sample for each match; these can be extracted from LaBB-CAT as mono 22kHz files using getSoundFragments.
# define subdirectory to save the files in
subdir <- "s-tokens"
# get segment sound files from LaBB-CAT
wav.files <- getSoundFragments(
  labbcat.url, matches$Transcript, matches$segment.3.start, matches$segment.3.end, 22050,
  path=subdir)
# show the first few file names
head(wav.files)
## [1] "s-tokens/AP2505_Nelson__305.640-305.690.wav"
## [2] "s-tokens/AP2505_Nelson__345.420-345.480.wav"
## [3] "s-tokens/AP2516_JasonEager__257.750-257.930.wav"
## [4] "s-tokens/AP2516_JasonEager__259.880-260.030.wav"
## [5] "s-tokens/AP2516_JasonEager__262.490-262.990.wav"
## [6] "s-tokens/AP2516_JasonEager__266.899-267.039.wav"
Now that we have tokens of /s/, we want to use R to perform acoustic analysis of the sound of each token and save the resulting acoustic measurements back into the matches data frame.
By way of example, Koenig et al. (2013) propose an 8-factor multitaper spectrum calculated over 25 ms portions of the segment, i.e. 8 independent estimates of the spectrum in a single time window are calculated and averaged. We will use a custom R script (adapted from a script written by Patrick Reidy, who has kindly given permission for its use in this example) to provide a function called process.wav, which takes a .wav file for a given token and returns the Koenig et al. (2013) measures.
This function is defined in hayhawkins-multitaper.R, which must be loaded:
source('hayhawkins-multitaper.R')
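The multitaper technique itself can be illustrated independently of that script. The sketch below is not the process.wav implementation, and the time-bandwidth parameter nw and the window placement are assumptions; it simply computes an 8-taper spectrum over the middle 25 ms of one file using the tuneR and multitaper packages:

library(tuneR)      # readWave, for reading the extracted .wav files
library(multitaper) # spec.mtm, for multitaper spectral estimation

# Illustration only: an 8-taper spectrum of the middle 25 ms of one token.
illustrate.multitaper <- function(wav.path) {
  wav <- readWave(wav.path)
  samples <- wav@left / (2^(wav@bit - 1))           # normalise amplitude
  n.window <- round(0.025 * wav@samp.rate)          # 25 ms analysis window
  start <- max(1, floor((length(samples) - n.window) / 2))
  window <- samples[start:min(length(samples), start + n.window - 1)]
  spec.mtm(ts(window, frequency = wav@samp.rate), nw = 4, k = 8, plot = FALSE)
}

For example, illustrate.multitaper(wav.files[1]) would return a spectral estimate for the first token; the measures used below, however, come from process.wav.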
The process.wav function processes a single sound file, so we apply it to all our /s/ tokens and add the resulting acoustic measures to our matches data frame.
# rbindlist (used below to combine the per-token results) comes from data.table
library(data.table)
s.measures.lists <- lapply(wav.files, process.wav)
# combine the acoustic measures with the matches
matches <- cbind(matches, as.data.frame(rbindlist(s.measures.lists)))
Now that the sound files we downloaded have been processed, we don’t need them any more, so we tidily delete them to save disk space.
unlink(subdir, recursive = TRUE)
Our process.wav function produces 12 acoustic measures, three of which are printed below by way of example:
##   Number     Text segment.3.start    freqM    freqH      CoG
## 1      1 distance         305.640 4401.996 7003.176 6128.388
## 2      2 distance         345.420 3754.891 7030.435 5779.537
## 3      3     miss         257.750 6311.413 7030.435 6149.090
## 4      4     miss         259.880 4673.641 7310.054 6399.511
## 5      5   miss .         262.490 5712.228 7549.728 7242.374
## 6      6     miss         266.899 6990.489 7030.435 6858.675
The dataset now includes acoustic measurements, allowing us to study the acoustic properties of /s/ in relation to morphology and speaker demographics.
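For example, a first exploratory look (a sketch only, using the prefixed factor derived above rather than the analysis from the original example) might compare the spectral centre of gravity across prefixed and unprefixed tokens and across speaker groups:

# does the centre of gravity of /s/ differ by prefix status?
boxplot(CoG ~ prefixed, data = matches,
        xlab = "dis-/mis- prefix", ylab = "Centre of Gravity (Hz)")
# a simple linear model including speaker demographics
summary(lm(CoG ~ prefixed + participant_gender + participant_age_category,
           data = matches))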