In order to probe the phenomenon picked out by the first PC of our two PCA analyses, we fit a series of models. First, we model F1 on the basis of amplitude to see how changes in amplitude affect each vowel’s F1. This is motivated by the hypothesis that amplitude drives the common movement in F1 identified by the first PC of our PCA analyses (developed in corpus_pca.Rmd).
Second, we investigate the sociolinguistic question of whether and to what extent changes in amplitude suggest that a speaker is coming to the end of a discrete topical unit of a monologue.
For both questions, we first apply simple linear and linear mixed models before turning to more sophisticated generalised additive mixed models (GAMMs).
So that this document can be understood on its own, we briefly review the phenomenon of interest from the previous supplementary materials.
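As a rough illustration of the first modelling step, the sketch below fits a simple linear model and a linear mixed model of F1 on amplitude, letting the amplitude slope differ by vowel and adding a by-speaker random intercept in the mixed model. It runs on simulated data: the column names (`speaker`, `vowel`, `amplitude`, `F1`), the three vowels and the effect sizes are assumptions made for illustration, not the structure of `qb_vowels` or the models reported in the paper.

```r
library(lme4)

set.seed(5)

# Simulated data (hypothetical): 20 speakers x 3 vowels x 50 tokens.
sim <- expand.grid(
  speaker = factor(paste0("S", 1:20)),
  vowel = factor(c("START", "KIT", "FLEECE")),
  token = 1:50
)
n <- nrow(sim)
sim$amplitude <- rnorm(n)

# Assumed data-generating process: a vowel-specific baseline, a positive
# amplitude effect on F1, a per-speaker offset, and residual noise.
vowel_baseline <- c(START = 1, KIT = 0, FLEECE = -1)
speaker_offset <- rnorm(20, sd = 0.5)
sim$F1 <- unname(vowel_baseline[as.character(sim$vowel)]) +
  0.3 * sim$amplitude +
  speaker_offset[as.integer(sim$speaker)] +
  rnorm(n, sd = 0.5)

# Simple linear model: amplitude slope allowed to differ by vowel.
lm_fit <- lm(F1 ~ vowel * amplitude, data = sim)

# Linear mixed model: same fixed effects plus a by-speaker random intercept.
lmer_fit <- lmer(F1 ~ vowel * amplitude + (1 | speaker), data = sim)

# Fixed-effect estimates from each model.
print(coef(summary(lm_fit)))
print(fixef(lmer_fit))
```

Because the intercept and `amplitude` terms refer to the reference vowel (here FLEECE, the first factor level alphabetically), the amplitude slope for any other vowel is the `amplitude` estimate plus the corresponding interaction term.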
We first load the required libraries and define global variables.
# Tidyverse and friends
library(tidyverse)
library(broom)
library(glue)
library(patchwork)
# Animations
library(gganimate)
library(magick)
# Interactive plots
library(plotly)
# File management
library(here)
# Data scaling
library(scales)
# GAMMs
library(mgcv)
library(itsadug)
library(gratia)
# Linear Mixed Models
library(lme4)
library(optimx)
# For variance inflation function (vif)
library(car)
# parallel computing - only used for `detectCores` function.
library(parallel)
# Global variables for plotting
vowel_colours_with_foot <- c(
START = "#00B0F6",
STRUT = "#F8766D",
LOT = "#00BF7D",
TRAP = "#FF62BC",
FOOT = "#966432",
KIT = "#39B600",
NURSE = "#00BFC4",
THOUGHT = "#E76BF3",
DRESS = "#9590FF",
FLEECE = "#D89000",
GOOSE = "#A3A500"
)
# Order = order in which these vowels appear on the right side of the main plot
# of the model of F1 by amplitude used in the paper. We do not reorder
# for each plot in these supplementary materials.
vowels <- c(
"START", "STRUT", "LOT", "TRAP", "FOOT", "KIT", "NURSE", "THOUGHT",
"DRESS", "FLEECE", "GOOSE"
)
# Sometimes it is useful to split plots between high vowels and others,
# or between front vowels and others. We define both groups here.
high_vowels <- c(
"DRESS",
"GOOSE",
"THOUGHT",
"FLEECE",
"NURSE"
)
front_vowels <- c(
"DRESS",
"FLEECE",
"NURSE",
"GOOSE",
"TRAP"
)
# Random seed set for reproducibility.
set.seed(5)
knitr::include_graphics(here('plots', 'PCA_with_amplitude_varplot.png'))
Figure 1.1: Variable plots from the PCA analyses.
As depicted in Figure 1.1, PC1 of our PCA analyses for both 60-second and 240-second intervals shows that the F1s of each vowel move together with amplitude. This effect explains around 7.7% of the variance for the 60-second intervals and around 10.3% of the variance for the 240-second intervals.
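The percentage of variance explained by a principal component is conventionally the ratio of that component's eigenvalue (its squared standard deviation) to the sum of all eigenvalues, and variables that "move together" on a component share the sign of their loadings on it. The sketch below shows both quantities using base R's `prcomp` on simulated interval-level data; the column names and the strength of the shared factor are assumptions for illustration, so the share it reports will not resemble the 7.7% and 10.3% above, and this is not necessarily how corpus_pca.Rmd computes its PCA.

```r
set.seed(5)

# Hypothetical data: 200 intervals in which amplitude and the mean F1 of four
# vowels all depend on one shared latent factor plus independent noise.
n <- 200
latent <- rnorm(n)
intervals <- data.frame(
  amplitude = latent + rnorm(n),
  F1_START = 0.8 * latent + rnorm(n),
  F1_KIT = 0.8 * latent + rnorm(n),
  F1_DRESS = 0.8 * latent + rnorm(n),
  F1_FLEECE = 0.8 * latent + rnorm(n)
)

# PCA on standardised variables.
pca <- prcomp(intervals, scale. = TRUE)

# Proportion of variance explained: eigenvalue / sum of eigenvalues.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
pc1_share <- var_explained[1]

# PC1 loadings: a common sign means the variables covary positively on PC1.
pc1_loadings <- pca$rotation[, "PC1"]

print(pc1_share)
print(pc1_loadings)
```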
In SM2_interval_representation.Rmd, examples of amplitude over the course of a monologue were presented. These seemed to suggest that amplitude systematically drops over time (e.g. the bottom panels of Figure 1.2).
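One simple way to quantify a drop of this kind within a single monologue is to regress amplitude on the relative position of each interval in the monologue. The sketch below does this for a simulated monologue; the column names (`interval`, `time_prop`, `amplitude`), the number of intervals and the size of the downward drift are invented for illustration.

```r
set.seed(5)

# Hypothetical monologue: 30 consecutive intervals with an assumed
# downward drift in (standardised) amplitude plus noise.
monologue <- data.frame(interval = 1:30)
# Relative position in the monologue, from 0 (start) to 1 (end).
monologue$time_prop <- (monologue$interval - 1) / (nrow(monologue) - 1)
monologue$amplitude <- 0.5 - 1 * monologue$time_prop + rnorm(30, sd = 0.2)

# Linear trend of amplitude over relative position.
trend_fit <- lm(amplitude ~ time_prop, data = monologue)
print(coef(summary(trend_fit))["time_prop", ])
```

A negative `time_prop` estimate corresponds to amplitude falling towards the end of the monologue; a straight line is only a first approximation to the shapes in Figure 1.2.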
knitr::include_graphics(here('plots', 'QB_NZ_F_369_combined.png'))
Figure 1.2: Amplitude over the course of a monologue for 60- and 240-second intervals.
We will be interested in whether changes in F1 over the course of a monologue are explained by changes in amplitude and whether there is some other effect which might explain systematic shifts in F1 over the course of a monologue.
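A minimal way to separate these two possibilities is to compare the effect of position in the monologue on F1 with and without amplitude in the model: if amplitude accounts for the drift in F1, the position coefficient should shrink towards zero once amplitude is included. The sketch below illustrates this on simulated data in which, by assumption, F1 depends on amplitude only; all names and effect sizes are hypothetical, and it ignores the speaker structure that the mixed models and GAMMs would handle.

```r
set.seed(5)

# Hypothetical interval-level data for one vowel: 40 monologues x 20 intervals.
n_speakers <- 40
n_intervals <- 20
sim <- expand.grid(interval = 1:n_intervals, speaker = 1:n_speakers)
sim$time_prop <- (sim$interval - 1) / (n_intervals - 1)

# Assumed process: amplitude drifts down over the monologue, and F1 depends
# on amplitude only (no direct effect of position).
sim$amplitude <- -0.8 * sim$time_prop + rnorm(nrow(sim), sd = 0.5)
sim$F1 <- 0.4 * sim$amplitude + rnorm(nrow(sim), sd = 0.3)

# Position alone: absorbs the indirect effect that runs through amplitude.
fit_time <- lm(F1 ~ time_prop, data = sim)
# Position and amplitude together: position effect should shrink towards zero.
fit_both <- lm(F1 ~ time_prop + amplitude, data = sim)

print(coef(fit_time)[["time_prop"]])
print(coef(fit_both)[["time_prop"]])
```

If a substantial position effect remained after adding amplitude, that would point to some other source of systematic F1 shift over the monologue.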
Connection between SM4 and the paper: The models reported in the paper are developed in Section 2.3 and Section 3.2. The remainder of the document consists of assumption checks and exploration of alternative methods. The fact that alternative methods produce compatible results provides a ‘sanity check’ on the methods which we do report.
We load the filtered data.
qb_vowels <- read_rds(
here('processed_data', 'Quakebox_filtered.rds')
)
qb_vowels