Bayesian Data Analysis

What Does My Model Say?

Joshua Wilson Black

Te Kāhui Roro Reo | New Zealand Institute of Language, Brain and Behaviour

Te Whare Wānanga o Waitaha | University of Canterbury

Overview

Overview

  1. Extracting and plotting estimates
  2. Evaluating model fit

Extracting and plotting estimates

The posterior

  • Bayesian models produce probability distributions for every ingredient of the model.
  • The probability distribution after the model has been updated by the data is called the posterior distribution.
  • We use the posterior distribution to make predictions, resulting in a posterior predictive distribution.
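With brms, both distributions are directly accessible; a minimal sketch, assuming the fitted brms model `f1_mod` used later in these slides:

```r
library(brms)

# The posterior: one row per draw, one column per parameter.
draws <- as_draws_df(f1_mod)

# The posterior predictive distribution: simulated response values,
# carrying forward all parameter uncertainty (one row per draw).
preds <- posterior_predict(f1_mod, ndraws = 100)
```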

🎶 glorious things of thee are spoken… 🎶

  • Front vowels of Southern Standard British English (SSBE) lowered over the 20th century.
  • Assumption: British choral singing is closely related to SSBE.
  • Question: Do front vowels lower in choral singing in both SSBE (Cambridge) and non-SSBE (Glasgow) areas? (Marshall et al. 2024)
  • Method: Bayesian linear mixed-effects models

Model

  • Glasgow and King’s College choirs modelled separately:
Vowel Formant ∼ Vowel + Time + Duration + FollowingSegment + 
  Genre + Vowel:Time + Vowel:Duration + Time:Duration + 
  Genre:Duration + (1|Album) + (1|Song) + (1|Word) + 
  (1|Album:Song) + (1|Song:Word)
  • Vowel: fleece, kit, dress, trap
  • Time: Identified with periods of musical directors
  • (I’ll tell you about the priors next week)

Drawing from the Posterior Distribution

  • We can draw samples from any probability distribution and summarise the sample.
  • This is how we learn about model parameters.
  • There is no meaningful distinction between model parameters and ‘post-hoc’ comparisons.
    • What is ‘post’ here? Post the statistical test, a notion that doesn’t apply in the Bayesian setting. (cf. Marshall et al.)

Parameters

  • Let’s look at where kit, dress, trap sit by sampling from the posterior.
  • NB: they are ‘sum coded’.
    • i.e., they represent differences from the grand mean.
library(tidybayes)
# Collect posterior draws for every coefficient matching the regex.
vowel_post <- f1_mod |> 
  gather_draws(`b_vowel[A-Z]+`, regex = TRUE)
  
vowel_post |> 
  ggplot(
    aes(
      y = .variable,
      x = .value
    )
  ) +
  stat_halfeye(.width = c(0.5, 0.7, 0.95))

Parameters

I mean everything has a distribution

  • Where does the F1 of ‘lowing’ in Away in a Manger sit after taking into account the main effects?
# Random intercept for the word ‘lowing’ in Away in a Manger.
a_word_post <- f1_mod |> 
  gather_draws(`r_song:text[away.in.a.manger_lowing,Intercept]`) |> 
  mutate(.variable = "lowing")
  
a_word_post |> 
  ggplot(
    aes(
      y = .variable,
      x = .value
    )
  ) +
  stat_halfeye(.width = c(0.5, 0.7, 0.95))

I mean everything has a distribution

Point and interval summaries

  • Pick a point (mean, median, mode) and an interval.
  • Highest density interval: the shortest interval containing a given probability mass (e.g. 87%)
a_word_post |> median_hdi(.width = 0.87)
# A tibble: 1 × 7
  .variable .value   .lower .upper .width .point .interval
  <chr>      <dbl>    <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 lowing     0.362 -0.00986  0.707   0.87 median hdi      
  • Quantile interval: interval centred on the median, containing a given probability mass.
a_word_post |> median_qi(.width = 0.95)
# A tibble: 1 × 7
  .variable .value .lower .upper .width .point .interval
  <chr>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 lowing     0.362 -0.105  0.826   0.95 median qi       
  • i.e.: from the 0.025 to the 0.975 quantile (with .width = 0.95)
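The quantile interval can be verified by hand from the draws; a sketch using the `a_word_post` draws above:

```r
library(dplyr)

# Manual version of median_qi(.width = 0.95):
# empirical quantiles of the posterior draws.
a_word_post |>
  summarise(
    med   = median(.value),
    lower = quantile(.value, 0.025),
    upper = quantile(.value, 0.975)
  )
```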

Exploring an interaction

  • Marshall et al. use the following plot (my code):
library(modelbased)
f1_means <- estimate_means(
  f1_mod, by = c("vowel", "director"),
  backend = "emmeans",
  ci = 0.9
)

f1_means |> 
  mutate(
    region = if_else(
      str_detect(director, "BO|DW|PL|SC"), "Cambridge", "Glasgow"
    )
  ) |> 
  ggplot(
    aes(
      x = vowel,
      y = Mean,
      ymin = CI_low,
      ymax = CI_high,
      colour = director,
      shape = region
    )
  ) +
  geom_errorbar(position=position_dodge(width=0.6)) +
  geom_point(position=position_dodge(width=0.6)) +
  scale_y_reverse() +
  labs(
    x = NULL,
    y = "F1",
    colour = "Director",
    shape = "Region"
  )

Exploring an interaction

Plotting predictions

  • The posterior predictive distribution is what happens when we draw predictions from the posterior.
  • This includes all of the uncertainty in the parameters.
    • e.g., some draws will place trap ~1.05 above the grand mean, while more will place it ~0.95 above.
  • The posterior distribution describes the model parameters which best match the data.
  • The posterior predictive distribution describes the predictions we expect from the model given the data.
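In brms the two distributions map onto two prediction functions; a sketch, again assuming `f1_mod`:

```r
# Expected values of the response: parameter uncertainty only.
epred <- posterior_epred(f1_mod, ndraws = 100)

# Posterior predictive draws: parameter uncertainty plus residual
# noise, so these are more dispersed than the epred draws.
ppred <- posterior_predict(f1_mod, ndraws = 100)
```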

Get draws from PPD

mod_preds <- estimate_prediction(f1_mod, data="grid", iterations=100)
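A tidybayes alternative to `estimate_prediction()` is `add_predicted_draws()`; a sketch in which `new_grid` is a hypothetical data frame of predictor combinations you build yourself (e.g. with `tidyr::expand_grid()`):

```r
library(tidybayes)

# 100 posterior predictive draws per row of the grid, in long format.
grid_preds <- new_grid |>
  add_predicted_draws(f1_mod, ndraws = 100)
```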

Plot predicted F1s

mod_preds |> 
  filter(
    director %in% c("HSR (1925-1957)", "MJS (1987-2019)")
  ) |> 
  ggplot(
    aes(
      y = Predicted,
      x = vowel,
      colour = director
    )
  ) +
  geom_jitter(alpha = 0.2) +
  scale_y_reverse()

Plot predicted F1s

Summarise predicted F1

mod_preds |> 
  filter(
    director %in% c("HSR (1925-1957)", "MJS (1987-2019)")
  ) |> 
  ggplot(
    aes(
      y = Predicted,
      x = vowel,
      colour = director
    )
  ) +
  geom_boxplot(position="dodge") +
  scale_y_reverse()

Summarise predicted F1

Evaluating model fit

Posterior predictive checks

  • Simple idea: compare your predictions to the actual data.
  • If your predictions don’t look anything like the data, then your model is in trouble.
  • This is a signal to either ditch the model or improve it.
  • These model checks are visual and informal.

pp_check()

  • One simple check is to look at the distribution of predicted values against the actual response variable.
  • The pp_check() function comes from brms.
pp_check(
  f1_mod, type="dens_overlay", ndraws = 100
) 
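pp_check() offers many other diagnostics via its `type` argument; two sketches (the grouped check assumes the model data contain a `vowel` column):

```r
# Does the model recover the observed mean and SD of the response?
pp_check(f1_mod, type = "stat_2d", stat = c("mean", "sd"))

# One density overlay per vowel.
pp_check(
  f1_mod, type = "dens_overlay_grouped", group = "vowel", ndraws = 50
)
```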

pp_check()

Whatever you can think of…

  • What is it most important for your model to capture?
  • Perhaps I wonder if the effect of duration is non-linear…
front_data <- read_rds(here('osf', 'mod_data.rds'))
front_data |> 
  ggplot(
    aes(
      x = log_vdur,
      y = f1_lob
    )
  ) +
  geom_point(alpha=0.1, colour="grey") +
  geom_smooth() +
  geom_smooth(
    aes(
      x = log_vdur,
      y = Predicted
    ),
    data = mod_preds,
    method = "lm",
    colour = "black"
  ) +
  facet_wrap(vars(vowel))
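If the check suggests curvature, one option is to refit with a smooth term for duration and compare models on out-of-sample predictive fit; a sketch, where `f1_mod_s` is a hypothetical spline refit:

```r
# Hypothetical refit with a thin-plate spline on log vowel duration:
# f1_mod_s <- update(f1_mod, formula. = ~ . + s(log_vdur))

# Compare expected predictive performance with PSIS-LOO:
loo_compare(loo(f1_mod), loo(f1_mod_s))
```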

Whatever you can think of…

Now and next time

References

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2025. rmarkdown: Dynamic Documents for r. https://github.com/rstudio/rmarkdown.
Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
———. 2021. “Bayesian Item Response Modeling in R with brms and Stan.” Journal of Statistical Software 100 (5): 1–54. https://doi.org/10.18637/jss.v100.i05.
Dahl, David B., David Scott, Charles Roosen, Arni Magnusson, and Jonathan Swinton. 2019. xtable: Export Tables to LaTeX or HTML. https://doi.org/10.32614/CRAN.package.xtable.
Karreth, Johannes, Shana Scogin, Rob Williams, and Andreas Beger. 2025. BayesPostEst: Generate Postestimation Quantities for Bayesian MCMC Estimation. https://doi.org/10.32614/CRAN.package.BayesPostEst.
Kay, Matthew. 2024. tidybayes: Tidy Data and Geoms for Bayesian Models. https://doi.org/10.5281/zenodo.1308151.
Lenth, Russell V., and Julia Piaskowski. 2025. emmeans: Estimated Marginal Means, Aka Least-Squares Means. https://doi.org/10.32614/CRAN.package.emmeans.
Makowski, Dominique, Mattan S. Ben-Shachar, Brenton M. Wiernik, Indrajeet Patil, Rémi Thériault, and Daniel Lüdecke. 2025. “modelbased: An R Package to Make the Most Out of Your Statistical Models Through Marginal Means, Marginal Effects, and Model Predictions.” Journal of Open Source Software 10 (109): 7969. https://doi.org/10.21105/joss.07969.
Marshall, Edward J., Jane Stuart-Smith, John Butt, and Timothy Dean. 2024. “Variation and Change over Time in British Choral Singing (1925–2019).” Laboratory Phonology 15 (1). https://doi.org/10.16995/labphon.10125.
Müller, Kirill. 2025. here: A Simpler Way to Find Your Files. https://doi.org/10.32614/CRAN.package.here.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.