Bayesian Data Analysis

What Does My Model Assume?

Joshua Wilson Black

Te Kāhui Roro Reo | New Zealand Institute of Language, Brain and Behaviour

Te Whare Wānanga o Waitaha | University of Canterbury

Overview


  1. What is a prior?
  2. Prior predictive tests
  3. Sensitivity analysis

What’s a prior?

The prior

  • Bayesian models start with a ‘prior’ distribution.
  • The prior says what results are plausible before we see the data.
  • A prior might be (roughly):
    1. uninformative: doesn’t push towards any result,
    2. weakly informative: pushes towards plausible results but can be overcome by sufficient data,
    3. strongly informative: pushes towards plausible results, essentially ruling out others.
  • Strong v. weak is a continuum.
  • These are the three options relevant for reporting.

What actually is Bayes’ Theorem?

\(p(\theta|e) = \frac{p(e|\theta)p(\theta)}{p(e)}\)

  • \(p(e|\theta)\) - The likelihood of the evidence given the hypothesis.
  • \(p(\theta)\) - The prior probability of the hypothesis.
  • \(p(e)\) - The marginal probability of the evidence (a normalising constant).
  • \(p(\theta|e)\) - The posterior probability of the hypothesis, given the evidence.
  • The prior and likelihood should be interpreted together.
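Bayes' theorem can be made concrete with a toy grid approximation (an illustrative sketch, not part of the original slides): estimating a coin's bias \(\theta\) from 7 heads in 10 flips, under a flat prior and under a weakly informative prior centred on 0.5.

```r
# Grid approximation of Bayes' theorem for a coin-flip bias theta.
# (Toy example; base R only.)
theta <- seq(0, 1, length.out = 1001)             # grid of hypotheses
likelihood <- dbinom(7, size = 10, prob = theta)  # p(e | theta): 7 heads in 10

# A flat prior and a weakly informative prior centred on 0.5:
prior_flat <- dbeta(theta, 1, 1)
prior_weak <- dbeta(theta, 5, 5)

posterior <- function(prior) {
  unnorm <- likelihood * prior                    # p(e | theta) * p(theta)
  unnorm / sum(unnorm)                            # normalise: divide by p(e)
}

# Posterior means: the weak prior pulls the estimate towards 0.5.
sum(theta * posterior(prior_flat))  # ~ 0.667, i.e. (7 + 1) / (10 + 2)
sum(theta * posterior(prior_weak))  # ~ 0.600, i.e. (7 + 5) / (10 + 10)
```

The two posterior means show the prior and likelihood being "interpreted together": the same data yield different conclusions under different priors, with the informative prior shrinking the estimate towards its centre.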

The prior is just part of the model

  • The prior captures information we have before seeing the data (see Gelman and Hennig 2017).
  • Treat it as just another part of the model.
  • Practical tests of priors:
    • Convince yourself that it’s compatible with any plausible outcome (via a prior predictive test).
    • Check how much your choice influences your result (via a sensitivity analysis).
  • A terminological niggle: you can change your prior after you see your data!

Probability distributions…

Cauchy, Dirichlet, Poisson, Gaussian, Beta, Gamma, Wishart, Student’s \(t\)…

There’s no way around them…

Borrow, test and modify

Uninformative

  • Uninformative priors let the data totally dominate.
  • In practice they are very rare (although they appear in some brms defaults).
default_prior(
    F1_lob2 ~ school,  
    data = trap_sub
)

Uninformative (cont.)

                 prior     class                     coef
                (flat)         b
                (flat)         b schoolStMargaretsCollege
student_t(3, 0.4, 2.5) Intercept
  student_t(3, 0, 2.5)     sigma
  • (flat) indicates an uninformative prior.
  • student_t(...), on the intercept and residual standard deviation, is weakly informative.

Weakly informative

rstudent_t(n=1000, 3, 0.4, 2.5) |> 
  hist()

Weakly informative (cont.)

  • The intercept prior is centred on the mean F1.
  • The variance is also estimated from the data.
  • It’s compatible with a very wide range of plausible results.

  • But not with, e.g., an intercept of 2000.
  • Weakly informative priors are typically the best choice.
  • NB: brms defaults use the data to estimate the prior.
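To see how wide the default student_t(3, 0.4, 2.5) intercept prior really is, we can compute its central 95% interval in base R (shifting and scaling the standard t quantiles by hand, rather than using brms's rstudent_t):

```r
# Central 95% interval of a student_t(df = 3, mu = 0.4, sigma = 2.5) prior,
# built from base R's standard t quantiles by shifting and scaling.
mu <- 0.4; sigma <- 2.5; df <- 3
mu + sigma * qt(c(0.025, 0.975), df = df)
# roughly -7.6 to 8.4: far wider than any plausible Lobanov-normalised F1,
# yet nowhere near compatible with an intercept of 2000.
```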

Weakly informative (cont. 2)

  • Informativeness is a continuum.
  • Let’s be more informative…
trap_prior <- c(
  prior(student_t(3, 0, 2), class = "Intercept"),
  prior(normal(0, 2), class = "sigma"),
  prior(normal(0, 0.5), class = "b")
)
trap_fit_b <- brm(
  F1_lob2 ~ school,  
  data = trap_sub,
  prior = trap_prior
)

Weakly informative (cont. 3)

We fit a Bayesian linear regression with weakly informative priors using the brms package. The intercept, representing the mean F1 for Avonside students, has a weakly informative Student’s \(t\) prior (df = 3, mean = 0, scale = 2) with moderately heavy tails. The difference between Avonside and St Margaret’s has a normal (mean = 0, sd = 0.5) prior, which expresses the expectation that the students come from the same dialect and that their mean trap F1 is very unlikely to differ by more than 1 point in Lobanov normalised space. The prior on the residual standard deviation is normal (mean = 0, sd = 2), accommodating a wide range of variation in Lobanov normalised formant values.

Strongly informative

  • Our weakly informative prior was based on readily available information about, e.g., Lobanov normalised formant values.
  • We might want to add new data to our existing model.
  • We might have some other source of information.
  • This can be done using ‘strongly informative’ priors.

Strongly informative (cont. 2)

  • Here’s a strong and false prior.
bad_prior <- c(
  prior(student_t(3, 0, 2), class = "Intercept"),
  prior(normal(0, 2), class = "sigma"),
  prior(normal(0, 0.01), class = "b")
)
trap_fit_s <- brm(
  F1_lob2 ~ school,  
  data = trap_sub,
  prior = bad_prior
)

Strongly informative (cont. 3)

rnorm(1000, mean=0, sd=0.01) |> 
  hist()

  • We’re not allowing differences larger than about 0.03 in normalised space.
  • Presumably this is not actually detectable.
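The claim that normal(0, 0.01) rules out differences beyond about 0.03 follows from the three-sigma rule, which base R confirms:

```r
# Almost all prior mass for normal(mean = 0, sd = 0.01) lies within 3 sd:
pnorm(0.03, mean = 0, sd = 0.01) - pnorm(-0.03, mean = 0, sd = 0.01)
# ~ 0.997: the prior leaves only ~0.3% probability on |difference| > 0.03
```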

Strongly informative (cont. 4)

fixef(trap_fit_s)
                           Estimate   Est.Error         Q2.5      Q97.5
Intercept                0.39076070 0.025395642  0.339740565 0.43948808
schoolStMargaretsCollege 0.01560946 0.009758907 -0.003597578 0.03461282

vs.

fixef(trap_fit_b)
                          Estimate  Est.Error      Q2.5     Q97.5
Intercept                0.2479392 0.02972882 0.1900104 0.3047536
schoolStMargaretsCollege 0.4728209 0.05449568 0.3684790 0.5798067
  • Our prior expectation on the size of the difference between schools has messed up both the intercept (Avonside) and the St Margaret’s coefficients.

A rule

Explicitly specify your prior!

  • Don’t use the default without thinking.
  • If you do decide to use the default, put it in your code explicitly.
    • i.e., not ‘we used the default prior’,
    • rather, ‘we used an uninformative prior for all coefficients, with a weakly informative [etc.]’

Prior predictive checks

The idea

  • Last week we drew from the posterior distribution.
  • We can also draw from the prior and see what comes out (using the likelihood).
  • Easy to do with brms.
  • If the values are implausibly wide, we can narrow the prior (and vice versa).

Implementation

trap_ppc <- brm(
  F1_lob2 ~ school,  
  data = trap_sub,
  prior = trap_prior,
  sample_prior = "only"
)

Sampling

library(tidybayes)
prior_draws <- trap_ppc |>
  predicted_draws(newdata = trap_sub, ndraws = 100)

Plotting

draws_to_plot <- sample(1:100, size = 4)
prior_draws |> 
  filter(
    .draw %in% draws_to_plot
  ) |> 
  ggplot(
    aes(
      x = school,
      y = .prediction
    )
  ) +
  geom_violin(
    quantiles = c(0.5),
    quantile.linetype = "dashed",
    quantile.linewidth = 1
  ) +
  facet_wrap(vars(.draw))

Plotting (cont. 2)

  • The bulk of each distribution sits in plausible values.
  • The outliers are a bit excessive.
  • We could tighten this up if we wanted.
  • In more complex models, this is more useful!
  • We’re plotting ‘draws’ separately.

Sensitivity analysis

The idea

  • Try some other priors, see how much it affects the results.
  • I’d always do a prior predictive check, but only do a sensitivity analysis if I were worried.
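The logic of a sensitivity analysis can be sketched without refitting any brms models, using the conjugate normal-mean update (known residual sd) to show how a posterior mean shifts as the prior sd changes. The numbers below are toy values chosen for illustration, not the actual trap data:

```r
# Toy sensitivity analysis for a normal mean with known residual sd.
# Conjugate update: posterior precision = prior precision + n / sigma^2.
# (Illustrative numbers only; not the trap_sub data.)
sensitivity <- function(prior_sd, ybar = 0.47, n = 50, sigma = 0.5) {
  prior_prec <- 1 / prior_sd^2
  data_prec  <- n / sigma^2
  post_mean  <- (data_prec * ybar) / (prior_prec + data_prec)  # prior mean 0
  post_sd    <- sqrt(1 / (prior_prec + data_prec))
  c(post_mean = post_mean, post_sd = post_sd)
}

sensitivity(prior_sd = 0.5)   # weak prior: posterior mean close to ybar
sensitivity(prior_sd = 0.01)  # strong prior: estimate shrunk towards 0
```

If the two fits give similar posteriors, the data dominate and the prior choice is innocuous; a large shift, as here, is exactly the prior-data conflict that priorsense flags automatically.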

Automate

library(priorsense)
powerscale_sensitivity(trap_fit_b)
Sensitivity based on cjs_dist
Prior selection: all priors
Likelihood selection: all data

                   variable prior likelihood diagnosis
                b_Intercept 0.009      0.090         -
 b_schoolStMargaretsCollege 0.017      0.094         -
                      sigma 0.001      0.100         -
                  Intercept 0.001      0.082         -

Automate (cont. 2)

powerscale_plot_dens(trap_fit_b)

Automate (cont. 3)

powerscale_sensitivity(trap_fit_s)
Sensitivity based on cjs_dist
Prior selection: all priors
Likelihood selection: all data

                   variable prior likelihood                     diagnosis
                b_Intercept 0.028      0.090                             -
 b_schoolStMargaretsCollege 0.295      0.239 potential prior-data conflict
                      sigma 0.010      0.094                             -
                  Intercept 0.006      0.094                             -

Summary


  1. What is a prior?
    • incl. uninformative, weakly informative, or strongly informative.
  2. Prior predictive tests
    • i.e. what possibilities is my prior consistent with?
  3. Sensitivity analysis
    • i.e. what are the consequences of changing the prior for my results?

Now and next time

References

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2025. rmarkdown: Dynamic Documents for r. https://github.com/rstudio/rmarkdown.
Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
———. 2021. “Bayesian Item Response Modeling in R with brms and Stan.” Journal of Statistical Software 100 (5): 1–54. https://doi.org/10.18637/jss.v100.i05.
Gelman, Andrew, and Christian Hennig. 2017. “Beyond Subjective and Objective in Statistics.” Journal of the Royal Statistical Society Series A: Statistics in Society 180 (4): 967–1033. https://doi.org/10.1111/rssa.12276.
Kallioinen, Noa, Topi Paananen, Paul-Christian Bürkner, and Aki Vehtari. 2023. “Detecting and Diagnosing Prior and Likelihood Sensitivity with Power-Scaling.” Statistics and Computing 34. https://doi.org/10.1007/s11222-023-10366-5.
Kay, Matthew. 2024. tidybayes: Tidy Data and Geoms for Bayesian Models. https://doi.org/10.5281/zenodo.1308151.
Müller, Kirill. 2025. here: A Simpler Way to Find Your Files. https://doi.org/10.32614/CRAN.package.here.
Paananen, Topi, Juho Piironen, Paul-Christian Bürkner, and Aki Vehtari. 2021. “Implicitly Adaptive Importance Sampling.” Statistics and Computing 31: 1–19. https://doi.org/10.1007/s11222-020-09982-2.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Vehtari, Aki, Daniel Simpson, Andrew Gelman, Yuling Yao, and Jonah Gabry. 2024. “Pareto Smoothed Importance Sampling.” Journal of Machine Learning Research 25.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.
———. 2025. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.