Session 3: Exploratory Data Visualisation
Te Kāhui Roro Reo | New Zealand Institute of Language, Brain and Behaviour
Te Whare Wānanga o Waitaha | University of Canterbury
## Markdown
- Markdown is a 'markup' language.
  - Like LaTeX or HTML.
- Markdown was designed to be easy to read (for a human).
- We combine text in markdown with blocks of code.
- This is a variety of 'literate programming'.
- Markdown documents can be turned into PDF (via LaTeX), HTML, OpenOffice, or Word files.
Source view of a Quarto Document open in RStudio.
# Run in terminal
quarto use template JoshuaWilsonBlack/nzilbb_doc
## ggplot2

- The 'gg' in ggplot2 stands for 'grammar of graphics'.

> …a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. (H. Wickham 2016, 4)
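The definition quoted above can be made concrete with a minimal sketch. The data frame `toy` and its column names (`year`, `duration`, `corpus`) are invented for illustration and are not the session data; the point is only to show data being mapped to aesthetic attributes of a geometric object.

```r
library(ggplot2)

# Hypothetical toy data, invented for this illustration only.
toy <- data.frame(
  year     = c(1900, 1920, 1940, 1960, 1980, 2000),
  duration = c(0.31, 0.28, 0.35, 0.22, 0.27, 0.19),
  corpus   = c("A", "A", "B", "B", "C", "C")
)

# Data -> aesthetic attributes (x, y, colour) -> geometric object (points).
p <- ggplot(
  data = toy,
  mapping = aes(x = year, y = duration, colour = corpus)
) +
  geom_point()

p
```

Swapping `geom_point()` for another geom changes the geometric object while the mapping from data to aesthetics stays the same.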
| ...1 | TargetOrthography | foll_wf | prev_wf | Speaker | Corpus | YOB | WordDuration | TargetPhonemes | dur.context | dur.context.avg | prev_pred_wf_log | foll_pred_wf_log | prev_info_wf_avg | foll_info_wf_avg | syll.length.3 | seg.no | repeated.20 | wclass | initial | final | unigram.google.gb | unigram.diff.google.gb | rep.times.avg | final.prop.log | foll.diff | prev.diff | final.diff | dur.context.diff | syll.no |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | able | to | only | speaker426 | CC | 1976 | 0.2600000 | 1bP | 305 | 0.3101003 | -3.962387 | -1.94206 | 1.986253 | 2.008305 | 0.1189995 | 3 | FALSE | a | FALSE | FALSE | -8.647486 | 0.0018382 | 5.36014 | -5.283204 | -0.154497 | -0.503365 | -0.5626083 | -0.1447357 | 2 |
| 2 | able | to | were | speaker66 | CC | 1926 | 0.3400000 | 1bP | 310 | 0.3101003 | -2.403104 | -1.94206 | 1.986253 | 2.008305 | 0.1606245 | 3 | FALSE | a | FALSE | FALSE | -8.647486 | 0.0018382 | 5.36014 | -5.283204 | -0.154497 | -0.503365 | -0.5626083 | -0.1447357 | 2 |
| 3 | able | to | was | speaker256 | IA | 1921 | 0.3900000 | 1bP | 310 | 0.3101003 | -2.800220 | -1.94206 | 1.986253 | 2.008305 | 0.2388915 | 3 | FALSE | a | FALSE | FALSE | -8.647486 | 0.0018382 | 5.36014 | -5.283204 | -0.154497 | -0.503365 | -0.5626083 | -0.1447357 | 2 |
| 4 | able | to | were | speaker256 | IA | 1921 | 0.3000000 | 1bP | 310 | 0.3101003 | -2.403104 | -1.94206 | 1.986253 | 2.008305 | 0.1985703 | 3 | FALSE | a | FALSE | FALSE | -8.647486 | 0.0018382 | 5.36014 | -5.283204 | -0.154497 | -0.503365 | -0.5626083 | -0.1447357 | 2 |
| 5 | able | to | be | speaker105 | CC | 1965 | 0.0521634 | 1bP | 310 | 0.3101003 | -1.467725 | -1.94206 | 1.986253 | 2.008305 | 0.1449822 | 3 | FALSE | a | FALSE | FALSE | -8.647486 | 0.0018382 | 5.36014 | -5.283204 | -0.154497 | -0.503365 | -0.5626083 | -0.1447357 | 2 |
| 6 | able | to | be | speaker288 | MU | 1894 | 0.3100000 | 1bP | 310 | 0.3101003 | -1.467725 | -1.94206 | 1.986253 | 2.008305 | 0.1735298 | 3 | FALSE | a | FALSE | FALSE | -8.647486 | 0.0018382 | 5.36014 | -5.283204 | -0.154497 | -0.503365 | -0.5626083 | -0.1447357 | 2 |
ggplot2 sets up a default scale for the x-axis.
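As a rough illustration of the default scale, the sketch below builds a plot without naming any scale and then adds an explicit one. The data frame `toy` is a hypothetical stand-in with invented values, and the axis title and breaks are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for the session data (values invented).
toy <- data.frame(
  WordDuration = c(0.26, 0.34, 0.39, 0.30, 0.05, 0.31)
)

# No scale is specified, so ggplot2 picks a default continuous x scale.
p_default <- ggplot(toy, aes(x = WordDuration)) +
  geom_histogram(bins = 5)

# The same plot with the x scale spelled out and customised.
p_explicit <- p_default +
  scale_x_continuous(name = "Word duration", breaks = c(0, 0.2, 0.4))

p_explicit
```

The default is usually a sensible starting point; an explicit `scale_` call is only needed when the axis title, breaks, or limits should differ from it.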
- Geometric objects are added with functions starting with `geom_`.
  - `geom_histogram()` creates a histogram.
  - `geom_density()` creates a density plot.
- Functions starting with `scale_` help to specify aesthetic mappings.

# Density of word duration by corpus, with manually chosen colours.
big_dia |>
  ggplot(
    mapping = aes(
      x = WordDuration,
      fill = Corpus,
      colour = Corpus
    )
  ) +
  geom_density(alpha = 0.2) +
  scale_fill_manual(
    values = c("MU" = "#b120cf", "IA" = "#cfb120", "CC" = "#20cfb1")
  ) +
  scale_colour_manual(
    values = c("MU" = "#b120cf", "IA" = "#cfb120", "CC" = "#20cfb1")
  ) +
  # Zoom the x-axis without dropping any data.
  coord_cartesian(xlim = c(-2.5, 2.5))

# The same density plot, split into one panel per value of `final`.
big_dia |>
  ggplot(
    mapping = aes(
      x = WordDuration,
      fill = Corpus,
      colour = Corpus
    )
  ) +
  geom_density(alpha = 0.2) +
  scale_fill_manual(
    values = c("MU" = "#b120cf", "IA" = "#cfb120", "CC" = "#20cfb1")
  ) +
  scale_colour_manual(
    values = c("MU" = "#b120cf", "IA" = "#cfb120", "CC" = "#20cfb1")
  ) +
  facet_grid(vars(final))

- The faceting variable is given using the `vars()` function.
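A small sketch of how the layout of the facets can be controlled, using a histogram instead of a density plot. The data frame `toy` is a hypothetical stand-in for `big_dia` with invented values; only the column names `WordDuration` and `final` are borrowed from the data shown earlier.

```r
library(ggplot2)

# Hypothetical stand-in for big_dia (values invented for illustration).
toy <- data.frame(
  WordDuration = c(0.26, 0.34, 0.39, 0.30, 0.05, 0.31, 0.22, 0.28),
  final        = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

base_plot <- ggplot(toy, aes(x = WordDuration)) +
  geom_histogram(bins = 4)

# One panel per value of `final`, stacked as rows. This matches
# facet_grid(vars(final)), because the first argument is `rows`.
by_row <- base_plot + facet_grid(rows = vars(final))

# The same panels laid out side by side as columns.
by_col <- base_plot + facet_grid(cols = vars(final))

by_row
```

Naming the argument (`rows =` or `cols =`) makes the intended layout explicit, which can be easier to read than relying on argument position.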