Session 2: Data processing
Te Kāhui Roro Reo | New Zealand Institute of Language, Brain and Behaviour
Te Whare Wānanga o Waitaha | University of Canterbury
tidyverse and base R.
tidyverse packages:
- dplyr: “a grammar of data manipulation”
- tidyr: a tool to “help you create tidy data”

tidyverse?
- dplyr: for data manipulation
- tidyr: for creating ‘tidy’ data
- ggplot2: for plotting (see next week)

tidyverse pipe: %>%, from magrittr.
Base R pipe: |>
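As a minimal sketch of what the two pipes do, the example below writes the same operation three ways: as a nested call, with the magrittr pipe `%>%`, and with the native pipe `|>` (available from R 4.1). The `toddlers` data frame and its values are hypothetical and exist only for illustration.

```r
library(dplyr)  # attaching dplyr also makes %>% available

# Hypothetical toy data, not the workshop data
toddlers <- data.frame(
  name = c("A", "B", "C"),
  age_in_months = c(18, 24, 24)
)

# Nested call: has to be read from the inside out
nested <- nrow(filter(toddlers, age_in_months == 24))

# magrittr pipe: the left-hand side becomes the first argument
with_magrittr <- toddlers %>%
  filter(age_in_months == 24) %>%
  nrow()

# Native pipe: same idea, built into base R (4.1 or later)
with_native <- toddlers |>
  filter(age_in_months == 24) |>
  nrow()
```

All three versions give the same count; the piped versions read top to bottom in the order the steps happen.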
dplyr
dplyr verbs:
- select(): select one or more columns
- filter(): filter data
- mutate(): create new columns
|> or %>%
dplyr ‘verbs’).
toddlers$happiness.
- group_by(): Creates groups
- count(age_in_months): if you had a column called ‘age_in_months’, this would group the data by the values in age_in_months and count how many rows there are in each group.

Error in `filter()`:
ℹ In argument: `Island == "Torgersen"`.
Caused by error:
! object 'Island' not found
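The verbs listed above, and one likely cause of an "object not found" error like this, can be sketched on a small data frame. Everything in `penguins_toy` (its name, columns and values) is hypothetical, and the sketch assumes the real column is spelled with a lowercase `island`. R is case-sensitive, so `Island` and `island` are different names.

```r
library(dplyr)

# Hypothetical toy data; column names and values are assumptions
penguins_toy <- data.frame(
  island = c("Torgersen", "Torgersen", "Biscoe", "Dream"),
  body_mass_g = c(3750, 3800, 4500, 3900)
)

# select(): keep only the named columns
islands_only <- penguins_toy |> select(island)

# filter(): keep rows where the condition is TRUE
torgersen <- penguins_toy |> filter(island == "Torgersen")

# mutate(): add a column computed from existing ones
with_kg <- penguins_toy |> mutate(body_mass_kg = body_mass_g / 1000)

# group_by() then summarise(): one summary row per group
mean_mass <- penguins_toy |>
  group_by(island) |>
  summarise(mean_mass_g = mean(body_mass_g))

# count(): group by a column and count the rows in each group
n_per_island <- penguins_toy |> count(island)

# Wrong capitalisation: there is no column called `Island`,
# so filter() stops with an "object 'Island' not found" error.
# try() is used here only so the script keeps running.
result <- try(penguins_toy |> filter(Island == "Torgersen"), silent = TRUE)
failed <- inherits(result, "try-error")
```

Checking `names(penguins_toy)` before filtering is a quick way to confirm how a column is spelled.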
filter()?
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA
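One common way to end up with a vector made entirely of `NA`, as in the output above, is to compare values with `NA` using `==`. Whether that is what produced this particular output is an assumption; the sketch below only illustrates the general behaviour, using hypothetical values.

```r
library(dplyr)

# Hypothetical values for illustration
x <- c(3.2, NA, 4.1)

# Comparing anything with NA gives NA, so this is all NA
compared <- x == NA

# is.na() is the way to test for missing values
is_missing <- is.na(x)

df <- data.frame(x = x)

# filter() keeps only rows where the condition is TRUE,
# so rows where the condition is NA are dropped
no_rows <- df |> filter(x == NA)
missing_rows <- df |> filter(is.na(x))
complete_rows <- df |> filter(!is.na(x))
```

Here `no_rows` is empty, while `missing_rows` and `complete_rows` split the data as intended.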
tidyr
Tidy data is data where:
- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.


tidyr provides the functions pivot_wider() and pivot_longer().
Image source: Gavin Simpson via Garrick Aden-Buie’s (@grrrck) Tidy Animated Verbs modified by Mara Averick (@dataandme)
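As a rough sketch of how the two pivot functions relate, the example below reshapes a small wide table into long (tidy) form and back again. The `wide` data frame, its column names and its values are hypothetical and chosen only for illustration.

```r
library(tidyr)

# Hypothetical wide data: one column per time point
wide <- data.frame(
  speaker = c("s1", "s2"),
  t1 = c(10, 20),
  t2 = c(11, 21)
)

# pivot_longer(): column names go into `time`, cell values into `score`,
# giving one row per speaker per time point
long <- wide |>
  pivot_longer(cols = c(t1, t2), names_to = "time", values_to = "score")

# pivot_wider(): the reverse, one column per value of `time`
wide_again <- long |>
  pivot_wider(names_from = time, values_from = score)
```

In `long`, each variable (speaker, time, score) is a column and each observation is a row, which matches the definition of tidy data given above.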
scripts/data_processing.R contains some of the code already.
data directory.