Data manipulation
Objective
- Use git branches in the “real world”
- Reproducibly clean, summarize, and organize dataframes using tidyverse packages
- Understand how R stores different data types (data frames, vector types, missing data)
- Know what the tidyverse is, how it differs from base R, and the philosophy behind using it here
- Use the pipe to chain operations together
- Use dplyr functions to subset data (select, filter; logic and selection helpers including where, ==, %in%, !) and manipulate data (mutate; lubridate; split-apply-combine)
- Use tidyr functions to reshape data (pivot_wider and pivot_longer)
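As a rough illustration of the objectives above, here is a minimal sketch of inspecting data types and missing values, then chaining subsetting and column-creation steps with the pipe. The `surveys` data frame and its columns are made up for this example and are not the lesson's dataset. The native pipe `|>` needs R 4.1 or later; `%>%` would work the same way here once dplyr is loaded.

```r
library(dplyr)

# Hypothetical toy data; not the dataset used in the lesson
surveys <- data.frame(
  species = c("DM", "DO", "PP", "DM"),
  sex     = c("F", "M", NA, "M"),
  weight  = c(40, 48, NA, 44),
  year    = c(1990L, 1990L, 1991L, 1991L)
)

# Each column is a vector with a single type; NA marks missing data
class(surveys$weight)       # "numeric"
sum(is.na(surveys$weight))  # 1

subset_kg <- surveys |>
  filter(species %in% c("DM", "DO"), !is.na(weight)) |>  # keep rows by condition
  select(species, weight, where(is.integer)) |>          # keep columns by name or by type
  mutate(weight_kg = weight / 1000)                      # add a new column
```

Each step takes the data frame produced by the previous one, so the pipeline reads top to bottom in the order the operations happen.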
Lesson outline
- Review from last week
- Warm-up: create a branch for today’s work
- Slides/discussion: using R for reproducible data analysis
- Why use R?
- What is the tidyverse and why use it?
- Install dplyr and tidyr
- Live coding: How R thinks about data
- Data carpentry R ecology revamp episode #2
- Data frames
- Vectors and data types
- Missing data
- Live coding: dplyr and tidyr
- Data carpentry R ecology revamp episode #3
- Recent DC + R lesson
- Chaining lines together with the pipe (%>%, |>)
- Subsetting and filtering data
- incl. selection and pick: https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html
- Adding columns
- Split-apply-combine
- Reshaping
- Live coding: advanced tidyverse topics
- Options: across; dates; advanced joins; others?
- Live coding/discussion: getting help (reprex)
- Live coding: practice modify-add-commit cycle
- Homework: None
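The split-apply-combine and reshaping items in the outline above can be sketched roughly as follows. This is only an illustration under assumptions: the `counts` data frame, its column names, and its values are hypothetical and do not come from the lesson materials.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: counts of two species on two plots
counts <- data.frame(
  plot    = c(1, 1, 2, 2),
  species = c("DM", "PP", "DM", "PP"),
  n       = c(5, 3, 2, 7)
)

# Split-apply-combine: split by species, sum each group, combine into one table
by_species <- counts |>
  group_by(species) |>
  summarize(total = sum(n))

# Reshape to wide form: one row per plot, one column per species
wide <- counts |>
  pivot_wider(names_from = species, values_from = n)

# Reshape back to long form: one row per plot-species combination
long <- wide |>
  pivot_longer(cols = -plot, names_to = "species", values_to = "n")
```

Here `by_species` has one row per species, `wide` has the columns `plot`, `DM`, and `PP`, and `long` returns to the original four rows.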
Installation & materials
- Slides
- Install R packages ‘dplyr’, ‘tidyr’, ‘readr’
- Data carpentry R ecology revamp episode #2
- Data carpentry R ecology revamp episode #3
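One possible way to handle the package installation listed above is to install only what is missing. This is a sketch, not the course's prescribed procedure; the variable names `pkgs` and `to_install` are illustrative, and `install.packages()` assumes an internet connection and a configured CRAN mirror.

```r
# Packages named in the materials list
pkgs <- c("dplyr", "tidyr", "readr")

# Check which ones are not yet installed (this does not attach them)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

# Install only the missing packages from CRAN
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

After installation, each package still needs to be loaded with `library()` at the start of a session.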
Citation
BibTeX citation:
@online{scott2024,
author = {Scott, Eric and Diaz, Renata and Guo, Jessica and Riemer,
Kristina},
title = {Data Manipulation},
date = {2024},
url = {https://cct-datascience.github.io/repro-data-sci/lessons/7-data-manipulation/notes.html},
doi = {10.5281/zenodo.8411612},
langid = {en}
}
For attribution, please cite this work as:
Scott, Eric, Renata Diaz, Jessica Guo, and Kristina Riemer. 2024.
“Data Manipulation.” Reproducibility & Data Science in
R. 2024. https://doi.org/10.5281/zenodo.8411612.