dplyr and tidyr
Session 7
September 24, 2024
Informal poll:
Do you collaborate with anyone who uses GitHub now, or who could?
Chat:
What would you say to encourage them to start?
Use dplyr and tidyr to work with data in R.
Create a branch for today’s work in your workshop repo.
Breakout rooms (5 minutes) to discuss, and then report back to the group.
Suggestions:
From the Carpentries lab…
Let’s install some packages:
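A minimal sketch of the install step, assuming the packages needed are the two named in this session (dplyr and tidyr) and that a CRAN mirror is reachable. The variable names are illustrative only.

```r
# Packages used in this session (assumed from the session title).
pkgs <- c("dplyr", "tidyr")

# Check which of them are already installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing, then load both packages.
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
library(dplyr)
library(tidyr)
```

Checking first avoids reinstalling packages that are already present.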
dplyr
Core dplyr verbs:
select pulls columns
filter pulls rows based on values
mutate adds or modifies a column
group_by + summarize calculates group-wise summary statistics
*_join functions combine data frames based on matching columns
Pipes:
%>%: older, included in dplyr (ultimately depends on magrittr)
|>: included in base R as of 4.1.0
tidyr
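As a recap of the dplyr verbs and pipes above, here is a small sketch that chains them together. The data frames `samples` and `sites` and all of their columns are hypothetical, invented for illustration; the native pipe requires R 4.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration.
samples <- data.frame(
  site    = c("A", "A", "B", "B", "C"),
  species = c("oak", "pine", "oak", "pine", "oak"),
  count   = c(10, 4, 7, 0, 12),
  area_m2 = c(100, 100, 50, 50, 200)
)
sites <- data.frame(
  site   = c("A", "B", "C"),
  region = c("north", "north", "south")
)

site_summary <- samples |>
  filter(count > 0) |>                       # keep rows based on values
  select(site, count, area_m2) |>            # keep only these columns
  mutate(density = count / area_m2) |>       # add a new column
  group_by(site) |>                          # one group per site
  summarize(
    total_count  = sum(count),
    mean_density = mean(density)
  ) |>
  left_join(sites, by = "site")              # match rows on the site column

# The same filter step written with the older %>% pipe.
nonzero <- samples %>% filter(count > 0)
```

Each step takes a data frame and returns a data frame, which is what lets the verbs be piped one after another.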
Key tidyr manipulations:
pivot_longer turns column names into row values
pivot_wider creates new columns based on the values of a given field.
Note
Data should be as long as is reasonable (but not longer)!
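The two pivots can be sketched as a round trip. The `wide` table and its column names are hypothetical, invented for illustration; only `pivot_longer` and `pivot_wider` come from tidyr.

```r
library(tidyr)

# Hypothetical wide table: one column per species.
wide <- data.frame(
  site = c("A", "B"),
  oak  = c(10, 7),
  pine = c(4, 0)
)

# Column names (oak, pine) become values in a new "species" column.
long <- wide |>
  pivot_longer(cols = c(oak, pine), names_to = "species", values_to = "count")

# Values of "species" become column names again.
wide_again <- long |>
  pivot_wider(names_from = species, values_from = count)
```

Here `long` has one row per site and species, which is usually the easier shape for group_by and summarize.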
See ?tidyr::pivot_wider and the index pages.
Work through the steps to synchronize the changes you’ve made today with GitHub.
Command-line instructions
git add <your script name>
git commit
git push
Greg Wilson, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal: "Good Enough Practices for Scientific Computing". http://github.com/swcarpentry/good-enough-practices-in-scientific-computing/, 2016.
Wallace, E.W.J., Meynert, A., Zielinski, T., Romanowski, A., et al. (2022). Good Enough Practices in Scientific Computing: A Lesson (Version 0.1.0). https://doi.org/tbc; also https://github.com/carpentries-lab/good-enough-practices/.