Session 1
September 5, 2023
A (usually fictional) story:
You read a great paper and think “I bet I could apply their analysis methods to my work!” You click a DOI link in the Data Availability section (which they definitely have). It opens a web page where you can download a folder with R code, data, and documentation about both. The page also has detailed information about how to cite the code and data. You open the folder in RStudio and are prompted to install all the packages you need to run the code. You open the analysis script and hit “run”. All the code runs with no errors, creating all the figures, tables, and statistics used in the paper. You scroll through the well-formatted R code and understand from the authors’ comments exactly what the code does and how to adapt it to your work.
We want to help you make this story a reality for someone else!
Workshop series website:
https://cct-datascience.github.io/repro-data-sci/
Learning Objectives:
Fresh start ensures reproducibility
Use Session > Restart R to check reproducibility
If long-running code is a concern, there are better solutions
Treat data as read-only
Use scripts to “clean” and wrangle data
Treat generated outputs as disposable
Put data, code, and outputs in different folders
Avoid setwd() and getwd(); use relative paths and RStudio projects instead
Use source() to run scripts if needed
Re-organize an existing project into a research compendium
OR
Apply a consistent coding style to one of your R scripts (e.g. with Code > Reformat Code or with the styler package)
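As a rough illustration of several of the learning objectives above (read-only data, cleaning in a script, disposable outputs, separate folders), here is a minimal sketch in base R. All folder and file names (my-project, data_raw, output, measurements.csv) and the example data are hypothetical, and a temporary directory stands in for the project root so the sketch can run anywhere. In a real RStudio project the working directory is already the project root, so you would write relative paths such as "data_raw/measurements.csv" directly.

```r
# Hypothetical project layout: data, code, and outputs live in separate folders.
# A temporary directory stands in for the RStudio project root (an assumption
# made only so this sketch is self-contained).
project_root <- file.path(tempdir(), "my-project")
dir.create(file.path(project_root, "data_raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "R"), showWarnings = FALSE)
dir.create(file.path(project_root, "output"), showWarnings = FALSE)

# Stand-in for a raw data file. In practice this file already exists and is
# only ever read, never edited by hand.
raw_path <- file.path(project_root, "data_raw", "measurements.csv")
write.csv(
  data.frame(id = 1:4, height_cm = c(10.2, NA, 11.5, 9.8)),
  raw_path,
  row.names = FALSE
)

# "Cleaning" happens in code, so the raw file stays untouched.
raw <- read.csv(raw_path)
clean <- raw[!is.na(raw$height_cm), ]

# Generated outputs go in their own folder and can be deleted and rebuilt
# at any time by re-running the script.
clean_path <- file.path(project_root, "output", "measurements_clean.csv")
write.csv(clean, clean_path, row.names = FALSE)
```

Because the cleaned file is produced entirely by code, deleting the output folder and re-running the script after Session > Restart R is a quick check that the workflow does not depend on anything left over in your workspace.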