Reproducibility & Data Science in R

Learning Objectives

Understand the benefits of organizing a project as a research compendium
Use RStudio projects to create self-contained reproducible projects
Use best practices for organizing files in a project
Create a “toy” research compendium you can use throughout this series

A research compendium is a collection of all digital parts of a research project including data, code, texts (protocols, reports, questionnaires, meta data). The collection is created in such a way that reproducing all results is straightforward.

— The Turing Way

(A “project” could be a single manuscript or multiple manuscripts that use the same data or code; that’s up to you to decide.)

Research Compendium Best Practices

Treat raw data as read-only
Use scripts to “clean” and wrangle data
Treat generated outputs as disposable
Put data, code, and outputs in different folders
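
A minimal sketch of how these practices fit together in one script. The folder names (data/raw, data/clean), the file names, and the values are made up for illustration, and a temporary folder stands in for a real project.

```r
# Stand-in for a project folder (hypothetical; a real compendium would use its own root)
root <- file.path(tempdir(), "compendium-demo")
dir.create(file.path(root, "data", "raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data", "clean"), recursive = TRUE, showWarnings = FALSE)

# Simulate a raw file with one missing value (made-up data)
raw_path <- file.path(root, "data", "raw", "measurements.csv")
writeLines(c("site,height_cm", "A,12.5", "B,NA", "C,9.1"), raw_path)

# Raw data are only ever read, never overwritten
raw <- read.csv(raw_path)

# Cleaning lives in code, so it can be rerun and reviewed
clean <- raw[!is.na(raw$height_cm), ]

# The cleaned file is disposable: delete it and rerun this script to get it back
clean_path <- file.path(root, "data", "clean", "measurements_clean.csv")
write.csv(clean, clean_path, row.names = FALSE)
```

Because the raw file is never modified, every cleaning decision stays visible in the script rather than hidden in a hand-edited spreadsheet.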

What goes in a compendium?

Ideally, everything related to that project

Data
Code
Lab notebook
Notes about analyses

Outputs
Documentation / metadata
Reports / presentations
Manuscripts

Basic compendium structure

Short, machine- and human-readable name
Separate folders for “raw” and “clean” data
gdd-thresholds.Rproj file is created by RStudio
R/ folder contains all code to reproduce analysis. Could be named scripts/ or something else
R scripts are numbered with two digits so alphabetic sorting = numeric sorting
README.md is a markdown (plain text) document (we’ll get to READMEs later)
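
A small sketch of why two-digit numbering matters once a project has ten or more scripts. The script names below are hypothetical.

```r
# Hypothetical analysis steps, in the order they should run
steps <- c("import", "clean", "explore", "model", "validate",
           "figures", "tables", "report", "export", "archive")

# Same scripts, numbered without and with zero-padding
unpadded <- sprintf("%d-%s.R", seq_along(steps), steps)
padded   <- sprintf("%02d-%s.R", seq_along(steps), steps)

# Alphabetic sorting puts "10-archive.R" ahead of "2-clean.R"
sort(unpadded)

# With two digits, alphabetic order matches run order
sort(padded)
```

File browsers sort names alphabetically in the same way, so the padded names are the ones that appear in run order in the file pane.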

RStudio projects

Never worry about setwd() or getwd() again! Your compendium is always your working directory.
Switch between many active projects. RStudio remembers where you left off.
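
A minimal sketch of what this looks like in a script, assuming the project has a data/ folder containing a file named gapminder.csv (the file name is an assumption). The path is written relative to the project folder, so the same line works on any computer that opens the project.

```r
# Path relative to the project folder; no setwd() and no "C:/Users/..." prefix
data_path <- file.path("data", "gapminder.csv")  # hypothetical file name

if (file.exists(data_path)) {
  gapminder <- read.csv(data_path)
  print(head(gapminder))
} else {
  # Shows where R looked, which helps when the project is not open
  message("No file found at ", normalizePath(data_path, mustWork = FALSE))
}
```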

Tip

Don’t worry if you don’t know what a working directory is. Renata will talk more about it next week!

DEMO: create a new RStudio Project

Settings for Success

Image: RStudio global settings pane with the "Workspace" and "History" sections highlighted. None of the boxes are checked, and the dropdown for "Save workspace to .RData on exit" is set to "Never".

Fresh start ensures reproducibility
If your analysis relies on saving your environment in .RData, there are better solutions
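
One possible alternative, sketched here as an assumption rather than the only option: save the specific object you need with saveRDS() and read it back explicitly with readRDS(). The model, the file name, and the use of a temporary folder are for illustration only.

```r
# Stand-in for a slow-to-compute result, using a dataset built into R
fit <- lm(dist ~ speed, data = cars)

# Temporary folder for this demo; in a real project this would be "output"
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE)

# Save exactly one named object to one named file
fit_path <- file.path(out_dir, "fit.rds")
saveRDS(fit, fit_path)

# Later, in a fresh session, load it explicitly
fit_restored <- readRDS(fit_path)
all.equal(coef(fit), coef(fit_restored))
```

Unlike a saved workspace, the script itself records which object was saved and where it came from.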

Build your (toy) research compendium

Using the RStudio file pane OR Finder (macOS) / File Explorer (Windows)

Create folders for data/, R/, output/ and notes/
Download the gapminder dataset and place it in your data/ folder
Create a text file called README.md
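
The folders and the empty README can also be created from the R console. This is a sketch, not a required step; it uses a temporary folder as a stand-in (an assumption), and inside an open RStudio project you would set root to "." instead.

```r
# Stand-in for the project folder; use root <- "." inside your RStudio project
root <- file.path(tempdir(), "toy-compendium")

# Create the four folders named in the steps above
for (folder in c("data", "R", "output", "notes")) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Create an empty README.md
file.create(file.path(root, "README.md"))

# Check the result
list.files(root)
```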

README

A README is a plain text document in your research compendium that contains:

Brief project summary
Project status (e.g. work-in-progress, published)
Who is involved
If re-use is allowed and how to give credit
Structure of repo (which files do what?)
Instructions on how to reproduce results
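
A sketch of a README skeleton with one heading per item in the list above. The heading wording is an assumption, not a required format, and the file is written to a temporary folder for this demo.

```r
# One heading per README item; the wording is illustrative only
readme_lines <- c(
  "# Project title",
  "",
  "## Summary",
  "",
  "## Status",
  "",
  "## Contributors",
  "",
  "## License and how to cite",
  "",
  "## Repository structure",
  "",
  "## How to reproduce the results"
)

# Temporary location for this demo; in a real project, write to "README.md"
readme_path <- file.path(tempdir(), "README.md")
writeLines(readme_lines, readme_path)
```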

Example READMEs for research compendia

Note

We will revisit how to make a README formatted for GitHub in week 3
