Sharing Code & Getting Credit

Session 10

October 5, 2023

Why share code?

Papers that share their code are cited more often (Maitner et al. 2023)

Getting credit for your code

Make it easier for people to cite your code

  • Provide a CITATION file on GitHub

  • Archive your code to get a digital object identifier (DOI)

  • Include the DOI and/or citation in your paper’s Data/Code Availability Statement

CITATION.cff

  • Citation File Format (CFF) files are plain-text files written in YAML

  • Adding a CITATION.cff file to your repo…

    • Puts a “cite this repository” button on GitHub

    • Helps code archive tools fill out metadata correctly when you archive your repo

  • Create a CITATION.cff file with this helper

  • See example here
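To show what the file contains, here is a minimal sketch that writes a CITATION.cff using only base R's writeLines(). The title, author name, version, and date are placeholders, not real metadata. cff-version, message, title, and authors are the fields CFF 1.2.0 requires; version and date-released are optional but useful once you archive a release.

```r
# Minimal sketch: write a CITATION.cff at the top level of your repository.
# Every value below is a placeholder; replace it with your own metadata.
cff_lines <- c(
  "cff-version: 1.2.0",
  "message: \"If you use this software, please cite it as below.\"",
  "title: \"My Analysis Code\"",
  "authors:",
  "  - family-names: \"Doe\"",
  "    given-names: \"Jane\"",
  "version: 1.0.0",
  "date-released: \"2023-10-05\""
)

# GitHub looks for CITATION.cff in the repository root
writeLines(cff_lines, "CITATION.cff")
```

After you archive a release, you can add its DOI to the same file so the citation points to the archived version.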

Options for archiving

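| Service   | Versioned DOIs? | Free?                                  | GitHub integration? | Notes                                              |
|-----------|-----------------|----------------------------------------|---------------------|----------------------------------------------------|
| Zenodo    | Yes             | Yes                                    | Yes                 | Backed by CERN; built with code and data in mind   |
| Dryad     | Yes             | No, but some publishers cover the cost | No                  | Intended for data, not code; partners with Zenodo  |
| Figshare  | Yes             | Yes                                    | Yes                 | Can’t choose your license                          |
| UA ReDATA | Yes             | Yes (for UA researchers)               | No                  | University of Arizona Libraries                    |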

Zenodo Archiving Demo

Reproducible computational environments

Congrats! Your code is reproducible! But what about…

  • in 3 years, when an R package has been updated with breaking changes?

  • on a different operating system with different versions of system libraries?

Capture the computational environment for ultimate reproducibility
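A low-tech first step is simply to write down the R version and package versions next to your results. The base-R sketch below only documents versions and cannot reinstall them, so it is a lightweight stand-in, not what renv or Docker do. The package list and the file names environment.txt and session_info.txt are placeholders.

```r
# Minimal sketch: save a plain-text record of R and package versions.
# Placeholder package list; replace with the packages your project loads.
pkgs <- c("stats", "utils")

# vapply() names each result after its package
versions <- vapply(
  pkgs,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)

env_record <- c(
  R.version.string,                       # e.g. "R version 4.3.1 (2023-06-16)"
  paste0(names(versions), " ", versions)  # one "package version" line per package
)
writeLines(env_record, "environment.txt")

# sessionInfo() adds the OS, locale, and attached packages
writeLines(capture.output(utils::sessionInfo()), "session_info.txt")
```

This tells a future reader which versions you used, but they would still have to find and install those versions by hand.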

Reproducible Environments With renv

  • The renv package records the R packages your project uses, and their versions, in a lockfile (renv.lock)

  • Each project is isolated, with its own library of packages

  • Can restore the exact package versions recorded in the lockfile

Using renv

Exercise

Install renv and activate it for a project with renv::init(). Inspect the files that were created.

If you change your mind ...

To deactivate renv, run renv::deactivate(). To also remove all the files it created, run renv::deactivate(clean = TRUE) instead.

Limitations of renv

  • Only tracks R packages

  • Can’t reproduce operating system or system libraries

  • Sometimes quite annoying to use (but it’s getting better!)

Reproducible Everything with Docker

Docker containers…

  • Are isolated environments, similar to lightweight “virtual machines”

  • Run Linux regardless of the host machine OS

  • Can be built with specific versions of OS, system libraries, and R packages (using renv)

  • Can be downloaded and run from the command line

Making a Docker Container

A Dockerfile holds instructions for what to install and what code to run. Building a Docker image from it and running that image as a container are beyond the scope of this workshop, but you can learn how!

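# Base image https://hub.docker.com/u/rocker/
# Pin a specific R version instead of "latest" so rebuilds use the same R
FROM rocker/r-base:4.3.1

## create directories
RUN mkdir -p /01_data /02_code /03_output

## copy files (source paths are relative to the build context)
COPY 02_code/install_packages.R /02_code/install_packages.R
COPY 02_code/myScript.R /02_code/myScript.R

## install R-packages
RUN Rscript /02_code/install_packages.R

## run the analysis script when the container starts
CMD ["Rscript", "/02_code/myScript.R"]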

Hold up, what is reproducibility again?

Using renv and Docker involves a reproducibility tradeoff: they provide robust computational reproducibility, but they make it harder for novices to reproduce your work.

If you use these tools, provide:

  • Instructions on how to run your code
  • Where to go for help troubleshooting
  • A way to access and run your code without these extra layers

Drop-in Session & Showcase

  • Next week (10/10, 10/12): No workshops!

  • Tuesday 10/17: Drop-in help session

  • Tuesday 10/24: Reproducibility show & tell

References

Maitner, Brian, Paul Santos-Andrade, Luna Lei, George Barbosa, Brad Boyle, Matiss Castorena, Brian Enquist, et al. 2023. “Code Sharing Increases Citations, but Remains Uncommon.” https://doi.org/10.21203/rs.3.rs-3222221/v1.