Sharing Code & Getting Credit

Session 10

October 3, 2024

Why share code?

Papers with open code are cited more (Maitner et al. 2023)

Getting credit for your code

Make it easier for people to cite your code

  • Provide a CITATION file on GitHub

  • Archive to get a digital object identifier (DOI)

  • Include DOI and/or citation in your paper’s Data/Code Availability Statement

CITATION.cff

  • A CITATION.cff file contains citation information written in YAML

  • Adding a CITATION.cff file to your repo…

    • Puts a “cite this repository” button on GitHub

    • Helps code archive tools fill out metadata correctly when you archive your repo

  • Learn more and create your own: https://citation-file-format.github.io/

  • See example here
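A minimal sketch of what a CITATION.cff file might contain. The title, author name, version, and release date are placeholders for illustration only; the keys themselves (cff-version, message, title, authors) are the ones the Citation File Format requires.

```yaml
# CITATION.cff lives in the top-level folder of the repository
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "My Analysis Code"      # placeholder title
authors:
  - family-names: "Doe"        # placeholder author
    given-names: "Jane"
version: 0.1.0                 # placeholder; match the release you archive
date-released: 2024-10-03      # placeholder date
```

Once the repo has an archived release, a `doi:` field can be added so that the citation points to the archived copy.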

Options for archiving

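| Service   | Versioned DOIs? | Free?                               | GitHub integration? | Notes                                             |
|-----------|-----------------|-------------------------------------|---------------------|---------------------------------------------------|
| Zenodo    | Yes             | Yes                                 | Yes                 | Backed by CERN, built with code and data in mind  |
| Dryad     | Yes             | No, but some publishers cover cost  | No                  | Intended for data, not code. Partners with Zenodo |
| Figshare  | Yes             | Yes                                 | Yes                 | Can’t choose your license                         |
| UA ReDATA | Yes             | Yes (for UA researchers)            | No                  | University of Arizona Libraries                   |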

Zenodo Archiving Demo

  1. Log in to sandbox.zenodo.org using GitHub
  2. In drop-down menu with your username, select “GitHub”
  3. Find your repo in the list and flip the switch next to it
  4. Go to your repo on GitHub and make a release
  5. On sandbox.zenodo.org, get markdown to add a badge to README.md

When to archive?

No hard rules on this, but my preference:

  1. Just before submitting a manuscript: release v0.1.0
  2. After responding to reviewers or re-submitting: increment the “minor” version, e.g. v0.2.0
  3. After acceptance: release v1.0.0

Reproducible computational environments

Congrats! Your code is reproducible! But what about…

  • in 3 years when an R package is updated with breaking changes?

  • on a different operating system with different versions of system libraries?

Capture the computational environment for ultimate reproducibility

Reproducible Environments With renv

  • The renv package records R packages and their versions used in your project

  • Projects are isolated with their own set of packages

  • Can restore exact versions of packages recorded

Using renv
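A typical renv workflow, as a minimal sketch. It assumes you run these lines in the R console from inside your project folder; how often you snapshot is up to you.

```r
# Install renv once per machine
install.packages("renv")

# Set up renv for the current project (gives it its own package library)
renv::init()

# After installing or updating packages, record their versions in the lockfile
renv::snapshot()

# Check whether the project library and the lockfile are in sync
renv::status()

# On another machine (or years later), reinstall the recorded versions
renv::restore()
```

Commit the lockfile (renv.lock) to your repo so collaborators can run renv::restore() themselves.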

Exercise

Install renv and activate it for a project with renv::init(). Inspect the files that were created.

If you change your mind …

To deactivate renv, run renv::deactivate(). To also remove all the files it created, run renv::deactivate(clean = TRUE) instead.

Limitations of renv

  • Only tracks R packages

  • Can’t reproduce operating system or system libraries

  • Sometimes quite annoying to use (but it’s getting better!)

More Reproducibility with Docker

Docker containers…

  • Are isolated environments that behave like lightweight “virtual machines”

  • Run Linux regardless of the host machine OS

  • Can be built with specific versions of OS, system libraries, and R packages (using renv)

  • Can be downloaded and run from the command line
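A minimal sketch of a Dockerfile that combines Docker with renv. The rocker/r-ver base image, the R version tag, and the script name analysis.R are assumptions for illustration; it also assumes the project already has a renv.lock file.

```dockerfile
# Pin the OS and R version with a versioned base image (assumed: rocker/r-ver)
FROM rocker/r-ver:4.4.1

# Install renv itself
RUN R -e "install.packages('renv', repos = 'https://cloud.r-project.org')"

WORKDIR /project

# Copy only the lockfile first so the package layer can be cached
COPY renv.lock renv.lock

# Keep the restored packages inside the project folder
ENV RENV_PATHS_LIBRARY=renv/library

# Install the exact package versions recorded by renv
RUN R -e "renv::restore()"

# Copy the rest of the project and run the (hypothetical) analysis script
COPY . .
CMD ["Rscript", "analysis.R"]
```

With this file in the project folder, something like `docker build -t my-analysis .` followed by `docker run --rm my-analysis` would build and run the container (my-analysis is a made-up image name). A .dockerignore file that excludes your local renv/library folder keeps the host's packages out of the image.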

Hold up, what is reproducibility again?

There is a reproducibility trade-off when using renv and Docker: computational reproducibility becomes more robust, but the code becomes harder for novices to run

If you use these tools, provide:

  • Instructions on how to run code
  • Where to go for help troubleshooting
  • Ways to access your code without extra layers

Resources

Next week

  • Tuesday 10/8: Drop-in co-working session.

    • Come and work on your reproducibility colloquium project/presentation

  • Thursday 10/10: Reproducibility Colloquium!

    • Invite your lab-mates, PI, friends!

References

Maitner, Brian, Paul Santos-Andrade, Luna Lei, George Barbosa, Brad Boyle, Matiss Castorena, Brian Enquist, et al. 2023. “Code Sharing Increases Citations, but Remains Uncommon.” https://doi.org/10.21203/rs.3.rs-3222221/v1.