Reproducible manuscripts with Git

R
quarto
R-SIG
Git
R-SIG 09.09.2024
Published

September 9, 2024


Motivation

We have already talked about reproducible manuscripts with Quarto. One big advantage of writing in Markdown is that plain-text files work very well with version control systems like Git, so we can leverage the many advantages of version control.

Git

Git is a version control system. Some advantages:

  • History: You can see the history of your project, who did what and when. Changes in the project can be easily tracked.
  • Collaboration: You can easily work together with others on the same project.
  • Backup: If you push your repository to a host like GitHub, a copy of your project lives online, so you always have a backup.

Git is mainly used for code. However, Markdown files are plain text files too, so they can be version controlled with Git just as easily. In this post I assume you are already somewhat familiar with GitHub. If not, take a look at this Getting Started Guide.

Tip

To make your project truly reproducible, you might also want to use renv.
If you want to dive even deeper into reproducible workflows, take a look at Peikert and Brandmaier (2021).

Quarto + Git

If you already work with GitHub, there isn’t much new here: you set up your repository and track your R and Quarto files with Git. For reproducibility, this is about as transparent as it gets. If we use GitHub throughout the whole project and make the repository public, anyone can trace what we did, which decisions we made, and why.
We can use Issues to discuss specific points with coauthors, and Pull Requests and Reviews to discuss changes to the manuscript or the analysis.

GitHub Actions

It is generally considered bad practice to commit rendered documents like PDF or HTML files to a Git repository. Instead, build them with GitHub Actions. That way, it is always clear which version is current and how your code relates to the rendered output. GitHub Actions automate your workflow: you can set up a workflow that runs every time you push to your repository, for example to check your code, run tests, or build your manuscript. The setup is a bit more involved; the complete documentation can be found here.
In this section, I’ll present one possible workflow.

Warning

Even if your repository is private, publishing a document as shown in this workflow makes the document public, so in principle anyone can see it.

1. _freeze

First, you have to open your _quarto.yml file and add the following:

execute:
  freeze: auto

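This creates a _freeze folder that stores the results of your executed code. Commit this folder: it lets the GitHub Action reuse the stored results instead of running your R code again. If you use renv, this isn’t necessary, because the renv workflow executes the code again.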
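The GitHub Action in step 5 can only reuse frozen results if the _freeze folder is actually in the repository. As a rough sanity check, a shell function like the one below reports whether the folder exists and whether Git tracks any files in it. The name `check_freeze_tracked` is made up for this example (it is not part of Quarto or Git), and it assumes you run it from the root of your Quarto project.

```shell
#!/bin/sh
# Hypothetical helper (not part of Quarto): checks that the _freeze folder
# exists and that Git is tracking at least one file inside it.
# Run it from the root of your Quarto project.
check_freeze_tracked() {
  if [ ! -d _freeze ]; then
    echo "No _freeze folder found: run 'quarto render' first." >&2
    return 1
  fi
  # git ls-files lists tracked (committed or staged) files under _freeze;
  # it fails if we are not inside a Git repository, and so do we
  tracked=$(git ls-files -- _freeze) || return 1
  if [ -z "$tracked" ]; then
    echo "_freeze is not tracked yet: run 'git add _freeze' and commit." >&2
    return 1
  fi
  echo "_freeze is tracked by Git."
}
```

Staged files already count as tracked here, so the check passes right after `git add _freeze`; you still need to commit and push for the Action to see the folder.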

2. render

Now render your Quarto project once from the Terminal (not the R Console):

quarto render

Commit and push your changes!

Don’t commit your output files, such as HTML. You can keep them out of Git by adding *.html to your .gitignore file.
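If you prefer to do this from the Terminal, here is a minimal sketch. The function name `add_to_gitignore` is made up for this example; it appends a pattern only if that exact line is not already in .gitignore, so running it twice does no harm. The `*.pdf` line assumes you also render to PDF; note that it would ignore every other PDF in the project too (for example figures), so drop it if that is not what you want.

```shell
#!/bin/sh
# Hypothetical helper: append a pattern to .gitignore in the current directory,
# but only if that exact line is not already present.
add_to_gitignore() {
  pattern="$1"
  # If the file's last line lacks a trailing newline, add one first so the
  # new pattern does not get glued onto the previous line.
  if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
    printf '\n' >> .gitignore
  fi
  # -q: no output, -x: whole-line match, -F: literal string (no regex)
  if ! grep -qxF -- "$pattern" .gitignore 2>/dev/null; then
    printf '%s\n' "$pattern" >> .gitignore
  fi
}

add_to_gitignore '*.html'   # rendered HTML output
add_to_gitignore '*.pdf'    # only if you also render to PDF
```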

3. gh-pages branch

After that, set up a gh-pages branch, again in the Terminal. Make sure you have committed all changes before creating the branch:

git checkout --orphan gh-pages
git reset --hard # make sure all changes are committed before running this!
git commit --allow-empty -m "Initialising gh-pages branch"
git push origin gh-pages
git checkout main # switch back to your main branch before continuing

Your published content will be built from this branch. You don’t have to touch it after setting it up; the Action we’ll build takes care of that.
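Because `git reset --hard` in step 3 discards uncommitted changes to tracked files, it can be worth checking for a clean working tree before creating the branch. The function `require_clean_tree` below is a hypothetical helper, not a Git command; it relies only on `git status --porcelain`, which prints nothing when there is nothing to commit. It is deliberately strict and also complains about untracked files, even though `git reset --hard` leaves those alone.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the current repository has no staged,
# unstaged, or untracked changes; otherwise list them and fail.
require_clean_tree() {
  # Fail if git itself fails, e.g. when run outside a repository
  changes=$(git status --porcelain) || return 1
  if [ -n "$changes" ]; then
    echo "Uncommitted changes found; commit or stash them first:" >&2
    printf '%s\n' "$changes" >&2
    return 1
  fi
}

# Example usage, guarding the first command of step 3:
# require_clean_tree && git checkout --orphan gh-pages
```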

4. publish

Finally, publish your Quarto document (replace documentname.qmd with the name of your file):

quarto publish gh-pages documentname.qmd

5. Action

To trigger publishing every time you push to your main branch on GitHub, create a new directory .github/workflows in your project. In this directory, create a file publish.yml with the following content:

on:
  workflow_dispatch:
  push:
    branches: main

name: Quarto Publish

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          packages: |
            any::rmarkdown
            any::knitr
      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Render and Publish
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Important

You need to select Read and write permissions under Workflow permissions in your repository’s Settings > Actions > General.

Note

I actually prefer the renv-based workflow, but it requires that you set up renv for your project first.

Caveats

A word of warning: everything is online, so be careful not to upload sensitive data. Also, having the whole process visible to everyone might feel strange at first. Still, even if you keep the repository private, using Git is well worth it!

Exercises

  1. Set up a GitHub repository for the Quarto project you worked on in the last sessions, if it isn’t online already.
  2. Make up a small issue and open it in the Issues section on GitHub.
  3. Fix this issue on a new branch. Commit the changes with closes #Issuenumber in the commit message, push everything, and open a pull request.
  4. Assign someone from the group as reviewer.
  5. Review a pull request assigned to you.
  6. Set up an Actions workflow that automatically renders your document.
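For exercise 3, the Git side might look like the following sketch. The function `commit_issue_fix` and the branch naming scheme `fix-issue-<number>` are made up for illustration. It assumes you have already edited tracked files (`git commit -a` does not pick up new files) and that your GitHub remote is called `origin`. GitHub closes the referenced issue once the commit is merged into the default branch.

```shell
#!/bin/sh
# Hypothetical helper: move your current (uncommitted) edits onto a new branch
# named after the issue, commit them with a closing keyword, and push.
# Usage: commit_issue_fix <issue-number> "<short summary>"
commit_issue_fix() {
  issue="$1"
  summary="$2"
  branch="fix-issue-$issue"
  # Creating a branch carries uncommitted changes over to it
  git checkout -b "$branch" || return 1
  # -a stages modified tracked files; new files need an explicit 'git add'
  git commit -a -m "$summary, closes #$issue" || return 1
  # -u sets the upstream so later 'git push' calls need no arguments
  git push -u origin "$branch"
}

# Example: commit_issue_fix 3 "Fix typo in the methods section"
# Afterwards, open the pull request on GitHub and assign a reviewer.
```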

References

Peikert, Aaron, and Andreas M. Brandmaier. 2021. “A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker.” Quantitative and Computational Methods in Behavioral Sciences, 1–27.

Footnotes

  1. Image by Towfiqu barbhuiya on Unsplash.↩︎