Towards better reproducible practice

ANU BDSI
workshop
Reproducible research with Quarto

Emi Tanaka

Biological Data Science Institute

12th April 2024

Current learning objectives

  • -Generate HTML, PDF, or Word documents using Markdown syntax
  • -Understand the anatomy of Quarto documents
  • -Develop dynamic documents containing code (in R, Python, or Julia) using Quarto
  • -Implement various code chunk options to customize chunk behaviour
  • Recognize the significance of reproducibility and grasp the concept of literate programming
  • Apply reproducible practices
  • Establish an organized folder structure for data projects

Non-Robust Workflow

What should have been submitted:

A Robust, Reproducible Workflow

  • Using a robust, reproducible workflow means:
    • you avoid manual, repetitive tasks
    • your results are computationally reproducible
  • Using a robust, reproducible workflow doesn’t mean you won’t make mistakes, but it will help you minimise mistakes.
  • Literate programming is a programming paradigm introduced by Donald Knuth that emphasises writing code for humans, i.e. intertwining code with natural-language explanations.
  • Literate programming includes documentation (detailed explanations, comments and annotations to provide context, rationale and insight into the program’s design and functionality).
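As a small illustration of what "computationally reproducible" can mean in practice, here is a minimal sketch in Python. It assumes a toy simulation; the function name `simulate_mean` and the seed value are hypothetical and chosen only for this example. The point is that fixing the random seed inside the code, rather than relying on hidden session state, lets anyone who reruns the code get the same number.

```python
import random
import statistics


def simulate_mean(seed: int, n: int = 1000) -> float:
    """Return the mean of n pseudo-random normal draws for a given seed."""
    # A local generator avoids depending on hidden global random state.
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(draws)


# Hypothetical seed: any fixed value works, as long as it is recorded in the code.
first = simulate_mean(seed=2024)
second = simulate_mean(seed=2024)

# Same code and same seed give the same result on every rerun.
print(first == second)
```

In a Quarto document the same idea applies to a code chunk: the seed lives in the chunk, next to the prose that explains the result.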

Organising and Sharing Your Data Projects

Statistical Value Chain

… a statistical value chain is constructed by defining a number of meaningful intermediate data products, for which a chosen set of quality attributes are well described …

-- van der Loo & de Jonge (2018)
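One way to picture "meaningful intermediate data products" is as a chain of small functions, each producing one named product from the previous one. The sketch below is an assumption-laden toy, not the authors' method: the records, the field names, and the choice of mean imputation are all hypothetical and used only to show the shape of the chain (raw, then input, then valid, then stats).

```python
from statistics import fmean

# Hypothetical raw records: everything arrives as text, one value is missing.
raw = [
    {"id": "1", "height_cm": "170.5"},
    {"id": "2", "height_cm": ""},
    {"id": "3", "height_cm": "182.0"},
]


def to_input(raw_rows):
    """Raw -> input: coerce text fields to proper types (None marks missing)."""
    return [
        {
            "id": int(r["id"]),
            "height_cm": float(r["height_cm"]) if r["height_cm"] else None,
        }
        for r in raw_rows
    ]


def to_valid(input_rows):
    """Input -> valid: impute missing heights (mean imputation is an assumption)."""
    observed = [r["height_cm"] for r in input_rows if r["height_cm"] is not None]
    fill = fmean(observed)
    return [
        {
            **r,
            "height_cm": fill if r["height_cm"] is None else r["height_cm"],
            "imputed": r["height_cm"] is None,  # keep a record of what was changed
        }
        for r in input_rows
    ]


def to_stats(valid_rows):
    """Valid -> stats: summarise the validated data."""
    heights = [r["height_cm"] for r in valid_rows]
    return {"n": len(heights), "mean_height_cm": fmean(heights)}


# Each intermediate product is kept, so every step can be inspected on its own.
input_data = to_input(raw)
valid_data = to_valid(input_data)
stats = to_stats(valid_data)
print(stats)
```

The raw records are never modified; each stage returns a new product, which is what makes it possible to describe and check the quality of every step separately.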

Folder structure

A suggested folder structure for data projects:

    project-root-folder/  # Root of the project folder
    │
    ├── README.md         # README file
    │
    ├── data/             # Raw and derived data
    │   ├── data-raw/     # Read-only files
    │   ├── data-input/   # Extracted and coerced from raw data
    │   ├── data-valid/   # Edited and imputed from input data
    │   └── data-stats/   # Analysed results (R objects, .csv, etc.)
    │
    ├── analysis/         # Scripts (not functions) to run analysis
    │
    ├── figures/          # Figures (.png, .pdf, etc.)
    │
    ├── misc/             # Misc
    │
    └── report.qmd        # Report, paper, or thesis output
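The suggested structure can be created by a short script instead of by hand, which avoids one more manual, repetitive task. The following is a minimal sketch using only Python's standard library; the function name `scaffold` is hypothetical, the folder and file names are taken from the tree above, and the demo writes into a temporary directory so that it does not touch any real project.

```python
import tempfile
from pathlib import Path

# Folder and file names follow the suggested structure above.
FOLDERS = [
    "data/data-raw",
    "data/data-input",
    "data/data-valid",
    "data/data-stats",
    "analysis",
    "figures",
    "misc",
]
FILES = ["README.md", "report.qmd"]


def scaffold(root) -> Path:
    """Create the suggested folders and empty placeholder files under `root`."""
    root_path = Path(root)
    for folder in FOLDERS:
        # parents=True also creates `root` and `data/`; exist_ok makes reruns safe.
        (root_path / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file and does not erase existing content.
        (root_path / name).touch(exist_ok=True)
    return root_path


# Demo in a temporary directory; replace with a real path for actual use.
with tempfile.TemporaryDirectory() as tmp:
    project = scaffold(Path(tmp) / "project-root-folder")
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project).as_posix())
```

Running the script twice is harmless because existing folders and files are left in place, so it can be kept in the project and rerun at any time.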

Sharing your documents

via Quarto Pub

  • Make sure you are logged in to your Quarto Pub account.
  • Then run the following command in the Terminal:

    quarto publish quarto-pub /path/to/your/quarto-document.qmd

Self-contained HTML document

  • Set embed-resources to true under the html format in the document's YAML header:

    format:
      html:
        embed-resources: true

  • You can then share the output HTML file with no external dependencies.

Happy writing and sharing!