Data Wrangling with R Part 1

BDSI Workshop
Published: April 8, 2024

👋 Welcome

This is a training workshop for staff and students affiliated with the Australian National University (ANU) offered by the ANU Biological Data Science Institute (BDSI).

The workshop aims to teach data wrangling with R using the Tidyverse paradigm. The Tidyverse is a popular collection of open-source packages that share an underlying design philosophy, grammar and data structures. This workshop is tailored to beginners in the Tidyverse, specifically those with little or no exposure to the dplyr and tidyr packages. A basic understanding of R is assumed. If you have no prior R knowledge, please attend the Introduction to R Programming workshop first and do not enrol in this workshop until you have done so.

🎯 Learning objectives

Upon completion of the workshop, participants should be able to:

  • Recognise the characteristics of tidy data
  • Differentiate between the base R and Tidyverse paradigms
  • Add or modify columns, subset data by rows and columns, rename columns, and perform grouped operations using dplyr
  • Pivot data into longer or wider formats using tidyr
  • Join datasets using dplyr

🔧 Preparation

Please ensure that you download and install:

  • the latest version of R,
  • the latest version of RStudio Desktop,
  • (Optional) Slack (alternatively, you can use the web version), and
  • the following packages, by opening RStudio Desktop, pasting the command below into the Console and pressing Enter.
install.packages(c("tidyverse", "agridat", "medicaldata"))
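To confirm that the installation worked before the workshop, a quick check along these lines can be run in the Console. This is a minimal sketch, not part of the official setup instructions: it uses base R's `requireNamespace()` to test whether each package can be loaded, and it does not install anything itself.

```r
# Packages required for the workshop (taken from the install command above)
pkgs <- c("tidyverse", "agridat", "medicaldata")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise;
# vapply() names the result after each package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report any packages that still need installing
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, rerunning `install.packages()` on just those names should be enough.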

If you are having issues, see also here or talk to the teaching team.

Slack workspace (Optional)

We use Slack to facilitate communication between workshop participants and the teaching team.

  • You must use your ANU email to sign up.
  • Use your full name.
  • By joining, you agree to abide by this code of conduct.
  • Please don’t direct message the teaching team in Slack. Your questions are more likely to be answered in the Slack channels than in direct messages.

Please note that the teaching team may not monitor or respond to the Slack workspace outside of the workshop.

Teaching team

  • Academic statistician passionate about data science and open-source software
  • Currently Deputy Director of the ANU Biological Data Science Institute and Executive Editor of The R Journal
  • PhD in Statistical Bioinformatics
  • BSci (Adv Maths) with a major in Mathematics and Statistics
  • Loves data and coding
  • https://emitanaka.org · @statsgen · fosstodon.org/@emitanaka

  • 2nd-year PhD student at ANU, working on phylogenomic methods to study hybridisation
  • BSci in Bioinformatics
  • Enjoys hiking and science
  • jeremiasivan

  • 3rd-year PhD student @ Linde Lab, E&E Division, RSB, ANU
  • Investigating the evolution of the Australian orchid flora and associated funga
  • MScSt (Biodiversity Science)
  • Loves playing music, reading and nature
  • @rpodonnell
  • rpodonnell.github.io

  • 2nd-year PhD student @ Sequeira Lab, E&E Division, RSB, ANU
  • Identifying social, collective or coordinated movement behaviour patterns in sharks using tracking data
  • MSc (Marine Biology), BSc (Biology)
  • Loves triathlon, hiking and the ocean
  • @nilskreuter
  • https://linktr.ee/nilskreuter

Materials

The materials can be found here.

These materials are shared under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

This website is brought to you by the ANU Biological Data Science Institute.