Reading and writing data

ANU BDSI
workshop
Introduction to R programming

Emi Tanaka

Biological Data Science Institute

3rd April 2024

Current learning objective

  • Conduct elementary arithmetic operations using R
  • Navigate the RStudio interactive development environment (IDE)
  • Install external packages in R to extend functionality
  • Comprehend various object types in R
  • Manipulate lists, matrices, and vectors in R
  • Compute basic summary statistics including mean, median, quartiles, and standard deviation using R
  • Grasp the concept of missing values within the R environment
  • Import and export data in R (current)
  • Create basic functions, employ conditional statements, and utilize for loops in R
  • Decipher error messages and do basic troubleshooting

Data file formats

  • Data are stored in a file, which points to a block of computer storage.
  • A file format specifies how to interpret the information stored there.
  • A file with the extension “csv” (comma-separated values) uses a comma as the delimiter, while “tsv” (tab-separated values) uses a tab.
data.csv
len, supp, dose
4.2, VC, 0.5
11.5, VC, 0.5
...
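To make the role of the delimiter concrete, here is a minimal sketch in base R. The three lines of text are a stand-in for the contents of a file (an assumption for illustration, with no spaces after the delimiters); the same values are parsed once as comma-separated and once as tab-separated.

```r
# The same small table written with two different delimiters.
csv_text <- paste(c("len,supp,dose", "4.2,VC,0.5", "11.5,VC,0.5"), collapse = "\n")
tsv_text <- gsub(",", "\t", csv_text, fixed = TRUE)

# read.csv() splits fields on commas, read.delim() splits on tabs.
from_csv <- read.csv(text = csv_text)
from_tsv <- read.delim(text = tsv_text)

# Both calls recover the same data frame.
identical(from_csv, from_tsv)
```

The file format only changes how the text is split into columns; the resulting data frame is the same.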

Reading and Writing CSV files
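A minimal sketch of a CSV round trip with the readr package, assuming readr is installed. The built-in ToothGrowth dataset is used as a stand-in because its columns (len, supp, dose) match the sample file shown earlier; the temporary file location is an assumption made only so the example is self-contained.

```r
library(readr)

# Write a data frame to a CSV file in a temporary folder.
csv_path <- file.path(tempdir(), "data.csv")
write_csv(ToothGrowth, csv_path)

# Read the file back in; show_col_types = FALSE silences the column type message.
tooth <- read_csv(csv_path, show_col_types = FALSE)
head(tooth, 3)
```

Base R offers read.csv() and write.csv() for the same task if readr is not available.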

File paths

  • Your file has to be in the right location to be read!
  • You may use a relative path (e.g. data/data.csv) or an absolute path (e.g. C:/user/myproject/data.csv) to point to the location of the data.
  • You should avoid using absolute paths! Why?
  • You can get and set the current working directory using getwd() and setwd(), respectively.
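The sketch below illustrates how a relative path is resolved against the working directory. The project folder "myproject" inside a temporary directory is hypothetical and is created only for this example.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Hypothetical project folder containing data/data.csv.
project <- file.path(tempdir(), "myproject")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(ToothGrowth, file.path(project, "data", "data.csv"), row.names = FALSE)

# With the working directory set to the project folder,
# the relative path data/data.csv points to the file.
setwd(project)
found <- file.exists("data/data.csv")
dat <- read.csv("data/data.csv")

# Restore the original working directory.
setwd(old_wd)
```

The relative path keeps working if the whole project folder is moved or copied to another computer, whereas an absolute path would need to be rewritten.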

Folder structure

  • Your folder structure depends on the project, but it is generally a good idea to give each project its own folder.
  • Within the project, it is also good to have separate folders for:
    • data
    • script/analysis
    • report/paper
    • figures/images.
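A layout like the one above could be created from R as sketched here. The folder names and the "demo-project" location are assumptions chosen for illustration, not a required convention.

```r
# Hypothetical project root inside a temporary directory.
project_root <- file.path(tempdir(), "demo-project")
folders <- c("data", "analysis", "report", "figures")

# Create each sub-folder; recursive = TRUE also creates the project root.
for (folder in folders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List what was created.
list.files(project_root)
```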

R project

  • Within RStudio, you can create a project file (with an .Rproj extension).
  • Double-clicking on this project file launches RStudio Desktop with the current working directory set to the location of the project file.
  • You can create this project file by going to RStudio > File > New Project …

Binary formats

  • Data can also be stored in a binary format (e.g. .RData, .rda or .rds).
  • The .RData, .rda and .rds formats save R objects directly, so the data do not need to be in a data.frame.
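As a minimal sketch of these formats in base R: saveRDS() and readRDS() handle a single object, while save() and load() handle several named objects at once. The objects and temporary file paths below are made up for illustration.

```r
# A single object that is not a data.frame.
settings <- list(species = "guinea pig", doses = c(0.5, 1, 2), blinded = TRUE)

# saveRDS() stores one object; readRDS() returns it, so you choose the name.
rds_path <- tempfile(fileext = ".rds")
saveRDS(settings, rds_path)
settings_copy <- readRDS(rds_path)

# save() stores several objects together with their names.
x <- 1:3
y <- c("a", "b")
rda_path <- tempfile(fileext = ".rda")
save(x, y, file = rda_path)

# load() restores the objects under their original names
# and returns those names as a character vector.
rm(x, y)
loaded <- load(rda_path)
```

Because load() restores objects under their saved names, it can silently overwrite existing objects with the same names; readRDS() avoids this by returning the object for you to assign.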

Reading Excel sheets

  • Data can also come in a proprietary format (e.g. xls and xlsx); these require special tools to open, view, or read them.
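A minimal sketch of reading an Excel sheet with the readxl package, assuming readxl is installed. It uses the example workbook datasets.xlsx that ships with readxl rather than a file from this workshop.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package.
xlsx_path <- readxl_example("datasets.xlsx")

# A workbook can hold several sheets; list their names first.
sheets <- excel_sheets(xlsx_path)

# Read one sheet by name into a tibble.
cars <- read_xlsx(xlsx_path, sheet = "mtcars")
head(cars, 3)
```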

Importing through the GUI

In RStudio Desktop, you can click on a file to import it through the GUI.


Formatting data

  • Unless you are responsible for entering the data, you should never modify the original, stored data (note: exceptions do apply).
  • For scientific integrity, any modification to the original data should be recorded in a reproducible manner (e.g. by programming in R!) so that you can trace the exact modifications.
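One way to record modifications reproducibly is sketched below: the raw file is only ever read, every change is made in code, and the result is written to a separate file. The file names are hypothetical, and ToothGrowth stands in for the original data.

```r
# Stand-in for the original, stored data (never edited by hand).
raw_path <- file.path(tempdir(), "tooth_raw.csv")
write.csv(ToothGrowth, raw_path, row.names = FALSE)

# Read the raw data and make every modification in code.
raw <- read.csv(raw_path)
clean <- raw
# Replace the supplement codes with descriptive labels.
clean$supp <- ifelse(clean$supp == "VC", "ascorbic acid", "orange juice")

# Save the modified data to a new file, leaving the raw file untouched.
clean_path <- file.path(tempdir(), "tooth_clean.csv")
write.csv(clean, clean_path, row.names = FALSE)
```

Rerunning the script regenerates the cleaned file from the raw file, so the exact modifications can always be traced.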

Summary

  • You can use readr::read_csv() to read CSV files.
  • You can use readxl::read_xlsx() to read Excel files.
  • Save a single R object using saveRDS() (recommended) and multiple objects using save().
  • In RStudio Desktop, you can click on a file to import it through the GUI.
  • Set up R Projects and use relative paths to data files.

Data Import Cheatsheet

Exercise time

10:00