Drawing plots with ggplot2

ANU BDSI workshop
Data Visualisation with R Part 1

Emi Tanaka

Biological Data Science Institute

10th April 2024

Welcome 👋

Teaching team

  • Academic statistician passionate about data science and open source software
  • Currently, Deputy Director of ANU Biological Data Science Institute and Executive Editor of R Journal
  • PhD in Statistical Bioinformatics
  • BSci (Adv Maths) with major in Mathematics and Statistics
  • Loves data and coding

        https://emitanaka.org     @statsgen     fosstodon.org/@emitanaka

  • 2nd year PhD student at ANU, working on phylogenomics methods to study hybridisation
  • BSci in Bioinformatics
  • Enjoys hiking and science
  • jeremiasivan

  • (Almost finished) PhD student @ Moritz Lab, E&E Division, RSB, ANU
  • Working on historical biogeography of Indo-Australian birds
  • BSc (Hons) with major in Zoology and Ecology & Conservation
  • Loves games, art markets, and FOOD
  • @goaudreymp
  • https://linktr.ee/goaudreymp

  • 3rd year PhD student @ Linde Lab, E&E Division, RSB, ANU
  • Investigating the evolution of the Australian orchid flora and associated funga
  • MScSt (Biodiversity Science)
  • Loves playing music, reading, nature
  • @rpodonnell
  • rpodonnell.github.io

  • 2nd year PhD student @ Sequeira Lab, E&E Division, RSB, ANU
  • Identifying social, collective, or coordinated movement behaviour patterns in sharks using tracking data
  • MSc (Marine Biology), BSc (Biology)
  • Loves triathlon, hiking and the ocean
  • @nilskreuter
  • https://linktr.ee/nilskreuter

Workshop materials

All materials will be hosted at
https://anu-bdsi.github.io/workshop-data-vis-R1/

🕙 Schedule

Time Content
10:00–10:30 Drawing plots with ggplot2
10:30–11:00 Exercise 1
11:00–11:30 The Grammar of Graphics
11:30–11:40 Break
11:40–12:10 Exercise 2
12:10–12:30 Drawing multiple layers with ggplot2
12:30–12:50 Exercise 3
12:50–13:00 Wrap up

Today’s learning objectives

  • Create basic plots using ggplot2
  • Understand the concept of the grammar of graphics
  • Construct plots with multiple layers in ggplot2
  • Adjust scales and guides within ggplot2

Current learning objective

  • Create basic plots using ggplot2
  • Understand the concept of the grammar of graphics (covered later)
  • Construct plots with multiple layers in ggplot2 (covered later)
  • Adjust scales and guides within ggplot2 (covered later)

Summary of R graphics

ggplot2 R package

  • The ggplot2 R package is part of the tidyverse suite of R packages
  • ggplot2 is widely used by the scientific community and even by news outlets (e.g. Financial Times and BBC)

Basic structure of ggplot

  • data as data.frame
  • a set of aesthetic mappings between variables in the data and visual properties
  • at least one layer which describes what to render
  • the coordinate system (explained later)
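The four components above can be seen together in one short sketch. It uses the Palmer penguins data that this workshop uses for illustration and assumes the palmerpenguins package is installed; the choice of variables is only an example.

```r
library(ggplot2)
# Assumption: the palmerpenguins package is installed; it provides `penguins`.
library(palmerpenguins)

p <- ggplot(data = penguins,                       # data as a data.frame
            mapping = aes(x = flipper_length_mm,   # aesthetic mappings:
                          y = body_mass_g)) +      # variables -> x and y
  geom_point() +                                   # a layer: what to render
  coord_cartesian()                                # coordinate system (the default)

p  # printing the object draws the plot
```

Leaving out coord_cartesian() should give the same plot, since it is the default coordinate system.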

Visualising distributions

Plot types shown: histogram, density plot, cumulative density plot, Q-Q plot, boxplots, violin plots, strip plots, stacked histograms, overlapping density plots

  • geom_histogram()
  • geom_density()
  • stat_ecdf()
  • stat_qq()
  • geom_boxplot()
  • geom_violin()
  • geom_jitter()

Illustrative data: Palmer penguins

The penguins data is from the palmerpenguins 📦

A histogram with geom_histogram()
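A minimal sketch of such a histogram, assuming the palmerpenguins package is installed. The variable (body_mass_g) and the number of bins are illustrative choices, not necessarily those used in the original slide.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

hist_plot <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(bins = 30)  # 30 bins is an arbitrary choice for illustration

hist_plot  # rows with a missing body mass are dropped with a warning
```

Only x is mapped here; the counts on the y-axis are computed by the layer.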

Other layers for univariate data
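The other layers listed for distributions can be swapped in with little change. The sketch below assumes the palmerpenguins package is installed; the variable and the object names are illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

base <- ggplot(penguins, aes(x = body_mass_g))

density_plot <- base + geom_density()   # smoothed density estimate
ecdf_plot    <- base + stat_ecdf()      # empirical cumulative distribution
box_plot     <- base + geom_boxplot()   # horizontal boxplot

# A Q-Q plot uses the `sample` aesthetic instead of x.
qq_plot <- ggplot(penguins, aes(sample = body_mass_g)) + stat_qq()

# Violin and jitter layers need both x and y; an empty string is used
# here as a single placeholder group on the y-axis.
base_xy     <- ggplot(penguins, aes(x = body_mass_g, y = ""))
violin_plot <- base_xy + geom_violin()
strip_plot  <- base_xy + geom_jitter()
```

Printing any of these objects (for example density_plot) draws the corresponding plot.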

Available geom layers in ggplot2

Available stat layers in ggplot2
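One way to see which geom and stat layers are available is to list the exported functions by prefix. This is a sketch of one possible approach, not necessarily how the original slides produced their lists; the exact output depends on the installed ggplot2 version.

```r
library(ggplot2)

# Exported ggplot2 functions whose names start with geom_ or stat_
geoms <- ls("package:ggplot2", pattern = "^geom_")
stats <- ls("package:ggplot2", pattern = "^stat_")

head(geoms)
head(stats)
```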

Visualising bivariate relationships

Plot types shown: scatter plot, smoothed line plot, heatmap of 2D bin counts, hexagonal heatmap of 2D bin counts, slope graph

  • geom_point()
  • geom_smooth()
  • geom_bin2d()
  • geom_hex()
  • geom_line()
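A short sketch of some of these layers on two continuous variables, assuming the palmerpenguins package is installed. The variables are illustrative. geom_hex() is left out of the runnable code because it additionally needs the hexbin package.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

base2 <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g))

scatter_plot <- base2 + geom_point()                  # scatter plot
smooth_plot  <- base2 + geom_point() + geom_smooth()  # points plus a smoothed line
bin2d_plot   <- base2 + geom_bin2d()                  # heatmap of 2D bin counts
# hex_plot   <- base2 + geom_hex()                    # requires the hexbin package

smooth_plot
```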

Aesthetic specifications

vignette("ggplot2-specs")

  • Aesthetic arguments for each layer are found in documentation
    (e.g. ?geom_point).
  • Some common aesthetic specifications are:

    • x and y
    • alpha
    • color
    • fill
    • size

Example: a scatterplot with geom_point()

  • Notice that legends are automatically made for aesthetics
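A minimal sketch of a scatterplot where a third variable is mapped to color, assuming the palmerpenguins package is installed. The variables are illustrative and may differ from the original slide.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

scatter_by_species <- ggplot(penguins,
                             aes(x = flipper_length_mm,
                                 y = body_mass_g,
                                 color = species)) +  # mapped aesthetic -> legend
  geom_point()

scatter_by_species
```

No legend code is written here; the legend for species appears because color is mapped inside aes().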

Aesthetic specification for points

shape

circle, circle open, circle filled, circle cross, circle plus, circle small, bullet, square, square open, square filled, square cross, square plus, square triangle, diamond, diamond open, diamond filled, diamond plus, triangle, triangle open, triangle filled, triangle square, triangle down open, triangle down filled, plus, cross, asterisk

stroke vs size

Figure: points drawn with size and stroke each varying from 0 to 6

  • The default shape is “circle”.
  • stroke and fill are only for the “filled” shapes.
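As a sketch of how these point specifications combine, the example below uses one of the “filled” shapes so that fill and stroke take effect. It assumes the palmerpenguins package is installed; the particular values are arbitrary.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

filled_points <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(shape  = "circle filled",  # a shape with an interior and a border
             fill   = "white",          # interior color
             color  = "black",          # border color
             size   = 3,                # overall point size
             stroke = 1.5)              # border thickness

filled_points
```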

Aesthetic specifications for lines

color

linetype

0 = blank, 1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash

linewidth

linewidth = 1, 2, 3, 4, 5, 6

lineend

butt (default), round, square

linejoin

round (default), mitre, bevel
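A sketch that sets several of these line specifications on one layer. It uses the economics data that ships with ggplot2, which is an illustrative choice and not a dataset from these slides. The linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# `economics` is a time series dataset bundled with ggplot2 (illustrative choice).
line_plot <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color     = "dodgerblue",
            linetype  = "dashed",   # same as linetype = 2
            linewidth = 1,
            lineend   = "round",    # how each dash ends
            linejoin  = "round")    # how segments join at corners

line_plot
```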

Aesthetic or Attribute?

  • When you supply a value within aes(), ggplot2 assumes that it is a data variable.
  • The string "dodgerblue" gets converted into a variable with one level and it gets colored by ggplot’s default color palette.
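The behaviour described above can be reproduced with a sketch like the following, assuming the palmerpenguins package is installed. The x and y variables are illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# The string inside aes() is treated as a data variable with one level,
# so the points get a color from the default palette and a legend
# labelled "dodgerblue" appears.
wrong_color <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = "dodgerblue"))

wrong_color
```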

When your input is an attribute

Don’t put attributes inside aes()!
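A sketch of the attribute supplied as a layer argument outside aes(), assuming the palmerpenguins package is installed. The variables are illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# color is given outside aes(), so it is a fixed attribute of the layer:
# every point is drawn in dodgerblue and no legend is created.
right_color <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(color = "dodgerblue")

right_color
```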

Bonus tip: “as-is” operator

  • Use the I() operator to mean “as-is” in an aesthetic mapping.
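A sketch of the “as-is” form, assuming the palmerpenguins package is installed and a recent ggplot2 (to my knowledge, I() in aes() is honoured from version 3.5.0; treat the version as an assumption).

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# I() marks the value as "as-is", so it is used directly as the color
# rather than being treated as a data variable.
asis_color <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = I("dodgerblue")))

asis_color
```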

Attributes are for layers

  • Attributes should be defined in specific layers.
  • Notice how the points don’t have the “dodgerblue” color.
  • Layers inherit data and the mapping from ggplot() but not attributes.
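The original example for this slide is not preserved, so the following is only a sketch of the idea, assuming the palmerpenguins package is installed: the attribute is set on one layer and does not carry over to the other.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# Both layers inherit the data and the x/y mapping from ggplot().
# The color attribute belongs to geom_smooth() only, so the points
# keep their default color.
layered <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_smooth(color = "dodgerblue") +
  geom_point()

layered
```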

Summary

  • data as data.frame
  • a set of aesthetic mappings between variables in the data and visual properties
  • at least one layer (usually geom_ or stat_ functions) which describes what to render
  • the coordinate system (explained later)

ggplot2 cheatsheet

Exercise time