The Grammar of Graphics

ANU BDSI workshop
Data Visualisation with R Part 1

Emi Tanaka

Biological Data Science Institute

10th April 2024

Current learning objective

  • Create basic plots using ggplot2
  • Understand the concept of the grammar of graphics
  • Construct plots with multiple layers in ggplot2
  • Adjust scales and guides within ggplot2

Plotting

Plotting more than one plot

Plotting layer

Plotting small multiples

Plotting from a list of programs

  • One function produces one complete plot type
  • The number of plots that can be drawn is limited by the number of plot functions

Catalogue of plot types (not exhaustive)

The grammar of graphics

  • In linguistics, we combine a finite number of words to construct a vast number of sentences under a shared understanding of the grammar.
  • Wilkinson (2005) introduced “the grammar of graphics” as a paradigm to describe plots by combining a finite number of components.
  • Wickham (2010) implemented an interpretation of the grammar of graphics as the ggplot2 R package (as part of his PhD project).
  • The grammar of graphics paradigm has also been implemented in other programming languages such as Python (e.g., plotnine) and Julia (e.g., Gadfly.jl, VegaLite.jl).

Basic structure of ggplot

  • data as a data.frame
  • a set of aesthetic mappings between variables in the data and visual properties
  • at least one layer which describes what to render
  • the coordinate system
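
A minimal sketch of these four parts in one plot. The mpg data set (shipped with ggplot2) and the chosen variables are assumptions for illustration, not the data used in the workshop.

```r
library(ggplot2)

# Assumption: mpg is an example data set that ships with ggplot2
p_basic <- ggplot(data = mpg,                          # data as a data.frame
                  mapping = aes(x = displ, y = hwy)) + # aesthetic mappings
  geom_point() +                                       # one layer: what to render
  coord_cartesian()                                    # coordinate system (the default)

# Printing p_basic draws the plot
```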

A layer in ggplot

  • A layer has five main components:
    • geom - the geometric object used to display the data
    • stat - statistical transformation to use on the data
    • data to be displayed in this layer (usually inherited)
    • mapping - aesthetic mappings (usually inherited)
    • position - position adjustment
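
The five components can be spelled out with layer(), sketched here for a histogram. The mpg data set and bins = 30 are assumptions for illustration. geom_histogram() is the usual shortcut that bundles the same geom, stat and position.

```r
library(ggplot2)

# Assumption: mpg (shipped with ggplot2) stands in for the workshop data
p_layer <- ggplot(mpg, aes(x = hwy)) +
  layer(
    geom = "bar",            # geometric object
    stat = "bin",            # statistical transformation
    data = NULL,             # NULL: inherit the data from ggplot()
    mapping = NULL,          # NULL: inherit the mapping from ggplot()
    position = "stack",      # position adjustment
    params = list(bins = 30) # parameter passed on to the stat
  )

# The shortcut with the same defaults
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 30)
```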

Deconstructing histogram

Deconstructing barplot

Layer data

Accessing layer data

  • y = after_stat(density) is equivalent to the old syntaxes y = stat(density) and y = ..density..
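
A short sketch of inspecting the computed layer data and mapping a computed variable with after_stat(). The mpg data set and bins = 30 are assumptions for illustration.

```r
library(ggplot2)

p_count <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 30)

# layer_data() returns the data frame computed for a layer,
# including stat variables such as count and density
d_count <- layer_data(p_count, 1)
names(d_count)

# Map the computed density (instead of the default count) to y
p_density <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)
d_density <- layer_data(p_density, 1)
```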

Visualising amounts and proportions

Plot types shown: barplot, scatter plot, grouped barplot, stacked barplot, heatmap, pie chart, stacked percentage barplot, stacked density plot

  • geom_bar()
  • geom_col()
  • geom_point()
  • geom_tile()
  • geom_density()

Position adjustments

A barplot with geom_bar()

  • If you have a categorical variable, then you usually want to study the frequency of its categories.
  • Here stat = "count" computes the frequency of each category for you.
  • You can alternatively use stat_count() and change the geom.
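
A sketch of both routes, assuming the mpg data set from ggplot2 with class as the categorical variable. The point geom in the second plot is only one possible choice.

```r
library(ggplot2)

# geom_bar() uses stat = "count" by default
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Same statistic, different geom: the counts drawn as points
p_point <- ggplot(mpg, aes(x = class)) +
  stat_count(geom = "point")
```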

Summary data

  • Sometimes your input data may already contain pre-computed counts.

A barplot with geom_col()

  • In this case, you don’t need stat = "count" to do the counting for you; use geom_col() instead.
  • This is essentially shorthand for geom_bar(stat = "identity"), where stat = "identity" means the values are taken as supplied, without any statistical transformation.
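
A sketch with pre-computed counts. The summary table class_counts is a hypothetical example built from mpg, not data from the workshop.

```r
library(ggplot2)

# Hypothetical summary table: one row per class with its count n
class_counts <- as.data.frame(table(class = mpg$class), responseName = "n")

p_col <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col()

# Equivalent long form
p_identity <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_bar(stat = "identity")
```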

A stacked barplot with "stack"

A grouped barplot with "dodge"

  • "dodge" = position_dodge()

Another grouped barplot with "dodge2"

  • "dodge2" uses a different algorithm to recalculate the x-values, with an option to add padding between geometric objects
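
The three position adjustments side by side, as a sketch. The mpg data set, the class and drv variables, and padding = 0.2 are assumptions for illustration.

```r
library(ggplot2)

p_base <- ggplot(mpg, aes(x = class, fill = drv))

p_stack  <- p_base + geom_bar(position = "stack")  # the default for geom_bar()
p_dodge  <- p_base + geom_bar(position = "dodge")  # same as position_dodge()
p_dodge2 <- p_base + geom_bar(position = position_dodge2(padding = 0.2))
```

In the stacked version the bars start where the previous group ends; in the dodged versions every bar starts at zero and is shifted along x instead.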

Stacked percentage barplot with "fill"

  • If you want to compare percentages across the different x values, then position = "fill" can be handy.
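
A sketch of a stacked percentage barplot, again assuming the mpg data set with class and drv. The y-axis label is an illustrative choice.

```r
library(ggplot2)

p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +  # each stack is rescaled to a total height of 1
  labs(y = "proportion")
```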

Coordinate systems

Pie or donut charts with coord_polar()

  • The default coordinate system is the Cartesian coordinate system.
  • But you can change this to a polar coordinate system with coord_polar().
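
One common way to draw a pie chart is a single stacked bar in polar coordinates. This is a sketch assuming the mpg data set; the empty string mapped to x is just a placeholder so that all classes stack into one bar.

```r
library(ggplot2)

p_pie <- ggplot(mpg, aes(x = "", fill = class)) +
  geom_bar(width = 1) +       # one stacked bar of counts
  coord_polar(theta = "y")    # map the stacked counts to the angle
```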

Other coordinate systems

  • coord_cartesian() for Cartesian coordinate systems (default)
  • coord_flip() to flip the x and y
  • coord_fixed() to use a fixed aspect ratio
  • coord_equal() is essentially coord_fixed(ratio = 1)
  • coord_trans() to transform the coordinates after the statistical transformation
  • coord_map() to use a map projection based on the mapproj package
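
A sketch of two of these coordinate systems. The mpg data set and the chosen variables are assumptions for illustration.

```r
library(ggplot2)

# Horizontal barplot: flip x and y
p_flip <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_flip()

# One unit on x has the same physical length as one unit on y
p_fixed <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  coord_fixed(ratio = 1)
```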

Summary

  • A layer has five main components:
    • geom - the geometric object used to display the data
    • stat - statistical transformation to use on the data
    • data to be displayed in this layer (usually inherited)
    • mapping - aesthetic mappings (usually inherited)
    • position - position adjustment
  • Some position adjustments include: fill, stack, dodge, dodge2, and identity.
  • The coordinate system is Cartesian by default (you will rarely need to change this).

Exercise time

30:00