Design and Analysis of Experiments

BDSI Workshop
Published: September 17, 2024

👋 Welcome

This training workshop is designed for BIOL8001 students and is also open to staff and students affiliated with the Australian National University (ANU). It is offered by the ANU Biological Data Science Institute (BDSI).

This workshop provides a comprehensive introduction to the principles of experimental design and the analysis of experimental data. Participants are expected to have a basic understanding of R and linear models prior to the start of the workshop.

🎯 Learning objectives

Upon completion of this workshop, participants should be able to:

  • Comprehend the differences between experimental and observational data
  • Demonstrate proficiency in designing experiments, including defining research questions, selecting appropriate treatments or factors, and identifying potential sources of variation
  • Understand the principles of experimental design, including randomization, control, replication, and blocking
  • Understand the fundamental concepts of causal inference for experimental data
  • Formulate a statistical analysis plan for the given experimental design

🔧 Preparation

Please ensure that you download and install

  • the latest version of R,
  • the latest version of RStudio Desktop or Positron, and
  • the following packages: open RStudio Desktop or Positron, copy and paste the command below into the Console, and press Enter.
install.packages(c("edibble", "broom"))
  • Windows users may also need to install Rtools in order to install R packages.
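
To confirm that the installation worked, a quick check along the following lines may help. This is only a sketch in base R; the object names `pkgs`, `installed` and `missing_pkgs` are illustrative and not part of the workshop material.

```r
# Packages named in the install command above
pkgs <- c("edibble", "broom")

# requireNamespace() returns TRUE when a package can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Names of any packages that still need installing
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}
```

If a package is reported as missing, rerunning the install command for that package is usually the first thing to try.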

📚 Slides

Click on the heading to open the slides in a new tab, or click on details to see the slides on this page.

Statistical anatomy of experiments

Self test with Exercise 1

Principles of experimental designs

Self test with Exercise 2

Analysis of experimental data

Self test with Exercise 3

📑 Resources

  • Ruxton & Colegrave (2006) Experimental design for the life sciences. 2nd Edition. (This book offers a lot of practical advice.)
  • Glass (2007) Experimental design for biologists. 1st Edition. (This book is more philosophical.)
  • Chapters 1, 3, 4 and 7 from Welham et al. (2015) Statistical Methods in Biology: Design and Analysis of Experiments and Regression. See http://www.stats4biol.info/ for the data and code in the book. (This book is more statistical.)

🏋️‍♀️ Self-paced exercises

The following self-paced exercises are designed to check your understanding of the material. After completing each exercise, review your answers and assess your grasp of the concepts.

Exercise 1

Reflect on learning objectives
You should be able to:
  • Comprehend the differences between experimental and observational data
  • Demonstrate proficiency in designing experiments, including defining research questions, selecting appropriate treatments or factors, and identifying potential sources of variation

For the following studies,

  1. identify the aim of the study,
  2. classify each as an experimental study or an observational study,
  3. identify the treatments or exposures being studied,
  4. identify the experimental units or subjects being observed,
  5. identify the response variable(s) being measured,
  6. identify the observational units being studied,
  7. draw a causal diagram.

For items 1-6, you will need to either select one or more options from a list or fill in the blank. Some questions have multiple correct answers. For item 7, a sample diagram is provided, but it is by no means the only correct answer.

For each study, think about whether the study objective can be better framed and whether the treatments/exposures and response variables indeed address the objective.

Study 1

Researchers are studying the effect of a new fertilizer on plant growth. They randomly select 50 plants from the same species and divide them into two groups. One group receives the new fertilizer, while the other group receives a standard fertilizer. Over the course of two months, the researchers measure the height and number of leaves of the plants in both groups to assess the impact of the new fertilizer.

  1. Select the sentence that best describes the aim of the study.
  2. What type of study is this?
  3. What are the treatments or exposures in this study?
  4. What are the experimental units or subjects in this study?
  5. What are the response variables in this study?
  6. What are the observational units in this study?
  7. Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram

flowchart LR
    P0[Plant source] --> P[50 Plants]
    subgraph Treatments
      direction TB
      F[New fertilizer] 
      S[Standard fertilizer] 
    end 
    Treatments --> P
    subgraph Responses
    H[Plant height]
    L[Number of leaves]
    end
    P --> H
    P --> L
    Environment[Environment, e.g. sunlight, water, CO2 etc] --> P
    M[Measurement device] --> H
    Person[Person] --> M
    style Treatments fill:#fff,stroke:#333,stroke-width:4px
    style Responses fill:#fff,stroke:#333,stroke-width:4px

Study 2

A group of ecologists wants to study the relationship between sunlight exposure and tree growth in a forest. They observe and measure the height of 100 trees growing in different areas of the forest, noting how much sunlight each tree receives naturally based on its location. They collect data on tree growth and sunlight exposure to analyze any correlations between the two variables.

  1. Select the sentence that best describes the aim of the study.
  2. What type of study is this?
  3. What are the treatments or exposures in this study?
  4. What are the experimental units or subjects in this study?
  5. What are the response variables in this study?
  6. What are the observational units in this study?
  7. Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram

flowchart LR
    T0[Area] --> T1[100 Trees]
    subgraph Exposures
      direction TB
      S[Sunlight exposure] 
    end 
    Exposures --> T1
    subgraph Responses
    H[Tree height]
    end
    T1 --> H
    Management --> T1
    Environment[Other environmental effects, e.g. water, CO2 etc] --> T1
    M[Measurement device] --> H
    Person[Person] --> M
    style Exposures fill:#fff,stroke:#333,stroke-width:4px
    style Responses fill:#fff,stroke:#333,stroke-width:4px

Study 3

A group of epidemiologists is investigating the potential link between physical activity levels and heart disease. They collect data from 1,000 adults by surveying them about their weekly exercise habits and then tracking their health outcomes over the next 10 years. They record the participants' physical activity levels and whether or not each individual develops heart disease.

  1. Select the sentence that best describes the aim of the study.
  2. What type of study is this?
  3. What are the treatments or exposures in this study?
  4. What are the experimental units or subjects in this study?
  5. What are the response variables in this study?
  6. What are the observational units in this study?
  7. Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram

flowchart LR
    P0[Adult source] --> P1[1000 Adults]
    subgraph Exposures
      direction TB
      P[Physical activity levels] 
    end 
    Exposures --> P1
    subgraph Responses
    H[Heart disease]
    end
    P1 --> H
    L[Lifestyle] --> P1
    Environment --> P1
    Genetics --> P1
    Age --> P1
    Diet --> P1
    style Exposures fill:#fff,stroke:#333,stroke-width:4px
    style Responses fill:#fff,stroke:#333,stroke-width:4px

Study 4

A medical researcher is testing the effectiveness of a new drug for treating high blood pressure. They recruit 200 patients with hypertension and randomly assign half to receive the new drug, while the other half receive a placebo. After three months, the researcher measures the blood pressure of all participants to determine whether the new drug significantly lowers blood pressure compared to the placebo.

  1. Select the sentence that best describes the aim of the study.
  2. What type of study is this?
  3. What are the treatments or exposures in this study?
  4. What are the experimental units or subjects in this study?
  5. What are the response variables in this study?
  6. What are the observational units in this study?
  7. Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram

flowchart LR
    P1[200 Patients]
    subgraph Treatments
      direction TB
      D[Drug] 
      P[Placebo]
    end 
    Treatments --> P1
    subgraph Responses
    B[Blood pressure]
    end
    P1 --> B
    L[Lifestyle] --> P1
    Environment --> P1
    Genetics --> P1
    Age --> P1
    Diet --> P1
    style Treatments fill:#fff,stroke:#333,stroke-width:4px
    style Responses fill:#fff,stroke:#333,stroke-width:4px

Study 5

A biologist is studying the effect of different water temperatures on fish growth. They select 100 fish and randomly assign them to two tanks: one with water kept at 15°C and the other at 25°C. Over the course of a month, the researcher measures the weight gain of each fish in both tanks to assess how water temperature affects growth.

  1. Select the sentence that best describes the aim of the study.
  2. What type of study is this?
  3. What are the treatments or exposures in this study?
  4. What are the experimental units or subjects in this study?
  5. What are the response variables in this study?
  6. What are the observational units in this study?
  7. Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram

flowchart LR
    P1[100 Fish]
    subgraph Treatments
      direction TB
      W15[Water at 15 degrees Celsius]
      W25[Water at 25 degrees Celsius]
    end 
    Treatments --> Tank
    subgraph Responses
    WG[Weight gain]
    end
    P1 --> WG
    Environment --> Tank
    F[Fish source] --> P1
    Tank --> P1
    Diet --> Tank
    style Treatments fill:#fff,stroke:#333,stroke-width:4px
    style Responses fill:#fff,stroke:#333,stroke-width:4px

Exercise 2

Reflect on learning objectives
You should be able to:
  • Understand the principles of experimental design, including randomization, control, replication, and blocking
  • Understand the fundamental concepts of causal inference for experimental data

In this exercise, researchers are studying the effect of a new fertilizer compared to no fertilizer on plant growth. They randomly select 80 plants from the same species and assign the two treatments to 40 plants each. The researchers measure the height and number of leaves of the plants after two months to assess the impact of the new fertilizer.
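
As a point of reference, the allocation described above (two treatments, 40 plants each, assigned completely at random) could be sketched in base R as follows. This is not the workshop's edibble code; the seed, the data frame name `crd` and the column names are assumptions made for illustration.

```r
set.seed(2024)  # arbitrary seed, only so the allocation is reproducible

n_plants <- 80
treatments <- c("new fertilizer", "no fertilizer")

# Build 40 labels per treatment, then shuffle them across the plants
crd <- data.frame(
  plant = seq_len(n_plants),
  fertilizer = sample(rep(treatments, each = n_plants / 2))
)

# Each treatment should appear exactly 40 times
table(crd$fertilizer)
```

Shuffling a balanced vector of labels guarantees equal replication, whereas drawing each plant's treatment independently would not.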

Study A

We can specify the above design using the edibble package in R, as below. What do you notice about the treatment allocation? Could this be problematic? Why?

Study B

You’ve been told that the plants are grouped into 4 blocks where each block contains 20 pots. One plant is grown in each pot. The researchers want to make sure that each block has an equal number of plants from each treatment group. They also want to randomize the treatment allocation within each block. How would you design this experiment? Using tidyverse or otherwise, count how many plants are assigned to each treatment group within each block after you have your design layout.

Sample solution
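
One possible approach is sketched below in base R rather than with edibble, so it should be read as an illustration and not as the workshop's own solution. The seed and the names `rcbd`, `pot`, `block` and `fertilizer` are assumptions.

```r
set.seed(2024)  # arbitrary seed for reproducibility

n_blocks <- 4
pots_per_block <- 20
treatments <- c("new fertilizer", "no fertilizer")

# One row per pot; pots are numbered within each block
rcbd <- expand.grid(pot = seq_len(pots_per_block), block = seq_len(n_blocks))

# Shuffle a balanced set of labels separately within each block
rcbd$fertilizer <- unlist(lapply(seq_len(n_blocks), function(b) {
  sample(rep(treatments, each = pots_per_block / 2))
}))

# Count plants per treatment within each block
table(block = rcbd$block, fertilizer = rcbd$fertilizer)
```

Because the randomization is restricted to happen within blocks, every block ends up with 10 plants per treatment.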

Study C

You’ve now been told that the researchers want an additional temperature (standard or high) treatment factor. But they also tell you that they can’t change the temperature for each pot. Instead, they can only change the temperature for each block. How would you design this experiment? Using tidyverse or otherwise, get the treatment replications.

Sample solution
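
A minimal base R sketch of one possible layout follows; it is an illustration under stated assumptions, not the workshop's edibble solution. It assumes two blocks per temperature level, and the seed and object names (`split_plot`, `block_temp`) are hypothetical.

```r
set.seed(2024)  # arbitrary seed for reproducibility

n_blocks <- 4
pots_per_block <- 20
fertilizers <- c("new fertilizer", "no fertilizer")
temperatures <- c("standard", "high")

# Temperature is randomized to whole blocks: two blocks per level
block_temp <- sample(rep(temperatures, each = n_blocks / 2))

split_plot <- expand.grid(pot = seq_len(pots_per_block), block = seq_len(n_blocks))
split_plot$temperature <- block_temp[split_plot$block]

# Fertilizer is randomized to pots within each block
split_plot$fertilizer <- unlist(lapply(seq_len(n_blocks), function(b) {
  sample(rep(fertilizers, each = pots_per_block / 2))
}))

# Replication at the pot level: pots per treatment combination
table(split_plot$temperature, split_plot$fertilizer)

# Replication at the block level: blocks per temperature
table(block_temp)
```

Each treatment combination is applied to 20 pots, but temperature is only replicated twice at the block level, which matters for how the temperature effect can be tested.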

Exercise 3

Reflect on learning objectives
You should be able to:
  • Formulate a statistical analysis plan for the given experimental design

In this exercise, consider Study C from Exercise 2. This time, consider what your statistical analysis plan might be. Simulate some response data and perform the analysis from your plan on the simulated data.

Sample solution

In this sample solution, we are going to use the edibble R package to design the experiment and simulate some response data.

  • We’ll analyse each response variable separately using ANOVA.
  • We can actually peek at the “truth” by looking at the simulation process using examine_process(). The line that starts with y <- shows how the response was generated. Every response includes the so-called plant effects, but check what other effects were included. Do they match what is statistically significant in your ANOVA?
  • While you have to be careful to build models with proper context and to perform model diagnostics, you can check the suggested baseline model using design_model().
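
For readers who want to try the same idea without edibble, the sketch below simulates one response in base R and fits an ANOVA that respects the block structure. Everything here is an assumption made for illustration: the effect sizes, the seed, the variable names, and the choice to simulate a fertilizer effect but no temperature effect.

```r
set.seed(2024)  # arbitrary seed for reproducibility

# Rebuild a Study C style layout: 4 blocks of 20 pots
dat <- expand.grid(pot = 1:20, block = 1:4)
dat$temperature <- sample(rep(c("standard", "high"), each = 2))[dat$block]
dat$fertilizer <- unlist(lapply(1:4, function(b) {
  sample(rep(c("new", "none"), each = 10))
}))

# Hypothetical "truth": a fertilizer effect, block effects, and noise
block_effect <- rnorm(4, sd = 2)
dat$height <- 30 +
  5 * (dat$fertilizer == "new") +
  block_effect[dat$block] +
  rnorm(nrow(dat), sd = 3)

dat$block <- factor(dat$block)
dat$temperature <- factor(dat$temperature)
dat$fertilizer <- factor(dat$fertilizer)

# Error(block) tests temperature against block-to-block variation,
# and fertilizer and the interaction against pot-to-pot variation
fit <- aov(height ~ temperature * fertilizer + Error(block), data = dat)
summary(fit)
```

Comparing the ANOVA table with the lines that generated `height` gives the same kind of check against the known truth as described above.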
