Design and Analysis of Experiments
👋 Welcome
This training workshop is designed for BIOL8001 students and is also open to staff and students affiliated with the Australian National University (ANU). It is offered by the ANU Biological Data Science Institute (BDSI).
This workshop provides a comprehensive introduction to the principles of experimental design and the analysis of experimental data. Participants are expected to have a basic understanding of R and linear models prior to the start of the workshop.
🎯 Learning objectives
Upon completion of this workshop, participants should be able to:
- Comprehend the differences between experimental and observational data
- Demonstrate proficiency in designing experiments, including defining research questions, selecting appropriate treatments or factors, and identifying potential sources of variation
- Understand the principles of experimental design, including randomization, control, replication, and blocking
- Understand the fundamental concepts of causal inference for experimental data
- Formulate a statistical analysis plan for the given experimental design
🔧 Preparation
Please ensure that you download and install
- the latest version of R,
- the latest version of RStudio Desktop or Positron, and
- the following packages: open RStudio Desktop or Positron, then copy and paste the command below into the Console and press Enter.
install.packages(c("edibble", "broom"))
- Windows users may need to install Rtools before installing R packages.
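After installing, you can check that both packages load. This minimal sketch uses base R's requireNamespace(), which returns TRUE or FALSE for each package without attaching it. The object names are illustrative.

```r
# Packages required for the workshop
pkgs <- c("edibble", "broom")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report any package that still needs to be installed
if (!all(installed)) {
  message("Still missing: ", paste(pkgs[!installed], collapse = ", "))
}
```

If a package is reported missing, re-run the install.packages() command above and look for error messages in the Console.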
📚 Slides
Statistical anatomy of experiments
Principles of experimental designs
Analysis of experimental data
📑 Resources
- Ruxton & Colegrave (2006). Experimental design for the life sciences, 2nd edition. (This book offers a lot of practical advice.)
- Glass (2007). Experimental design for biologists, 1st edition. (This book is more philosophical.)
- Welham et al. (2015). Statistical Methods in Biology: Design and Analysis of Experiments and Regression, Chapters 1, 3, 4 and 7. See http://www.stats4biol.info/ for the data and code in the book. (This book is more statistical.)
🏋️♀️ Self-paced exercises
The following self-paced exercises are designed to check your understanding of the material. After completing each exercise, review your answers and assess your grasp of the concepts.
Exercise 1
This exercise targets the following learning objectives:
- Comprehend the differences between experimental and observational data
- Demonstrate proficiency in designing experiments, including defining research questions, selecting appropriate treatments or factors, and identifying potential sources of variation
For the following studies,
- (a) identify the aim of the study,
- (b) classify each as an experimental study or an observational study,
- (c) identify the treatments or exposures being studied,
- (d) identify the experimental units or subjects being observed,
- (e) identify the response variable(s) being measured,
- (f) identify the observational units being studied,
- (g) draw a causal diagram.
For (a) to (f), you either select one or more options from a list or fill in a blank; some questions have more than one correct answer. For (g), a sample diagram is provided, but it is by no means the only correct answer.
For each study, consider whether the study aim could be framed better and whether the treatments/exposures and response variables actually address that aim.
Study 1
Researchers are studying the effect of a new fertilizer on plant growth. They randomly select 50 plants from the same species and divide them into two groups. One group receives the new fertilizer, while the other group receives a standard fertilizer. Over the course of two months, the researchers measure the height and number of leaves of the plants in both groups to assess the impact of the new fertilizer.
- Select the sentence that best describes the aim of the study.
- What type of study is this?
- What are the treatments or exposures in this study?
- What are the experimental units or subjects in this study?
- What are the response variables in this study?
- What are the observational units in this study?
- Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram
Study 2
A group of ecologists wants to study the relationship between sunlight exposure and tree growth in a forest. They observe and measure the height of 100 trees growing in different areas of the forest, noting how much sunlight each tree receives naturally based on its location. They collect data on tree growth and sunlight exposure to analyze any correlations between the two variables.
- Select the sentence that best describes the aim of the study.
- What type of study is this?
- What are the treatments or exposures in this study?
- What are the experimental units or subjects in this study?
- What are the response variables in this study?
- What are the observational units in this study?
- Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram
Study 3
A group of epidemiologists is investigating the potential link between physical activity levels and heart disease. They collect data from 1,000 adults by surveying them about their weekly exercise habits and then tracking their health outcomes over the next 10 years. They record the participants’ physical activity levels and whether or not individuals develop heart disease.
- Select the sentence that best describes the aim of the study.
- What type of study is this?
- What are the treatments or exposures in this study?
- What are the experimental units or subjects in this study?
- What are the response variables in this study?
- What are the observational units in this study?
- Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram
Study 4
A medical researcher is testing the effectiveness of a new drug for treating high blood pressure. They recruit 200 patients with hypertension and randomly assign half to receive the new drug, while the other half receive a placebo. After three months, the researcher measures the blood pressure of all participants to determine whether the new drug significantly lowers blood pressure compared to the placebo.
- Select the sentence that best describes the aim of the study.
- What type of study is this?
- What are the treatments or exposures in this study?
- What are the experimental units or subjects in this study?
- What are the response variables in this study?
- What are the observational units in this study?
- Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram
Study 5
A biologist is studying the effect of different water temperatures on fish growth. They select 100 fish and randomly assign them to two tanks: one with water kept at 15°C and the other at 25°C. Over the course of a month, the researcher measures the weight gain of each fish in both tanks to assess how water temperature affects growth.
- Select the sentence that best describes the aim of the study.
- What type of study is this?
- What are the treatments or exposures in this study?
- What are the experimental units or subjects in this study?
- What are the response variables in this study?
- What are the observational units in this study?
- Draw a causal diagram for this study to brainstorm the potential sources that could affect the outcome.
Sample diagram
Exercise 2
This exercise targets the following learning objectives:
- Understand the principles of experimental design, including randomization, control, replication, and blocking
- Understand the fundamental concepts of causal inference for experimental data
In this exercise, researchers are studying the effect of a new fertilizer compared to no fertilizer on plant growth. They randomly select 80 plants from the same species and assign the two treatments to 40 plants each. The researchers measure the height and number of leaves of the plants after two months to assess the impact of the new fertilizer.
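The allocation described above is a completely randomized design: 40 labels of each treatment are shuffled across the 80 plants. The sketch below is plain base R, not the edibble code the exercise refers to; the treatment labels "none" and "new" and the seed are illustrative choices.

```r
set.seed(2024)  # arbitrary seed so the allocation is reproducible

n_plants <- 80
# 40 replicates of each treatment, randomly permuted across the plants
fertilizer <- sample(rep(c("none", "new"), each = n_plants / 2))

crd <- data.frame(plant = seq_len(n_plants), fertilizer = fertilizer)
head(crd)

# Each treatment should be replicated 40 times
table(crd$fertilizer)
```

Because every permutation of the 80 labels is equally likely, no plant characteristic can systematically line up with one treatment, which is what lets a difference in outcomes be attributed to the fertilizer.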
Study A
We can specify the above design using the edibble package in R as below. What do you notice about the treatment allocation? Could this be problematic? Why?
Study B
You’ve been told that the plants are grouped into 4 blocks, each containing 20 pots, with one plant grown in each pot. The researchers want each block to contain an equal number of plants from each treatment group, and they want the treatment allocation randomized within each block. How would you design this experiment? Once you have your design layout, use the tidyverse (or otherwise) to count how many plants are assigned to each treatment group within each block.
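These two requirements describe a randomized complete block design. One base-R sketch is to permute 10 labels of each treatment separately within every block and then cross-tabulate block against treatment. The object names, labels and seed are illustrative; this is not the edibble workflow from the sample solution.

```r
set.seed(2024)  # arbitrary seed for reproducibility

n_blocks <- 4
pots_per_block <- 20
trts <- c("none", "new")

# One row per pot; pots are nested within blocks
rcbd <- data.frame(
  block = factor(rep(seq_len(n_blocks), each = pots_per_block)),
  pot = seq_len(n_blocks * pots_per_block)
)

# Within each block, randomly permute 10 labels of each treatment
rcbd$fertilizer <- unlist(lapply(seq_len(n_blocks), function(b) {
  sample(rep(trts, each = pots_per_block / length(trts)))
}))

# Number of plants per treatment within each block (10 in every cell)
table(rcbd$block, rcbd$fertilizer)
```

If you prefer the tidyverse, dplyr::count(rcbd, block, fertilizer) returns the same counts in long format.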
Sample solution
Study C
You’ve now been told that the researchers want to add a second treatment factor, temperature (standard or high). However, the temperature cannot be changed for individual pots; it can only be changed for a whole block. How would you design this experiment? Using the tidyverse or otherwise, tabulate the treatment replications.
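Because temperature can only be applied to whole blocks, this becomes a split-plot design: blocks are the whole plots that receive a temperature, and pots within a block are the subplots that receive a fertilizer. A base-R sketch follows, assuming each temperature goes to 2 of the 4 blocks (the exercise does not state this split) and keeping 10 pots per fertilizer in every block. Names and the seed are illustrative.

```r
set.seed(2024)  # arbitrary seed for reproducibility

n_blocks <- 4
pots_per_block <- 20
fert_levels <- c("none", "new")
temp_levels <- c("standard", "high")

# One row per pot; pots are nested within blocks
split_plot <- data.frame(
  block = factor(rep(seq_len(n_blocks), each = pots_per_block)),
  pot = seq_len(n_blocks * pots_per_block)
)

# Whole-plot factor: randomly give each temperature to 2 of the 4 blocks
block_temp <- sample(rep(temp_levels, each = n_blocks / length(temp_levels)))
split_plot$temperature <- block_temp[as.integer(split_plot$block)]

# Subplot factor: randomize fertilizer within each block (10 pots per level)
split_plot$fertilizer <- unlist(lapply(seq_len(n_blocks), function(b) {
  sample(rep(fert_levels, each = pots_per_block / length(fert_levels)))
}))

# Each block should carry a single temperature
table(split_plot$block, split_plot$temperature)

# Treatment replication: pots per temperature x fertilizer combination
table(split_plot$temperature, split_plot$fertilizer)
```

Every temperature and fertilizer combination ends up with 20 pots, but temperature itself is replicated only twice (2 blocks per level), which matters for the analysis.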
Sample solution
Exercise 3
This exercise targets the following learning objective:
- Formulate a statistical analysis plan for the given experimental design
In this exercise, consider Study C from Exercise 2, but this time think about what your statistical analysis plan would be. Simulate some response data and carry out the analysis from your plan on the simulated data.
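A useful first step in an analysis plan is the skeleton ANOVA table implied by the design. Assuming 4 blocks of 20 pots, with each temperature applied to 2 whole blocks and each fertilizer to 10 pots per block (an assumption, since the exercise does not fix the temperature split), the degrees of freedom split into a block stratum and a within-block stratum:

```latex
\begin{array}{llr}
\text{Stratum} & \text{Source} & \text{df} \\
\hline
\text{Blocks} & \text{Temperature} & 2 - 1 = 1 \\
 & \text{Residual (blocks within temperature)} & (4 - 1) - 1 = 2 \\
\text{Pots within blocks} & \text{Fertilizer} & 2 - 1 = 1 \\
 & \text{Temperature} \times \text{Fertilizer} & (2 - 1)(2 - 1) = 1 \\
 & \text{Residual} & 4(20 - 1) - 1 - 1 = 74 \\
\hline
\text{Total} & & 80 - 1 = 79
\end{array}
```

The key consequence is that temperature is tested against block-to-block variation with only 2 residual degrees of freedom, so the design has little power to detect a temperature effect, whereas fertilizer and the interaction are tested against pot-to-pot variation with 74 residual degrees of freedom.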
Sample solution
In this sample solution, we use the edibble R package to design the experiment and simulate some response data.
- We’ll analyse each response variable separately using ANOVA.
- We can peek at the “truth” by looking at the simulation process using examine_process(). The line that starts with y <- shows how the response was generated. Every response includes the so-called plant effects, but check which other effects were included. Do they match what is statistically significant in your ANOVA?
- While you have to be careful to build models with the proper context and to perform model diagnostics, you can check the suggested baseline model using design_model().
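As a base-R alternative to the edibble workflow above, the sketch below simulates two hypothetical responses (height and leaf count) for a Study C layout, fits a separate split-plot ANOVA to each with aov() and an Error(block) stratum, and draws a normal Q-Q plot of the within-block residuals as a basic diagnostic. The effect sizes, the Poisson leaf counts and the seed are made-up assumptions, and this is not the model returned by design_model().

```r
set.seed(2024)  # arbitrary seed for reproducibility

# Split-plot layout: temperature on whole blocks, fertilizer on pots within blocks
dat <- data.frame(block = factor(rep(1:4, each = 20)))
block_temp <- sample(rep(c("standard", "high"), each = 2))
dat$temperature <- factor(block_temp[as.integer(dat$block)])
dat$fertilizer <- factor(unlist(lapply(1:4, function(b) {
  sample(rep(c("none", "new"), each = 10))
})))

# Hypothetical "truth": block noise plus a fertilizer effect, no temperature effect
block_effect <- rnorm(4, sd = 2)[as.integer(dat$block)]
dat$height <- 30 + block_effect + 5 * (dat$fertilizer == "new") + rnorm(80, sd = 3)
dat$leaves <- rpois(80, lambda = 12 + 3 * (dat$fertilizer == "new"))

# Fit a separate split-plot ANOVA for each response
responses <- c("height", "leaves")
fits <- lapply(responses, function(resp) {
  aov(as.formula(paste(resp, "~ temperature * fertilizer + Error(block)")),
      data = dat)
})
names(fits) <- responses

for (resp in responses) {
  cat("\n==", resp, "==\n")
  print(summary(fits[[resp]]))

  # Basic diagnostic: normal Q-Q plot of the within-block (pot-level) residuals
  res <- proj(fits[[resp]])$Within[, "Residuals"]
  qqnorm(res, main = paste("Within-block residuals:", resp))
  qqline(res)
}
```

Because the data were simulated with a fertilizer effect but no temperature effect, you can compare each ANOVA table against that known truth, in the same spirit as inspecting examine_process(). Leaf counts are discrete, so their residual plot is worth checking before trusting a normal-theory ANOVA.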