Principles of experimental designs

ANU BDSI
workshop
Design and Analysis of Experiments

Emi Tanaka

Biological Data Science Institute

20th September 2024

Current learning objective

  • Comprehend the differences between experimental and observational data
  • Demonstrate proficiency in designing experiments, including defining research questions, selecting appropriate treatments or factors, and identifying potential sources of variation
  • Understand the principles of experimental design, including randomisation, control, replication, and blocking (the current objective)
  • Understand the fundamental concepts of causal inference for experimental data
  • Formulate a statistical analysis plan for the given experimental design

Designing experiments

  • Designing a comparative experiment in the biological sciences means identifying a data-collection scheme that:
    • achieves the sensitivity and specificity requirements
    • despite biological and technical variability,
    • while keeping time and resource costs low.

Comparative experiments

Aim: test if a new supplement increases milk yield from Holstein Friesian cows.

  • Is the new supplement effective?
  • Most experiments are comparative in nature.
  • Historical data suggests that Holstein Friesian cows have an average milk yield of
  • Is the new supplement better?
  • Are historical data comparable to new experimental data?

Controls

  • Historical controls are results of similar studies taken from historical or past records.
  • The problem with comparing the treatment group with a historical control group is that the groups may differ in important ways besides the treatment.
  • The control group should be in the same experiment as the treatment group, where ideally the only difference between the two groups is the treatment.
  • A control is not necessarily a “do nothing” treatment; it can be the current standard practice or a placebo.

Blinded experiment

  • A placebo is a treatment designed to have no therapeutic value, but to ensure that the subjects are blind to which treatment they received.
  • In some experiments, it is important to ensure that the researchers and/or technicians are also blind to the treatment (referred to as double-blind studies).
  • In a blinded experiment, certain information is withheld to reduce bias.
  • Blinding is more common in experiments that involve humans (e.g. clinical trials).

Unreplicated experiments

Aim: test if a new supplement increases milk yield from Holstein Friesian cows compared to the control supplement.


Statistical anatomy of the experiment:

  • Experimental units: 2 cows
  • Observational units: 2 cows
  • Response: milk yield
  • Treatments: new vs control supplements
  • Allotment: supplements → cows
  • Conclusion: the cow given the new supplement produces more than the cow given the control supplement, therefore the new supplement is effective for a higher milk yield from Holstein Friesian cows.

  • How confident will you be of this conclusion?

Natural variation of units

Aim: test if a new supplement increases milk yield from Holstein Friesian cows compared to the control supplement.


  • Ensure that uniform material is used for the experimental units as much as you can, but
  • no two experimental units are exactly the same (with some exceptions).
  • There will be natural variation among the experimental units.

Treatment replications


  • Treatment replication increases precision and allows uncertainty to be quantified.
  • Ideally we want more replication, but resources limit this.
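
As a rough illustration of why replication increases precision: if unit-to-unit variation has standard deviation s, the standard error of a treatment mean based on n independent replicates is s/√n. The Python sketch below assumes a hypothetical standard deviation of 4 (arbitrary units, not taken from the workshop data).

```python
import math


def standard_error(sd, n):
    """Standard error of a mean computed from n independent replicates."""
    if n < 1:
        raise ValueError("need at least one replicate")
    return sd / math.sqrt(n)


# Hypothetical unit-to-unit standard deviation (arbitrary units).
sd = 4.0
for n in (1, 2, 4, 16):
    print(f"n = {n:2d}  standard error = {standard_error(sd, n):.2f}")
```

Quadrupling the replication halves the standard error, which is why the gain from each extra replicate shrinks as n grows. With a single replicate per treatment, the experiment itself gives no estimate of s at all.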

Confounded factors






  • Units: 2 pens with 3 cows each
  • Are the treatment means comparable?
  • In this case, the pen is completely confounded (or aliased) with the supplement.
  • We do not get any valid inference about the treatment effects!
  • How would you distribute the treatments?

Complete block designs









  • Every treatment appears once in each pen (referred to as a complete block design).
  • Because each treatment appears in every pen, you can be more confident that differences in treatment means are not due to the conditions of particular pens.
  • Comparing like-with-like increases precision.
  • Cows in the same pen share a more similar environment than cows in different pens.
  • Applying different treatments to similar experimental units gives more precise treatment comparisons.
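
A minimal Python sketch of randomising a complete block design, where each treatment is assigned once per pen in a random order. The pen, cow, and supplement labels and the function name `randomise_complete_block` are hypothetical and chosen only for illustration.

```python
import random


def randomise_complete_block(blocks, treatments, seed=None):
    """Assign every treatment exactly once within each block, in random order."""
    rng = random.Random(seed)
    design = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs exactly {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)  # independent randomisation within each block
        design[block] = dict(zip(units, order))
    return design


# Hypothetical layout: 2 pens with 3 cows each, and 3 supplements.
pens = {"pen1": ["cow1", "cow2", "cow3"], "pen2": ["cow4", "cow5", "cow6"]}
supplements = ["A", "B", "C"]
design = randomise_complete_block(pens, supplements, seed=2024)
for pen, allocation in design.items():
    print(pen, allocation)
```

Shuffling separately within each pen keeps the pen effect balanced across treatments while still randomising which cow receives which supplement.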

Pseudo-replication

Aim: To compare the effectiveness of three supplements on milk yield from cows.








  • Allotment: supplements → pens

  • Experimental units: pens (not cows!), Observational units: cows

  • There is an average of 1.5 replications (not three!)

  • An analysis that treats repetition as replication is referred to as pseudo-replication.

Replication, Repetition, and Duplication

  • In an experimental context, treatment
    • replication refers to the (average) number of independent allocations of each treatment to experimental units,
    • repetition refers to the number of observational units allocated the same treatment, and
    • duplication refers to repeated measurements of the same unit.
  • Replication increases the precision of estimated treatment effects.
  • Repetition helps to measure the variation of the observational units.
  • Duplication helps to measure the technical variation of the measuring instrument.
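
The distinction between replication and repetition can be checked by counting units at each level. The Python sketch below uses a hypothetical layout (two supplements allotted to 4 pens, with 3 cows measured per pen); the function name `replication_summary` is also made up for this illustration.

```python
from collections import defaultdict


def replication_summary(allocation, obs_per_unit):
    """Count experimental units and observational units per treatment.

    allocation:   experimental unit -> treatment
    obs_per_unit: experimental unit -> list of observational units
    """
    replication = defaultdict(int)
    observations = defaultdict(int)
    for unit, treatment in allocation.items():
        replication[treatment] += 1
        observations[treatment] += len(obs_per_unit[unit])
    return dict(replication), dict(observations)


# Hypothetical: supplements are allotted to pens; cows are measured within pens.
allocation = {"pen1": "A", "pen2": "B", "pen3": "A", "pen4": "B"}
cows = {pen: [f"{pen}-cow{i}" for i in range(1, 4)] for pen in allocation}
replication, observations = replication_summary(allocation, cows)
print("experimental units per treatment: ", replication)
print("observational units per treatment:", observations)
```

In this layout each supplement has 6 measured cows but only 2 independent allocations, so treating the 6 cows as 6 replicates would be pseudo-replication.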

Systematic designs

  • The supplement treatment is given in a systematic order.
  • What could go wrong with this?
  • The order of the experimental units may be confounded with some extraneous factor
  • For example, suppose the order of the experimental units was determined by how quickly (fast to slow) each cow got to the feed.
  • This means that the more active cows are given one supplement and the least active ones are given the other.

Randomisation

  • Randomisation protects you against bias and potential unwanted confounding with extraneous factors

  • Bias comes in many forms: obvious to not-so obvious, known to unknown, and so on.

  • Randomisation will not completely shield you from all biases.

  • Randomisation is like buying insurance (but free!).

  • You can get a systematic order by chance! This doesn’t mean you should keep on randomising your design until you get the layout you want. Instead, consider blocking your units before randomisation.

  • Block what you can, randomise what you cannot.

How to randomise?

  • Preferably use a computer to randomise treatments to units.
  • We’ll use the edibble R package to demonstrate this.
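
Separately from the edibble demonstration, the idea of computer randomisation can be sketched in a few lines of plain Python. This is not the edibble API; the function name `randomise_crd` and the cow labels are hypothetical, and the sketch assumes a completely randomised design with equal replication.

```python
import random


def randomise_crd(units, treatments, seed=None):
    """Completely randomised design: equal replication, random allocation to units."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    reps = len(units) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    rng.shuffle(labels)
    return dict(zip(units, labels))


# Hypothetical units: 6 cows, allocated to the new or control supplement.
cows = [f"cow{i}" for i in range(1, 7)]
design = randomise_crd(cows, ["new", "control"], seed=1)
for cow, supplement in design.items():
    print(cow, supplement)
```

Recording the seed alongside the design lets the same layout be regenerated later, which helps when documenting how treatments were allocated.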

Factorial treatment structure

Aim: study the effect of fertilizer (type A or type B) and irrigation on wheat yield






Statistical anatomy:

  • Units:
    • Experimental units: 12 plots
    • Observational units: 12 plots
  • Observation: wheat yield
  • Treatments: combination of:
    • Water: irrigated or rain-fed
    • Fertilizer: type A or type B
  • Allotment:
    • Water → plots
    • Fertilizer → plots

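How many treatment replications do we have?

Replication of the four treatment combinations (water × fertilizer): 4, 2, 2, 4

Count of plots for each treatment factor level (irrigated, rain-fed, type A, type B): 6 each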
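
To see how each factor level can appear on 6 plots while the treatment combinations end up unequally replicated, the Python sketch below allots water and fertilizer to the 12 plots independently and then tabulates both counts. The plot labels, the seed, and the helper `randomise_factor` are hypothetical; the combination counts depend on the random draw.

```python
import random
from collections import Counter


def randomise_factor(units, levels, rng):
    """Allot the levels of one factor to units with equal counts per level."""
    reps = len(units) // len(levels)
    labels = [level for level in levels for _ in range(reps)]
    rng.shuffle(labels)
    return dict(zip(units, labels))


rng = random.Random(7)
plots = [f"plot{i}" for i in range(1, 13)]

# Each factor is allotted to plots on its own, ignoring the other factor.
water = randomise_factor(plots, ["irrigated", "rain-fed"], rng)
fertilizer = randomise_factor(plots, ["A", "B"], rng)

factor_counts = Counter(water.values()) + Counter(fertilizer.values())
combo_counts = Counter((water[p], fertilizer[p]) for p in plots)

print("count per factor level:", dict(factor_counts))
print("replication per treatment combination:", dict(combo_counts))
```

The factor-level counts are always 6, but nothing forces the four combinations to be balanced, so some may be replicated more than others.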

Factorial treatment structure with different allotment






  • Allotment:
    • Water and fertilizer → plots

Replication of each of the four treatment combinations: 3

Count of plots for each treatment factor level: 6

Split-plot design







  • Units: 6 strips with 2 plots each
  • Allotment:
    • Water → strips
    • Fertilizer → plots
  • This is a factorial design, but with a nested unit structure that constrains how treatments can be allocated.

Replication of each of the four treatment combinations: 3

Count of plots for each treatment factor level: 6
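
The two-stage allocation of a split-plot design can be sketched in Python as follows: water is randomised to whole strips, then fertilizer is randomised to the plots within each strip. The unit labels, the seed, and the function name `randomise_split_plot` are hypothetical, and the sketch assumes each strip has exactly one plot per fertilizer level.

```python
import random
from collections import Counter


def randomise_split_plot(strips, whole_levels, sub_levels, seed=None):
    """Randomise whole-plot levels to strips, then sub-plot levels within each strip."""
    names = list(strips)
    if len(names) % len(whole_levels) != 0:
        raise ValueError("strips must be a multiple of the number of whole-plot levels")
    rng = random.Random(seed)
    reps = len(names) // len(whole_levels)
    whole = [level for level in whole_levels for _ in range(reps)]
    rng.shuffle(whole)  # stage 1: water to strips
    design = {}
    for strip, whole_level in zip(names, whole):
        if len(strips[strip]) != len(sub_levels):
            raise ValueError(f"strip {strip!r} needs {len(sub_levels)} plots")
        sub = list(sub_levels)
        rng.shuffle(sub)  # stage 2: fertilizer to plots within the strip
        for plot, sub_level in zip(strips[strip], sub):
            design[plot] = (whole_level, sub_level)
    return design


# 6 strips with 2 plots each, as in the unit structure above.
strips = {f"strip{i}": [f"strip{i}-plot{j}" for j in (1, 2)] for i in range(1, 7)}
design = randomise_split_plot(strips, ["irrigated", "rain-fed"], ["A", "B"], seed=3)
replication = Counter(design.values())
print(dict(replication))
```

Every treatment combination appears on 3 plots, yet water is only independently allocated at the strip level, which is what distinguishes this layout from allotting the combinations directly to plots.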

Allocating treatments

Design anatomy

  • Design anatomy shows the breakdown of degrees of freedom across different sources of variation (related to skeleton ANOVA)
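
As a hand-worked illustration of such a breakdown, the Python sketch below computes the degrees of freedom for a balanced split-plot layout like the one with 6 strips of 2 plots each. The function name `split_plot_anatomy` is hypothetical, and the arithmetic assumes a balanced design in which every strip contains each fertilizer level once.

```python
def split_plot_anatomy(n_strips, plots_per_strip, n_water, n_fertilizer):
    """Degrees of freedom by source for a balanced split-plot design."""
    strips_df = n_strips - 1                      # variation between strips
    plots_df = n_strips * (plots_per_strip - 1)   # variation between plots within strips
    water_df = n_water - 1
    fert_df = n_fertilizer - 1
    interaction_df = water_df * fert_df
    return {
        "strips": {
            "water": water_df,
            "residual": strips_df - water_df,
        },
        "plots within strips": {
            "fertilizer": fert_df,
            "water:fertilizer": interaction_df,
            "residual": plots_df - fert_df - interaction_df,
        },
    }


anatomy = split_plot_anatomy(n_strips=6, plots_per_strip=2, n_water=2, n_fertilizer=2)
for stratum, sources in anatomy.items():
    print(stratum)
    for source, df in sources.items():
        print(f"  {source}: {df}")
```

The water effect is tested against the strip-level residual, while fertilizer and the interaction are tested against the plot-level residual, so the two strata need to be kept apart.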

Invalid design

  • The example below has no degrees of freedom for the residual source of variation
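
One case with this problem is the unreplicated experiment described earlier (2 cows, 2 supplements); whether that matches the example originally shown here is an assumption. The Python sketch below, with a hypothetical helper `crd_anatomy`, computes the degrees of freedom for a completely randomised design.

```python
def crd_anatomy(n_units, n_treatments):
    """Degrees of freedom for a completely randomised design."""
    total_df = n_units - 1
    treatment_df = n_treatments - 1
    return {"treatment": treatment_df, "residual": total_df - treatment_df}


unreplicated = crd_anatomy(n_units=2, n_treatments=2)
replicated = crd_anatomy(n_units=6, n_treatments=2)
print("2 cows, 2 supplements:", unreplicated)
print("6 cows, 2 supplements:", replicated)
```

With zero residual degrees of freedom there is nothing left to estimate the natural variation between units, so the treatment difference cannot be judged against it.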

Summary

  • Remember the basic design principles: controls, replication, blocking, and randomisation.
  • Use blinding where applicable to reduce experimental bias.
  • Randomisation is like buying insurance (but free!).
  • Randomisation helps to protect you from unknown confounding factors.
  • Block what you can, randomise what you cannot.
  • Watch out for pseudo-replication!
  • Randomise using a computer program if possible.
  • Produce a design anatomy to see the spread of the degrees of freedom across sources of variation.