Blocking (statistics)

In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) that are similar to one another.

Example

For example, consider an experiment designed to test a new drug on patients. The treatment has two levels, drug and placebo, administered to male and female patients in a double-blind trial. The sex of the patient is a blocking factor that accounts for variability in treatment response between males and females. Blocking on it removes this source of variability from the treatment comparison and thus leads to greater precision.
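The drug trial above can be sketched as a randomization carried out separately within each block. This is only an illustrative sketch: the function name `randomize_within_blocks`, the patient labels and the group sizes are assumptions, not part of any particular trial protocol.

```python
import random

def randomize_within_blocks(units, block_of, treatments, seed=0):
    """Assign treatments at random, separately inside each block (hypothetical helper)."""
    rng = random.Random(seed)
    # Group the experimental units by their value of the blocking factor.
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of[unit], []).append(unit)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Cycle through the treatments so each block is as balanced as possible.
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Assumed toy data: eight patients, four female and four male.
patients = [f"P{i}" for i in range(8)]
sex = {p: ("F" if i < 4 else "M") for i, p in enumerate(patients)}
assignment = randomize_within_blocks(patients, sex, ["drug", "placebo"])
```

Because the shuffle happens inside each sex, both treatments appear equally often among the female and among the male patients, so a difference between the sexes cannot masquerade as a treatment effect.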

Use

Blocking reduces known but unavoidable sources of variability. When such a source cannot be eliminated (e.g. needing two batches of raw material to produce one container of a chemical), it is deliberately confounded, or aliased, with a higher-order (ideally the highest-order) interaction so that it does not distort the estimates of the effects of interest. High-order interactions are usually the least important effects: the temperature of a reactor or the batch of raw materials typically matters more than the combination of the two, and this is especially true when more factors (three, four, ...) are present. It is therefore preferable to confound the unavoidable variability with the highest-order interaction.
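As a hedged illustration of confounding a block with the highest-order interaction, the sketch below splits a two-level, three-factor design into two blocks according to the sign of the three-factor interaction. The factor names A, B, C, the response model and the size of the block shift are all assumptions chosen for the example.

```python
from itertools import product

# All 8 runs of a two-level design in factors A, B, C, coded -1 / +1.
runs = list(product((-1, 1), repeat=3))

def block_of(run):
    """Put a run in block 2 if the ABC interaction column is +1, else block 1."""
    a, b, c = run
    return 2 if a * b * c == 1 else 1

BLOCK_SHIFT = 5.0  # assumed nuisance difference between the two blocks

def response(run):
    """Assumed noise-free response: main effects of A and B plus the block shift."""
    a, b, c = run
    shift = BLOCK_SHIFT if block_of(run) == 2 else 0.0
    return 10.0 + 3.0 * a + 2.0 * b + shift

def effect(contrast):
    """Estimated effect: mean response at +1 minus mean response at -1."""
    return sum(contrast(run) * response(run) for run in runs) / (len(runs) / 2)

effect_A = effect(lambda r: r[0])
effect_B = effect(lambda r: r[1])
effect_ABC = effect(lambda r: r[0] * r[1] * r[2])
```

In this sketch the estimates for A and B are unaffected by the block shift, because each block contains every level of every factor equally often. The whole shift lands in the ABC estimate, which is the effect that was sacrificed.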

Suppose a process is invented that is intended to make the soles of shoes last longer, and a plan is formed to conduct a field trial. Given a group of n volunteers, one possible design would be to give n/2 of them shoes with the new soles and n/2 of them shoes with the ordinary soles, randomizing the assignment of the two kinds of soles. This type of experiment is a completely randomized design. Both groups are then asked to use their shoes for a period of time, after which the degree of wear of the soles is measured. This is a workable experimental design, but purely from the point of view of statistical precision (ignoring any other factors), a better design would be to give each person one regular sole and one new sole, randomly assigning the two types to the left and right shoe of each volunteer. Such a design is called a randomized complete block design. This design will be more sensitive than the first, because each person acts as their own control, so the control group is more closely matched to the treatment group.
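The gain in sensitivity can be illustrated with a small simulation of the two shoe-sole designs. This is a sketch under assumed numbers: the person-to-person spread, the measurement noise, the true effect and the function name `simulate` are all invented for illustration, and no left/right foot effect is modelled.

```python
import random
import statistics

def simulate(n=20, effect=-1.0, person_sd=5.0, noise_sd=1.0, reps=1000, seed=1):
    """Return the variance of the estimated effect under each design (hypothetical model)."""
    rng = random.Random(seed)
    crd_estimates, block_estimates = [], []
    for _ in range(reps):
        # Each volunteer has their own baseline amount of wear.
        person = [rng.gauss(0, person_sd) for _ in range(n)]

        # Completely randomized design: half the volunteers get the new soles.
        ids = list(range(n))
        rng.shuffle(ids)
        new = [person[i] + effect + rng.gauss(0, noise_sd) for i in ids[: n // 2]]
        old = [person[i] + rng.gauss(0, noise_sd) for i in ids[n // 2:]]
        crd_estimates.append(statistics.mean(new) - statistics.mean(old))

        # Randomized complete block design: every volunteer wears one sole of
        # each kind, so the personal baseline cancels in the difference.
        diffs = [(p + effect + rng.gauss(0, noise_sd)) - (p + rng.gauss(0, noise_sd))
                 for p in person]
        block_estimates.append(statistics.mean(diffs))
    return statistics.variance(crd_estimates), statistics.variance(block_estimates)

var_crd, var_block = simulate()
```

Under these assumptions the estimate from the blocked design varies far less from trial to trial than the estimate from the completely randomized design, because the variation between volunteers no longer enters the comparison.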

Theoretical basis

The theoretical basis of blocking is the following mathematical result. Given random variables X and Y,

[itex]\operatorname{Var}(X-Y)= \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X,Y).[/itex]

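The difference between the treatment response X and the control response Y can thus be given minimum variance (i.e. maximum precision) by maximising the covariance (or the correlation) between X and Y. Blocking does this by comparing treatment and control on similar units, whose responses tend to be positively correlated.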
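The variance identity can be checked numerically. The sketch below builds two positively correlated samples from an assumed shared component (standing in for a block effect) and compares both sides of the identity using sample moments; the helper names `mean` and `cov` and all numeric settings are illustrative choices.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Sample covariance; cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(0)
# Assumed shared component, e.g. the effect of the block a pair of units belongs to.
shared = [rng.gauss(0, 2) for _ in range(500)]
x = [s + rng.gauss(0, 1) for s in shared]  # treatment responses
y = [s + rng.gauss(0, 1) for s in shared]  # control responses

d = [a - b for a, b in zip(x, y)]
lhs = cov(d, d)                                # Var(X - Y)
rhs = cov(x, x) + cov(y, y) - 2 * cov(x, y)    # Var(X) + Var(Y) - 2 Cov(X, Y)
```

The two sides agree up to floating-point rounding, and because the covariance here is positive, the variance of the difference is smaller than the sum of the two variances.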
