Understanding and misunderstanding randomized controlled trials
- Authors
- Angus Deaton, Nancy Cartwright
- Journal
- Social Science & Medicine
- Year
- 2017
- Citations
- 1,799
TL;DR
This paper argues that Randomized Controlled Trials (RCTs), while valuable, are often over-trusted and misunderstood: randomization does not equalize groups in any single trial, does not by itself deliver precise estimates, and does not guarantee generalizability. Self-experimenters should therefore read RCT findings critically and design their own experiments to understand *why* something works, not just *whether* it works for a specific group.
What they tested
This paper did not test an intervention in the traditional sense. Instead, the authors, Angus Deaton and Nancy Cartwright, critically examined the theoretical foundations, common claims, and practical limitations of Randomized Controlled Trials (RCTs) as a research methodology. They analyzed the statistical and philosophical assumptions underlying RCTs, particularly as they are increasingly applied in the social sciences beyond their traditional medical context. Their "test" involved a rigorous conceptual and logical analysis of how RCTs are designed, interpreted, and often misinterpreted by researchers and the public. They scrutinized claims about randomization's power, the precision of average treatment effects, the role of covariates, and the generalizability (external validity) of RCT findings.
Who was studied
No human participants, animals, or other subjects were studied in this paper. This is a theoretical and methodological critique, not an empirical study. The "subjects" of their analysis were the principles, practices, and interpretations of Randomized Controlled Trials themselves.
How they measured it
This paper did not involve empirical measurements, data collection, or statistical analysis of experimental results. Instead, the authors employed a method of logical argumentation, conceptual analysis, and critical review of statistical and philosophical literature pertaining to experimental design and causality. They "measured" the validity of common claims about RCTs by dissecting their underlying assumptions and logical consequences. Their approach involved:
**Conceptual Analysis:** Deconstructing key terms and concepts like "randomization," "unbiased estimate," "average treatment effect (ATE)," and "external validity."
**Logical Argumentation:** Presenting arguments for why certain claims about RCTs (e.g., that randomization equalizes everything) are not strictly true or are often misapplied.
**Statistical Principles Review:** Referring to established statistical theory to explain the properties and limitations of randomization and estimation.
**Philosophical Inquiry:** Discussing the broader implications of RCTs for scientific progress, cumulative knowledge, and understanding causality ("why things work" versus "what works").
Essentially, their "instruments" were rigorous reasoning and a deep understanding of statistics and philosophy of science, applied to the methodology of RCTs.
Methodology
This paper is a **conceptual and methodological critique**, not an empirical study such as an RCT, observational study, or meta-analysis. It does not involve randomization, blinding, washout periods, or statistical analysis of primary data. Instead, Deaton and Cartwright engaged in a detailed theoretical examination of the Randomized Controlled Trial (RCT) methodology.
**How they "ran" the study:**
The authors conducted a systematic analysis of the theoretical underpinnings and practical implications of RCTs. Their approach involved:
1. **Identifying common claims:** They started by identifying prevalent beliefs and assertions about RCTs, particularly their strengths and what they are purported to achieve (e.g., "randomization equalizes everything," "RCTs automatically provide precise estimates," "they don't require thinking about covariates").
2. **Deconstructing these claims:** They then systematically broke down each claim, examining its statistical and logical validity. For instance, they explained *why* randomization, while yielding an estimate that is unbiased *on average* over many hypothetical repetitions of the trial, does not guarantee balanced groups in any single trial.
3. **Analyzing limitations and assumptions:** They delved into the inherent limitations of RCTs, such as the fact that estimates apply only to the specific sample studied, the challenges of external validity (generalizing results), and the distinction between statistical unbiasedness and practical utility.
4. **Proposing a balanced perspective:** Rather than dismissing RCTs, they aimed to clarify their appropriate role within a broader scientific endeavor, arguing for their integration with other methods to build cumulative knowledge and understand underlying mechanisms.
**Why this design matters:**
A conceptual critique like this is crucial because it addresses the foundational understanding of a widely used research method.
**It clarifies misconceptions:** By dissecting common misunderstandings, it helps researchers and the public interpret study results more accurately. For someone running a self-experiment, understanding these nuances is vital to avoid misinterpreting published RCTs or overestimating the certainty of their own n=1 findings.
**It improves experimental design:** By highlighting the limitations of randomization and the importance of covariates, it encourages more thoughtful and robust experimental design, even for personal experiments.
**It promotes scientific progress:** The authors argue that an over-reliance on RCTs for "what works" without understanding "why it works" hinders cumulative scientific knowledge. This critique encourages a more holistic approach to research, integrating theory and mechanism alongside empirical testing.
**What this design can and cannot prove:**
**Can prove:** This type of paper can logically demonstrate inconsistencies, expose flawed assumptions, and clarify the precise meaning and scope of statistical and methodological concepts. It can prove that certain common claims about RCTs are logically or statistically unsound. It can establish a robust framework for *how to think about* RCTs.
**Cannot prove:** It cannot provide new empirical data, test the efficacy of an intervention, or generate statistical effect sizes. It does not offer direct evidence for or against a specific treatment. Its conclusions are based on logical reasoning and existing statistical theory, not new observations.
**Major methodological weaknesses (of the paper itself):**
As a theoretical paper, it doesn't have "methodological weaknesses" in the same way an empirical study would (e.g., sample size, blinding issues). However, potential points of discussion for a critical reader might include:
**Scope:** The critique focuses on specific aspects of RCTs and their application, primarily in social sciences. It doesn't offer a comprehensive alternative methodology but rather a re-evaluation of one.
**Interpretation:** While the authors present strong logical arguments, the degree to which "misunderstanding" is widespread or impactful might be open to debate among different researchers.
**Audience:** The paper is aimed at researchers and policymakers. Its theoretical depth demands some statistical and philosophical literacy, which may limit its immediate impact on a lay audience without further translation.
Key findings
The authors presented several critical arguments challenging common perceptions and applications of Randomized Controlled Trials (RCTs):
**Randomization does not equalize everything:** Contrary to a frequent claim, randomization does not guarantee that treatment and control groups will be identical in every respect (other than the treatment itself) in any *single* trial. While randomization ensures that, *on average* over an infinite number of hypothetical trials, all observed and unobserved covariates will be balanced between groups, this is not true for a finite, real-world trial. This means that differences between groups in a specific experiment could still be due to chance imbalances in important characteristics, not just the intervention.
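The gap between balance "on average" and balance in one trial can be seen in a small simulation. The sketch below is illustrative only: the covariate (mean 50, SD 10), the arm size of 10, and the 5-unit threshold are hypothetical choices, not figures from the paper.

```python
import random
import statistics

def baseline_gap(n, rng):
    """Randomize n people into two equal arms and return the
    treated-minus-control difference in a baseline covariate."""
    # Hypothetical baseline covariate (e.g. age): mean 50, SD 10.
    covariate = [rng.gauss(50.0, 10.0) for _ in range(n)]
    rng.shuffle(covariate)  # random assignment: first half treated
    half = n // 2
    return statistics.fmean(covariate[:half]) - statistics.fmean(covariate[half:])

rng = random.Random(0)
gaps = [baseline_gap(20, rng) for _ in range(5000)]

# Averaged over many hypothetical trials, the gap is close to zero...
mean_gap = statistics.fmean(gaps)
# ...but the typical gap in any single trial is not.
typical_gap = statistics.fmean([abs(g) for g in gaps])
# Share of trials where the arms differ by more than 5 units at baseline.
share_large = sum(abs(g) > 5.0 for g in gaps) / len(gaps)

print(f"mean gap across trials:   {mean_gap:.2f}")
print(f"typical single-trial gap: {typical_gap:.2f}")
print(f"share with gap > 5:       {share_large:.2%}")
```

If the covariate also drives the outcome, a single-trial gap of this size feeds directly into the estimated effect, even though the procedure is unbiased across repetitions.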
**RCTs do not automatically deliver precise estimates of the Average Treatment Effect (ATE):** The precision of an ATE estimate depends on factors like sample size and the variability of the outcome, not just randomization. A small RCT might yield an unbiased estimate, but one with a very wide confidence interval, making it practically uninformative.
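As a rough illustration of how precision depends on sample size and outcome variability, the sketch below computes the approximate half-width of a confidence interval for a difference in means. It assumes equal arm sizes, a common outcome SD, and a normal approximation; the function name `ci_half_width` and the SD of 10 are hypothetical.

```python
import math
from statistics import NormalDist

def ci_half_width(n_per_arm, outcome_sd, confidence=0.95):
    """Approximate CI half-width for a difference in means,
    assuming equal arms, a common SD, and a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = outcome_sd * math.sqrt(2.0 / n_per_arm)
    return z * standard_error

for n in (25, 100, 400, 2500):
    print(n, round(ci_half_width(n, outcome_sd=10.0), 2))
```

Under these assumptions the interval narrows only with the square root of the sample size, so a trial needs 100 times as many participants to be 10 times as precise.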
**RCTs do not relieve the need to think about covariates:** Even with randomization, understanding and accounting for observed and unobserved characteristics (covariates) of participants is crucial. Covariates can influence the outcome, affect the precision of estimates, and are essential for understanding *why* an intervention works or for whom it might be most effective. Ignoring them limits the scientific insight gained.
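One way to see why covariates still matter after randomization is to compare an unadjusted difference in means with a covariate-adjusted estimate across many simulated trials. This is a sketch under strong assumptions, not the authors' procedure: the outcome is assumed linear in a single baseline covariate, and the coefficient (5.0), true effect (2.0), and trial size (40) are hypothetical.

```python
import random
import statistics

def one_trial(n, rng, true_effect=2.0):
    """Simulate one small RCT; return (unadjusted, adjusted) estimates."""
    half = n // 2
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # prognostic baseline covariate
    treated = [1] * half + [0] * (n - half)
    rng.shuffle(treated)                           # random assignment
    # Hypothetical outcome model: strongly driven by the covariate.
    y = [5.0 * xi + true_effect * ti + rng.gauss(0.0, 1.0)
         for xi, ti in zip(x, treated)]

    xt = [xi for xi, ti in zip(x, treated) if ti]
    xc = [xi for xi, ti in zip(x, treated) if not ti]
    yt = [yi for yi, ti in zip(y, treated) if ti]
    yc = [yi for yi, ti in zip(y, treated) if not ti]

    unadjusted = statistics.fmean(yt) - statistics.fmean(yc)

    def cross(xs, ys):
        """Sum of cross-products of deviations from the arm means."""
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        return sum((a - mx) * (b - my) for a, b in zip(xs, ys))

    # Pooled within-arm slope of outcome on covariate.
    slope = (cross(xt, yt) + cross(xc, yc)) / (cross(xt, xt) + cross(xc, xc))
    # Remove the part of the gap explained by chance covariate imbalance.
    adjusted = unadjusted - slope * (statistics.fmean(xt) - statistics.fmean(xc))
    return unadjusted, adjusted

rng = random.Random(1)
results = [one_trial(40, rng) for _ in range(2000)]
unadj = [r[0] for r in results]
adj = [r[1] for r in results]

print(f"unadjusted: mean {statistics.fmean(unadj):.2f}, SD {statistics.stdev(unadj):.2f}")
print(f"adjusted:   mean {statistics.fmean(adj):.2f}, SD {statistics.stdev(adj):.2f}")
```

In this setup both estimators center on the true effect, but the adjusted one varies far less from trial to trial. The gain depends on the covariate being strongly prognostic and on the linear model being roughly right, which is itself a substantive assumption.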
**Determining if an estimate was generated by chance is difficult:** The authors argue that the process of inferring whether an observed effect is genuinely due to the treatment or merely random chance is more complex than often assumed, especially given the limitations of single trials and the potential for chance imbalances.
**Unbiasedness is of limited practical value:** While an RCT can yield an unbiased estimate (meaning it doesn't systematically over- or underestimate the true effect *on average*), this property alone is not sufficient for practical utility. An unbiased estimate can still be highly imprecise or apply only to a very specific context, making it less useful for decision-making.
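The standard mean squared error decomposition makes the point concrete. Here $\hat{\tau}$ denotes an estimator of the true effect $\tau$; the notation is introduced for illustration and is not taken from the paper.

```latex
\mathrm{MSE}(\hat{\tau})
  = \mathbb{E}\big[(\hat{\tau} - \tau)^2\big]
  = \mathrm{Bias}(\hat{\tau})^2 + \mathrm{Var}(\hat{\tau})
```

With hypothetical numbers: an unbiased estimator with standard error 5 has an MSE of $0^2 + 5^2 = 25$, while an estimator with bias 1 and standard error 2 has an MSE of $1^2 + 2^2 = 5$. Zero bias alone does not make an estimate close to the truth.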
**Estimates apply only to the sample selected:** The results of an RCT are strictly applicable only to the specific group of individuals who participated in that trial. This sample is often a "convenience sample," meaning it's chosen for practical reasons rather than being perfectly representative of a larger population.
**Justification is required to extend results:** Extending the findings of an RCT to other groups or to the general population (i.e., external validity), or even applying them to an individual within the trial, requires strong justification and additional evidence, not just the RCT itself. The authors suggest that demanding "external validity" from an RCT alone asks too much of it while undervaluing what it can contribute.
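A toy calculation shows why an average effect does not travel automatically. The sketch assumes the treatment effect differs between two subgroups and that the trial sample and the target population mix those subgroups differently; the group labels, effects, and shares are all hypothetical.

```python
def average_effect(share_by_group, effect_by_group):
    """Average treatment effect as a share-weighted mean of subgroup effects."""
    return sum(share_by_group[g] * effect_by_group[g] for g in share_by_group)

# Hypothetical subgroup effects: the treatment helps one group only.
effect_by_group = {"responders": 2.0, "non_responders": 0.0}

# Hypothetical mixes: the trial over-represents responders.
trial_sample = {"responders": 0.8, "non_responders": 0.2}
target_population = {"responders": 0.2, "non_responders": 0.8}

trial_ate = average_effect(trial_sample, effect_by_group)
target_ate = average_effect(target_population, effect_by_group)

print(f"ATE in the trial sample:      {trial_ate:.1f}")
print(f"ATE in the target population: {target_ate:.1f}")
```

The trial average also describes no individual in this example, since each person's effect is either 2.0 or 0.0. Knowing which group someone belongs to takes knowledge from outside the trial.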
**RCTs are advantageous for minimal assumptions but disadvantageous for cumulative science:** RCTs require minimal prior knowledge and assumptions to establish an unbiased estimate, which is useful for persuading skeptical audiences. However, this "blank slate" approach can be a disadvantage for cumulative scientific progress, where new research should ideally build upon existing knowledge and theoretical frameworks, rather than discarding them.
**RCTs should be part of a cumulative program:** The authors conclude that RCTs are valuable tools but should not be seen as the sole or ultimate method for scientific discovery. They should be integrated into a broader, cumulative research program that combines them with other methods, including conceptual and theoretical development, to understand not just "what works," but