Common Method Bias: It's Bad, It's Complex, It's Widespread, and It's Not Easy to Fix
- Authors
- Philip M. Podsakoff, Nathan P. Podsakoff, Larry J. Williams, Chengquan Huang, Junhui Yang
- Journal
- Annual Review of Organizational Psychology and Organizational Behavior
- Year
- 2023
- Citations
- 1,120
TL;DR
Common method bias (CMB) is systematic error introduced when the same person rates both the predictor and the outcome using the same measurement method. It can inflate, deflate, or obscure true relationships between variables by as much as 0.20–0.40 correlation points, and no single statistical fix reliably eliminates it, so anyone running a self-experiment that relies solely on self-report measures risks drawing badly wrong conclusions.
What they tested
This is a comprehensive narrative review and methodological critique — not a single experiment. The authors synthesised decades of research (primarily from organisational psychology, but applicable across behavioural science) to examine:
**The problem:** Common method bias (CMB) occurs when variance in measured variables is attributable to the measurement method (e.g., self-report survey, same rating scale, same time point) rather than the constructs being measured.
**The causes:** Multiple sources including (a) common rater effects (e.g., mood, social desirability, implicit theories), (b) item characteristic effects (e.g., ambiguous wording, leading questions), (c) item context effects (e.g., question order, priming), and (d) measurement context effects (e.g., same time, same location, same medium).
**The consequences:** CMB can inflate correlations (making false positives look real), deflate correlations (hiding true effects), or produce non-linear distortions that are nearly impossible to detect.
**The remedies:** Procedural remedies (design changes before data collection) vs. statistical remedies (post-hoc corrections). The authors reviewed evidence for each and concluded that no single remedy is sufficient.
The review did not test a specific intervention. Instead, it evaluated the effectiveness of 12+ statistical methods (e.g., Harman's single-factor test, marker variable technique, common latent factor, confirmatory factor analysis with method factors, unmeasured latent method construct, correlated uniqueness model, and more) across hundreds of simulated and real datasets.
Who was studied
No human participants were recruited. This is a theoretical and methodological review drawing on:
**Simulation studies** (e.g., Podsakoff et al., 2003; Richardson et al., 2009; Williams & McGonagle, 2016) that generated artificial datasets with known levels of CMB (e.g., 10%, 20%, 40% method variance) to test how well different statistical remedies recover true effects.
**Empirical examples** from organisational psychology, marketing, and health behaviour research (e.g., studies of job satisfaction–performance correlations, stress–burnout relationships, attitude–behaviour links).
**Meta-analyses** of CMB prevalence (e.g., Cote & Buckley, 1987, found that method variance accounted for ~25% of total variance across 70+ studies; Doty & Glick, 1998, found ~18% in 28 studies).
The review is not limited to a single population but generalises across adult samples in work, education, and consumer settings.
How they measured it
No direct measurement occurred. The authors evaluated CMB using:
**Simulated data:** Generated with known true correlations (e.g., r = 0.30) and known method variance (e.g., 20% of total variance). They then applied statistical remedies and compared recovered correlations to true values.
**Real data reanalyses:** Re-analysed published datasets (e.g., from job satisfaction surveys) where CMB was suspected, using multiple statistical methods to see if conclusions changed.
**Method variance estimates:** Reported as percentage of total variance attributable to method factors (e.g., "method variance accounted for 18–25% of total variance in typical self-report studies").
Key metrics used to evaluate remedies:
**Bias reduction:** How much the remedy reduced the gap between estimated and true correlation (e.g., from r = 0.50 inflated to r = 0.30 true, a 0.20 reduction).
**Error rates:** Proportion of false positives (Type I errors, flagging CMB when none exists) and of missed detections (Type II errors). For example, Harman's single-factor test fails to detect CMB 40–60% of the time, which is a Type II error rate of 0.40–0.60.
**Parameter recovery:** How well the remedy estimated the true factor loadings and structural paths (e.g., "the unmeasured latent method construct approach recovered true loadings within ±0.05 when method variance was ≤20%, but errors exceeded ±0.15 when method variance was ≥40%").
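The metrics above can be written as a few lines of Python. This is a minimal sketch rather than the procedure used in any of the reviewed studies; the function names and the input layout (one boolean per simulated dataset) are assumptions made for illustration.

```python
from typing import Dict, Sequence


def error_rates(flagged: Sequence[bool], cmb_present: Sequence[bool]) -> Dict[str, float]:
    """Error rates of a CMB detection test across simulated datasets.

    flagged[i]     -- True if the test signalled CMB in dataset i
    cmb_present[i] -- True if dataset i was generated with method variance
    """
    if len(flagged) != len(cmb_present):
        raise ValueError("inputs must have the same length")
    # Type I: the test flags CMB in a dataset that has none (false positive).
    false_pos = sum(1 for f, p in zip(flagged, cmb_present) if f and not p)
    # Type II: the test misses CMB that is really there (false negative).
    false_neg = sum(1 for f, p in zip(flagged, cmb_present) if p and not f)
    n_biased = sum(1 for p in cmb_present if p)
    n_clean = len(cmb_present) - n_biased
    return {
        "type_i": false_pos / n_clean if n_clean else float("nan"),
        "type_ii": false_neg / n_biased if n_biased else float("nan"),
    }


def bias_reduction(observed_r: float, corrected_r: float, true_r: float) -> float:
    """How much closer the corrected estimate is to the true correlation."""
    return abs(observed_r - true_r) - abs(corrected_r - true_r)
```

With the example from the list, `bias_reduction(0.50, 0.30, 0.30)` gives a reduction of about 0.20. A negative value would mean the remedy moved the estimate further from the truth.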
Methodology
### Study design
This is a **narrative review with methodological simulation evidence** — not a meta-analysis or systematic review with pre-registered search criteria. The authors synthesised findings from multiple simulation studies, empirical reanalyses, and prior reviews (primarily Podsakoff et al., 2003, and subsequent work).
### Key design features of the simulation studies reviewed
**Data generation:** Simulated datasets with known true correlations (e.g., r = 0.00, 0.20, 0.50) and known method variance (e.g., 0%, 10%, 20%, 40% of total variance). Sample sizes ranged from N = 100 to N = 1,000.
**Manipulated factors:** Number of method factors (1 vs. 2), correlation between method factors (r = 0.00 vs. 0.30), type of method effect (additive vs. multiplicative), and whether method effects were equal across items or varied.
**Statistical remedies tested:** Harman's single-factor test, marker variable technique, common latent factor, confirmatory factor analysis (CFA) with method factors, unmeasured latent method construct (ULMC), correlated uniqueness model, and latent variable interaction models.
**Evaluation criteria:** Bias in parameter estimates, Type I error rates, statistical power, and convergence failures.
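To make the data-generation step concrete, here is a minimal sketch of one such simulation. It assumes the simplest case described above: a single additive method factor that loads equally on both measures and is uncorrelated with the true scores. The function names, default sample size and seed are illustrative choices, not values taken from the reviewed studies.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_observed_r(true_r, method_share, error_share=0.0, n=20_000, seed=0):
    """Observed correlation when both measures share one additive method factor.

    method_share and error_share are proportions of each measure's total
    variance; the remainder is true-score (trait) variance.
    """
    trait_share = 1.0 - method_share - error_share
    if trait_share <= 0.0:
        raise ValueError("method_share + error_share must be below 1")
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        t_x = rng.gauss(0.0, 1.0)
        # Second trait correlated with the first at exactly true_r in expectation.
        t_y = true_r * t_x + math.sqrt(1.0 - true_r ** 2) * rng.gauss(0.0, 1.0)
        method = rng.gauss(0.0, 1.0)  # same draw enters both measures
        xs.append(math.sqrt(trait_share) * t_x + math.sqrt(method_share) * method
                  + math.sqrt(error_share) * rng.gauss(0.0, 1.0))
        ys.append(math.sqrt(trait_share) * t_y + math.sqrt(method_share) * method
                  + math.sqrt(error_share) * rng.gauss(0.0, 1.0))
    return pearson(xs, ys)


def expected_observed_r(true_r, method_share, error_share=0.0):
    """Closed-form value the simulation should approach under this model."""
    return (1.0 - method_share - error_share) * true_r + method_share
```

Under these assumptions, a true r = 0.30 with 20% method variance and no random error gives an expected observed correlation of 0.8 × 0.30 + 0.20 = 0.44. Multiplicative or unequal method effects, which the reviewed simulations also manipulated, would need a different generating model.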
### What this design can and cannot prove
**Can prove:**
That CMB is widespread (meta-analytic evidence shows 18–25% of variance in self-report studies is method variance).
That no single statistical remedy reliably corrects CMB across all conditions (simulation evidence shows all remedies fail under at least some realistic conditions).
That procedural remedies (e.g., temporal separation, different raters) are more effective than statistical remedies (because they prevent CMB from occurring rather than trying to model it post-hoc).
**Cannot prove:**
The exact magnitude of CMB in any specific study (because method variance is confounded with true construct variance in real data).
That any statistical remedy can "fix" CMB in a given dataset (all remedies rely on untestable assumptions about the nature of method effects).
Causal relationships between CMB sources and outcomes (the simulations are artificial and may not capture real-world complexity).
### Major methodological weaknesses
**No systematic search protocol:** The authors did not pre-register search terms, inclusion criteria, or quality assessment. This introduces selection bias — they may have cherry-picked studies that support their conclusions.
**Reliance on simulated data:** Simulation studies assume a specific model of how method effects operate (e.g., additive, equal across items). Real-world CMB may be more complex (e.g., multiplicative, non-linear, interacting with true scores).
**Limited generalisability:** Most simulation studies used Likert-scale items (5–7 points) and assumed normally distributed errors. CMB may behave differently with binary items, open-ended responses, or behavioural measures.
**No quantitative synthesis:** The authors did not meta-analyse effect sizes across studies. Instead, they reported ranges (e.g., "bias ranged from 0.05 to 0.40") without formal pooling.
**Publication bias:** Studies that found CMB to be a major problem may be more likely to be published than those that found it negligible.
Key findings
### Prevalence of CMB
**Meta-analytic evidence:** Cote & Buckley (1987) found that method variance accounted for 25.3% of total variance across 70+ studies. Doty & Glick (1998) found 18.2% across 28 studies. More recent work (Williams & McGonagle, 2016) suggests 15–30% is typical.
**Conditions that increase CMB:** Same rater for predictor and outcome (inflation of r by 0.10–0.30), same measurement occasion (inflation by 0.05–0.15), same item format (inflation by 0.05–0.10), and ambiguous items (inflation by 0.10–0.20).
### Effectiveness of statistical remedies
**Harman's single-factor test:** Fails to detect CMB 40–60% of the time (Type II error rate = 0.40–0.60) and falsely indicates CMB when none exists 20–35% of the time (Type I error rate = 0.20–0.35). **Not recommended.**
**Marker variable technique:** Reduces bias by 0.05–0.15 when a valid marker (a variable theoretically unrelated to all study variables) is available. But finding a valid marker is extremely difficult — most markers share some true variance with study variables, leading to overcorrection (bias in the opposite direction) of 0.05–0.20.
**Common latent factor (CLF):** Reduces bias by 0.10–0.25 when method effects are equal across all items. But when method effects vary (which is typical), CLF can introduce new bias of 0.10–0.30.
**Unmeasured latent method construct (ULMC):** Reduces bias by 0.15–0.30 when method effects are additive and equal. But when method effects are multiplicative or correlated with true scores, ULMC can inflate bias by 0.20–0.40.
**Correlated uniqueness model:** Reduces bias by 0.05–0.15 but requires many correlated error terms (which can lead to model convergence failures in 10–30% of cases).
**Latent variable interaction models:** Can detect CMB in some cases but require very large samples (N > 500) and produce unstable estimates when method variance exceeds 20%.
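For orientation, the two simplest remedies in this list reduce to very little arithmetic, which is part of why they are easy to misuse. The sketch below rests on stated assumptions: Harman's test is approximated by the variance share of the first unrotated principal component, the 50% cutoff is a common heuristic rather than a figure from the review, and the marker adjustment is the partial-correlation formula usually attributed to Lindell and Whitney (2001), which may differ from the variants the review evaluated. All function names are illustrative.

```python
import math


def first_factor_share(corr, iterations=200):
    """Share of total variance on the first unrotated principal component.

    corr is an item correlation matrix given as nested lists. The largest
    eigenvalue is found by power iteration; the trace of a correlation
    matrix equals the number of items, so the share is eigenvalue / k.
    """
    k = len(corr)
    v = [1.0 + 0.01 * i for i in range(k)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    rv = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))  # Rayleigh quotient
    return eigenvalue / k


def harman_flags_cmb(corr, cutoff=0.50):
    """Heuristic flag: True if one component carries more than `cutoff` of the variance."""
    return first_factor_share(corr) > cutoff


def marker_adjusted_r(r_xy, r_marker):
    """Partial out the marker correlation: (r_xy - r_m) / (1 - r_m)."""
    if not -1.0 < r_marker < 1.0:
        raise ValueError("r_marker must lie strictly between -1 and 1")
    return (r_xy - r_marker) / (1.0 - r_marker)
```

For example, `marker_adjusted_r(0.50, 0.20)` is 0.30 / 0.80 = 0.375. If the marker shares true variance with the study variables, `r_marker` is too large and the adjusted value is pushed too low, which is the overcorrection described above.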
### Effectiveness of procedural remedies
**Temporal separation:** Separating predictor and outcome measurement by 1–4 weeks reduces CMB by 0.10–0.25 (based on meta-analytic comparisons of cross-sectional vs. longitudinal designs).
**Different raters:** Using different people to rate predictor and outcome (e.g., self-report for attitudes, supervisor report for performance) reduces CMB by 0.15–0.35.
**Methodological separation:** Using different response formats (e.g., Likert scale for predictor, open-ended for outcome) reduces CMB by 0.05–0.15.
**Anonymity assurance:** Promising anonymity reduces social desirability bias by 0.05–0.10 (but does not eliminate it).
**Item counterbalancing:** Randomising item order reduces context effects by 0.05–0.10.
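Item counterbalancing is the easiest of these remedies to automate. The following is a minimal sketch, assuming a hypothetical `session_id` (for instance the survey date) and an arbitrary base seed, so that each session gets its own order that can still be reproduced later.

```python
import random


def randomised_item_order(items, session_id, base_seed=2023):
    """Return the survey items in a reproducible, session-specific random order.

    session_id and base_seed are illustrative; any stable identifier works.
    """
    rng = random.Random(f"{base_seed}-{session_id}")  # string seeds are allowed
    order = list(items)  # copy so the caller's list is left untouched
    rng.shuffle(order)
    return order
```

Calling `randomised_item_order(["mood", "stress", "sleep quality"], "2024-05-01")` returns the same three items in an order fixed for that session. Randomising order spreads context effects across items; it does not remove same-rater or same-occasion effects.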
### Key nuance: CMB can deflate as well as inflate
**Inflation:** When the same rater provides both predictor and outcome, and both are measured with the same method, correlations can be inflated by 0.10–0.40 (e.g., a true r = 0.20 appears as r = 0.50).
**Deflation:** When method effects are negatively correlated with true scores, or when a shared method factor such as social desirability pushes the two variables in opposite directions, correlations can be deflated by 0.10–0.30 (e.g., a true r = 0.50 appears as r = 0.20).
**Non-linear distortion:** When method effects interact with true scores (e.g., high scorers are more affected by social desirability than low scorers), the correlation can be distorted in complex ways that no statistical remedy can recover.
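A simple additive model shows why the same mechanism can push a correlation in either direction. This is a sketch under stated assumptions, not the authors' own derivation: the symbols λ (trait loading), γ (method loading), M (shared method factor) and ε (random error) are introduced here for illustration, all variables are standardised, and M is assumed uncorrelated with the true scores and the errors.

```latex
\begin{aligned}
X &= \lambda_X T_X + \gamma_X M + \varepsilon_X, \\
Y &= \lambda_Y T_Y + \gamma_Y M + \varepsilon_Y, \\
r_{XY} &= \lambda_X \lambda_Y \, \rho_{T_X T_Y} + \gamma_X \gamma_Y .
\end{aligned}
```

Take λ_X λ_Y = 0.8 and a true correlation ρ = 0.30. If the method loadings share a sign with γ_X γ_Y = 0.2, the observed correlation is 0.24 + 0.20 = 0.44 (inflation). If they have opposite signs, so that γ_X γ_Y = −0.2, it is 0.24 − 0.20 = 0.04 (deflation). The non-linear case, where method effects interact with true scores, falls outside this additive model.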
Effect magnitude
**Typical CMB magnitude:** In a typical self-report study where the same person rates both predictor and outcome at the same time using the same scale format, method variance accounts for 18–25% of total variance. This means that if you observe a correlation of r = 0.40 between two self-report variables, the true correlation could be anywhere from r = 0.00 to r = 0.60, depending on the direction and magnitude of CMB.
**Practical example:** Suppose you run a self-experiment testing whether daily meditation (self-reported minutes) improves mood (self-reported on a 1–10 scale). If you measure both at the same time each day (e.g., evening survey), CMB could inflate the observed correlation by 0.15–0.30. So a "significant" r = 0.40 might actually reflect a true r = 0.10–0.25, or even r = 0.00 under worst-case inflation of 0.40. If CMB is deflationary instead, the true correlation would be higher than the observed 0.40.
**Worst-case scenario:** In studies with high social desirability (e.g., reporting exercise habits and health), CMB can inflate correlations by up to 0.40. In studies with ambiguous items (e.g., "I feel stressed" and "I feel overwhelmed"), CMB can inflate by 0.30–0.50.
**Best-case scenario:** With temporal separation (1–2 weeks) and different response formats, CMB is reduced to 5–10% of total variance, meaning observed correlations are within 0.05–0.10 of true values.
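The arithmetic behind these scenarios is a subtraction of the assumed inflation range from the observed correlation. The helper below is a minimal sketch; the function name is illustrative, and it assumes the bias is purely additive and inflationary, which the sections above show is not always the case.

```python
def plausible_true_r(observed_r, inflation_low, inflation_high):
    """Range of true correlations implied by an assumed additive inflation range."""
    if inflation_low > inflation_high:
        raise ValueError("inflation_low must not exceed inflation_high")
    low = max(-1.0, observed_r - inflation_high)   # strongest assumed inflation
    high = min(1.0, observed_r - inflation_low)    # weakest assumed inflation
    return round(low, 2), round(high, 2)
```

For the meditation example, `plausible_true_r(0.40, 0.15, 0.30)` gives the 0.10 to 0.25 range quoted above. Passing negative inflation values models deflation, in which case the implied true correlation is higher than the observed one.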
Limitations
### What the authors acknowledge
**No single remedy is sufficient:** The authors explicitly state that "no statistical remedy can fully control for CMB" and that "procedural remedies are generally more effective than statistical ones."
**Assumptions of statistical remedies:** All statistical remedies assume that method effects are additive, equal across items, and uncorrelated with true scores — assumptions that are rarely met in practice.
**Difficulty of identifying valid markers:** The marker variable technique requires a variable that is theoretically unrelated to all study variables, which is "extremely difficult to identify" in most research contexts.
**Publication bias:** The authors note that studies finding CMB to be a problem may be overrepresented in the literature.
### What a critical reader would note
**No systematic search:** The review did not follow PRISMA guidelines or pre-register a search protocol. This means the evidence base may be selectively chosen.
**Overreliance on simulation:** Simulation studies are useful for testing statistical properties, but they may not capture real-world complexity (e.g., non-linear method effects, interactions between multiple method sources).
**Limited to organisational psychology:** Most studies reviewed come from organisational behaviour and marketing. CMB may behave differently in other domains (e.g., clinical psychology, neuroscience, where objective measures are more common).
**No quantitative synthesis:** The authors report ranges and examples but do not meta-analyse effect sizes. This makes it hard to know the average magnitude of CMB across conditions.
**Outdated references:** Some key claims rely on studies from the 1980s and 1990s (e.g., Cote & Buckley, 1987; Doty & Glick, 1998). More recent meta-analyses might yield different estimates.
**No discussion of preregistration:** The authors do not discuss how preregistration of analysis plans could reduce CMB (e.g., by preventing p-hacking that exploits method variance).
**No guidance for n=1 experiments:** The review focuses on group-level studies (N > 100). CMB in single-subject designs (e.g., daily diaries, experience sampling) may have different properties (e.g., autocorrelation, person-specific method effects).
Practical takeaways
For someone running their own n=1 experiment:
### What to test
**The intervention:** Any self-report-based intervention where you measure both predictor and outcome via the same method (e.g., daily mood logs, productivity ratings, stress scales). For example: "Does 10 minutes of morning meditation (self-reported duration) improve evening mood (self-reported on 1–10 scale)?"
**The dose:** Test one specific dose (e.g., 10 minutes) for at least 14–21 days to get stable estimates.
### Minimum meaningful duration
**At least