Is working memory training effective? A meta-analytic review.
- Authors: Melby-Lervåg M, Hulme C
- Journal: Dev Psychol
- Year: 2013
- Citations: 1,765
TL;DR
Working memory training programs produce reliable short-term improvements on the specific tasks you practice, but these gains do not transfer to other cognitive skills like IQ, reading, math, or attention control — meaning the evidence does not support using these programs to boost general intelligence or academic performance.
What they tested
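This meta-analysis examined whether computerised working memory training programs improve performance on working memory tasks similar to those trained (near transfer) and on untrained cognitive abilities (far transfer).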
**The intervention:** Various computerised working memory training programs (e.g., Cogmed, Jungle Memory, and other adaptive span tasks) where participants repeatedly practice holding and manipulating information in memory. Training typically involved 20–25 sessions of 30–45 minutes each, over 4–8 weeks. Programs were "adaptive" — difficulty increased as performance improved.
**Comparators:** Studies used either an untreated control group (no training), a treated control group (alternative training like reading comprehension or math games), or a placebo control group (non-adaptive versions of the same training where difficulty did not increase).
**Primary outcomes (near-transfer):** Improvements on working memory tasks similar to those trained — specifically verbal working memory (e.g., digit span backwards, listening span) and visuospatial working memory (e.g., Corsi blocks, visual matrix span).
**Secondary outcomes (far-transfer):** Improvements on untrained cognitive abilities — nonverbal IQ (e.g., Raven's Progressive Matrices), verbal ability (vocabulary, verbal reasoning), inhibitory control (Stroop task, go/no-go tasks), word decoding (reading accuracy), and arithmetic (math computation).
Who was studied
The meta-analysis included **23 studies** with **30 group comparisons**, totalling approximately **1,200–1,500 participants** across all studies (exact total N not reported in abstract, but individual studies ranged from ~20 to ~200 participants). Samples included:
- **Children with ADHD** (clinical samples, typically aged 7–15)
- **Children with other cognitive disorders** (e.g., specific language impairment, learning disabilities)
- **Typically developing children** (aged 4–14)
- **Healthy adults** (aged 18–40)
- **Older adults** (aged 60+ in some studies)
Settings included schools, university labs, and clinical treatment centres across multiple countries (USA, UK, Sweden, Norway, Germany, Australia).
How they measured it
Studies used a variety of standardised and experimental measures:
**Verbal working memory:** Digit span backwards (Wechsler scales), listening span (Daneman & Carpenter), counting span, and operation span tasks. Scores typically reported as number of items correctly recalled (span length) or total correct trials.
**Visuospatial working memory:** Corsi blocks forward and backward, visual matrix span (Alloway's Automated Working Memory Assessment), and spatial span tasks. Scores as span length or total correct.
**Nonverbal IQ:** Raven's Progressive Matrices (standard or coloured), Wechsler Performance IQ subtests (Block Design, Matrix Reasoning). Standard scores (mean = 100, SD = 15).
**Verbal ability:** Vocabulary subtests from Wechsler scales or Peabody Picture Vocabulary Test. Standard scores.
**Inhibitory control:** Stroop task (interference score: reaction time on incongruent minus congruent trials), go/no-go tasks (commission errors), and flanker tasks. Measured in milliseconds or error rates.
**Word decoding:** Woodcock Reading Mastery Tests (Word Identification, Word Attack), Test of Word Reading Efficiency. Standard scores or raw accuracy.
**Arithmetic:** Woodcock-Johnson Calculation subtest, Wechsler Individual Achievement Test (Numerical Operations), or curriculum-based math tests. Standard scores or percent correct.
**Follow-up assessments:** Some studies retested participants 3–6 months after training ended to assess maintenance of effects.
Methodology
**Study design:** Systematic review with meta-analysis of randomised controlled trials (RCTs) and quasi-experimental designs (non-randomised group comparisons). The authors searched PsycINFO, ERIC, Medline, and Web of Science up to 2011, plus manual searches of reference lists.
**Inclusion criteria:**
1. Studies had to include a treatment group receiving working memory training AND either a treated control group (alternative training) or an untreated control group.
2. Studies had to be RCTs or quasi-experiments (non-randomised but with a comparison group).
3. Studies had to report sufficient data to calculate effect sizes (means, SDs, or t/F values).
4. Training had to target working memory specifically (not general cognitive training or video games).
**Exclusion criteria:** Single-case studies, studies without control groups, studies where training was combined with other interventions (e.g., medication + training), and studies not published in English.
**Statistical approach:**
- Effect sizes calculated as Hedges' g (a standardised mean difference corrected for small sample bias).
- Random-effects models used (assumes true effect varies across studies).
- Heterogeneity assessed with I² statistic (percentage of variance due to true differences between studies).
- Moderator analyses examined effects of: age group (children vs. adults), clinical vs. typically developing samples, type of control group (untreated vs. treated), training program (Cogmed vs. other), and duration of training.
- Publication bias assessed with funnel plots and Egger's test.
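To make the statistical approach concrete, here is a minimal Python sketch of how Hedges' g, a random-effects pooled estimate, and I² are commonly computed. It assumes post-test means and SDs for two independent groups and uses the DerSimonian-Laird estimator for between-study variance. The paper summary does not say which estimator or software the authors used, so treat this as an illustration rather than their procedure. The function names and the example numbers are hypothetical.

```python
import math

def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Return (g, variance) for a treatment-vs-control comparison."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d

def random_effects(effects):
    """Pool a list of (g, variance) pairs with DerSimonian-Laird weights."""
    w = [1 / v for _, v in effects]
    sw = sum(w)
    fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sw
    q = sum(wi * (g - fixed) ** 2 for wi, (g, _) in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_re = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * g for wi, (g, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % of variance that is heterogeneity
    return {"g": pooled, "se": se,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "tau2": tau2, "i2": i2}

# Hypothetical studies: (mean_t, sd_t, n_t, mean_c, sd_c, n_c). Not data from the paper.
studies = [(10.0, 2.0, 20, 9.0, 2.0, 20),
           (5.6, 1.2, 30, 5.1, 1.3, 28),
           (7.9, 1.5, 15, 8.0, 1.6, 16)]
effects = [hedges_g(*s) for s in studies]
print(random_effects(effects))
```

With identical study effects the between-study variance estimate is zero and the result reduces to the fixed-effect estimate; heterogeneous effects widen the confidence interval.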
**What this design can and cannot show:**
**Can show:** Whether working memory training produces reliable improvements on trained and untrained tasks, averaged across many studies. The meta-analytic approach increases statistical power and generalisability compared to single studies.
**Cannot show:** Causal mechanisms (why training might or might not work). It cannot rule out that unpublished null studies exist (though the authors tested for publication bias), cannot determine optimal training parameters (dose-response) because protocols varied too much across studies, and cannot assess long-term effects beyond 6 months because few studies included follow-up.
**Major methodological weaknesses identified by the authors:**
- Many studies lacked active control groups (used no-contact controls), making it impossible to separate training effects from placebo effects or Hawthorne effects (improvement due to attention/expectation).
- Few studies blinded participants or assessors to condition.
- Many studies had small sample sizes (N < 30 per group), leading to low statistical power for detecting far-transfer effects.
- Follow-up data were sparse (only ~6 studies had any follow-up beyond post-test).
- Studies used different outcome measures, making direct comparisons difficult.
- Some studies had high attrition rates (dropout >20%).
Key findings
**Near-transfer effects (improvements on working memory tasks):**
**Verbal working memory (immediate post-test):** Significant improvement, Hedges' g = 0.37 (95% CI: 0.24 to 0.50, p < .001). This is a small-to-moderate effect.
**Visuospatial working memory (immediate post-test):** Significant improvement, Hedges' g = 0.78 (95% CI: 0.52 to 1.04, p < .001). This is a large effect.
**Verbal working memory (follow-up, 3–6 months):** Effect was not significant, g = 0.08 (95% CI: -0.11 to 0.27, p = .41). Gains did not persist.
**Visuospatial working memory (follow-up, 3–6 months):** Limited evidence suggested possible maintenance, g = 0.47 (95% CI: 0.06 to 0.88, p = .03), but based on only 4 studies.
**Far-transfer effects (improvements on untrained cognitive abilities):**
**Nonverbal IQ:** No significant effect, g = 0.06 (95% CI: -0.08 to 0.20, p = .39). Based on 12 studies.
**Verbal ability:** No significant effect, g = 0.03 (95% CI: -0.14 to 0.20, p = .73). Based on 8 studies.
**Inhibitory control (Stroop, go/no-go):** No significant effect, g = 0.05 (95% CI: -0.10 to 0.20, p = .51). Based on 9 studies.
**Word decoding (reading):** No significant effect, g = 0.06 (95% CI: -0.11 to 0.23, p = .49). Based on 7 studies.
**Arithmetic:** No significant effect, g = 0.04 (95% CI: -0.14 to 0.22, p = .66). Based on 6 studies.
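A reader can sanity-check the non-significant results above by recovering the standard error from each 95% confidence interval and recomputing the two-sided p-value. The sketch below assumes the intervals are symmetric normal-approximation intervals (estimate ± 1.96 × SE), which is an assumption about how they were built rather than something the summary states. The function name is hypothetical.

```python
import math

def p_from_ci(g, ci_low, ci_high):
    """Approximate SE, z and two-sided p from an estimate and its 95% CI."""
    se = (ci_high - ci_low) / (2 * 1.96)
    z = g / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p under a normal approximation
    return se, z, p

# (g, CI lower, CI upper) as reported in the findings above.
reported = {
    "Verbal WM, follow-up": (0.08, -0.11, 0.27),
    "Nonverbal IQ": (0.06, -0.08, 0.20),
    "Verbal ability": (0.03, -0.14, 0.20),
    "Inhibitory control": (0.05, -0.10, 0.20),
    "Word decoding": (0.06, -0.11, 0.23),
    "Arithmetic": (0.04, -0.14, 0.22),
}
for name, (g, lo, hi) in reported.items():
    se, z, p = p_from_ci(g, lo, hi)
    print(f"{name}: SE={se:.3f}, z={z:.2f}, p={p:.2f}")
```

Small discrepancies from the reported p-values are expected, because the published estimates and interval bounds are rounded to two decimals.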
**Moderator analyses:**
- No significant differences between children and adults for any outcome.
- No significant differences between clinical and typically developing samples.
- No significant differences between Cogmed and other training programs.
- No significant effect of training duration (number of sessions) on outcomes.
- Studies with active control groups showed smaller effects than those with untreated controls (suggesting some placebo effects).
**Publication bias:** Funnel plots showed some asymmetry for near-transfer outcomes, suggesting possible publication bias (missing null studies). Egger's test was significant for visuospatial working memory (p = .04), meaning the true effect may be smaller than reported.
Effect magnitude
**Verbal working memory improvement:** A g of 0.37 means the average trained person scored about one-third of a standard deviation higher than the average untrained person. In practical terms, if the average person could remember 5 digits backwards before training, after training they might remember 5.5–6 digits, a modest gain of at most about 1 extra item.
**Visuospatial working memory improvement:** A g of 0.78 means the average trained person scored about three-quarters of a standard deviation higher. This is a larger effect — equivalent to going from remembering 4 Corsi block sequences to remembering 5–6 sequences. However, this effect was based on fewer studies and may be inflated by publication bias.
**Far-transfer (IQ, reading, math):** Effects were essentially zero (g = 0.03–0.06). To put this in perspective: if working memory training truly improved IQ, you would expect an effect of at least g = 0.20–0.30 (the typical effect of a semester of education). The observed effects are statistically indistinguishable from zero, so this meta-analysis provides no evidence that training improves general cognitive ability.
**Comparison to other interventions:** The near-transfer effects are similar in size to the effect of practicing a specific memory task for a few hours (practice effects). The far-transfer effects are smaller than the effect of a single dose of stimulant medication on attention in ADHD (g ≈ 0.5–0.8) and much smaller than the effect of tutoring on reading skills (g ≈ 0.5–1.0).
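The effect sizes above can be translated into more intuitive terms with a little arithmetic. The sketch below converts g into the percentile of the untrained distribution that the average trained person would reach (Cohen's U3), and into raw score units for a given standard deviation. It assumes normally distributed scores with equal SDs in both groups. The SD value in the example is a placeholder, not a figure from the paper, and the function names are hypothetical.

```python
import math

def control_percentile(g):
    """Percentile of the control distribution reached by the average
    trained person (Cohen's U3), assuming normal scores with equal SDs."""
    return 50 * (1 + math.erf(g / math.sqrt(2)))

def raw_gain(g, sd):
    """Approximate gain in raw score units for a test with the given SD."""
    return g * sd

for label, g in [("Verbal WM", 0.37), ("Visuospatial WM", 0.78), ("Nonverbal IQ", 0.06)]:
    print(f"{label}: g={g:.2f} -> percentile {control_percentile(g):.1f}")

# Placeholder SD of 1.0 items for a span task; substitute the SD of your own test.
print(raw_gain(0.37, sd=1.0))
```

A g of zero maps to the 50th percentile, meaning the average trained person sits exactly at the untrained average, which is roughly where the far-transfer outcomes land.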
Limitations
**Acknowledged by authors:**
1. **Variability in samples:** Studies included children with ADHD, learning disabilities, typical development, and healthy adults — these groups may respond differently to training, but there were too few studies to analyse subgroups separately.
2. **Variety of training programs:** Different programs (Cogmed, Jungle Memory, custom tasks) may have different efficacy, but moderator analyses found no differences.
3. **Limited follow-up data:** Only ~25% of studies included any follow-up assessment, making it impossible to draw firm conclusions about long-term effects.
4. **Publication bias:** Evidence of asymmetry in funnel plots suggests some null studies may be missing from the literature.
5. **Quality of included studies:** Many studies had methodological weaknesses (no randomisation, no blinding, small samples, high attrition).
**Critical reader observations:**
6. **No active control in many studies:** Over half the studies used untreated control groups. This means the near-transfer effects could partly reflect placebo effects (expectation of improvement), Hawthorne effects (attention from researchers), or simply test-retest practice effects.
7. **Near-transfer tasks are often very similar to training tasks:** For example, training on a visuospatial span task and then testing on a different visuospatial span task. This is essentially measuring whether you get better at the type of thing you practiced — not surprising and not evidence of general cognitive enhancement.
8. **Industry funding:** Several studies were funded by companies that sell working memory training programs (e.g., Cogmed was developed by a company later acquired by Pearson). Industry-funded studies tend to show larger effects.
9. **No dose-response relationship:** If training truly improved working memory capacity, you would expect more training to produce larger gains. The fact that duration did not predict effect size undermines the causal claim.
10. **Age range too broad:** Combining 4-year-olds and 40-year-olds may obscure important developmental differences in trainability.
Practical takeaways
For someone running their own n=1 experiment:
### What to test (specific intervention and dose)
**Intervention:** A computerised adaptive working memory training program (e.g., Cogmed, or a free alternative like the n-back task from Brain Workshop or the dual n-back from Cambridge Brain Sciences). The key feature is that difficulty adjusts to your performance — you always work at ~80% accuracy.
**Dose:** The typical protocol is 20–25 sessions, each lasting 30–45 minutes, completed 4–5 times per week. Total training time: roughly 10–19 hours over 4–6 weeks.
**Alternative dose:** A shorter protocol of 10 sessions (2 weeks) may be sufficient to detect near-transfer effects, but far-transfer effects are unlikely at any dose.
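The total training time follows directly from the session counts and lengths quoted for the typical protocol. A quick sketch of that arithmetic, with a hypothetical helper name:

```python
def training_hours(sessions, minutes_per_session):
    """Total training time in hours."""
    return sessions * minutes_per_session / 60

low = training_hours(20, 30)     # fewest, shortest sessions
high = training_hours(25, 45)    # most, longest sessions
weeks_fast = 20 / 5              # 20 sessions at 5 per week
weeks_slow = 25 / 4              # 25 sessions at 4 per week

print(f"{low:.2f} to {high:.2f} hours over {weeks_fast:.2f} to {weeks_slow:.2f} weeks")
```

The same helper gives 5 to 7.5 hours for the shorter 10-session protocol at 30–45 minutes per session.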
### Minimum meaningful duration
**For near-transfer (improvement on memory tasks):** 2 weeks (10 sessions) is the minimum to see reliable improvement on trained tasks.
**For far-transfer (IQ, reading, math):** Based on this meta-analysis, no duration has been shown to produce far-transfer. If you want to test it, a minimum of 4 weeks (20 sessions) is reasonable, but be prepared for null results.
**Follow-up:** If you see improvements, retest 1 month and 3 months after stopping to see if gains persist.
### What to measure (specific metrics)
**Primary outcome (near-transfer):** A working memory task you did NOT train on. For verbal training, test visuospatial working memory (e.g., Corsi blocks online). For visuospatial training, test verbal working memory (e.g., digit span backwards). Use a free online test (e.g., from Cambridge Brain Sciences, PEBL, or Psytoolkit).
**Secondary outcomes (far-transfer):**
- Fluid intelligence: Raven's Progressive Matrices (free online version available)
- Processing speed: Simple reaction time or digit-symbol coding
- Attention: Stroop task or flanker task (free online)
- Academic skills: If relevant, a brief reading comprehension or math fluency test
**Control measure:** A task that should NOT change (e.g., vocabulary knowledge) to rule out general practice effects or motivation changes.
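One simple way to organise these measurements is to record every assessment as a row and compute the percent change per measure, then compare the primary outcome against the control measure. The sketch below is one possible layout, not a prescribed method. The measure names and scores are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Assessment:
    measure: str   # e.g. "digit_span_backwards"
    phase: str     # "pre" or "post"
    score: float

def percent_change(records, measure):
    """Percent change from mean pre score to mean post score for one measure."""
    pre = mean(r.score for r in records if r.measure == measure and r.phase == "pre")
    post = mean(r.score for r in records if r.measure == measure and r.phase == "post")
    return 100 * (post - pre) / pre

# Hypothetical scores, for illustration only.
records = [
    Assessment("digit_span_backwards", "pre", 5),
    Assessment("digit_span_backwards", "post", 6),
    Assessment("vocabulary", "pre", 40),
    Assessment("vocabulary", "post", 40),
]

primary = percent_change(records, "digit_span_backwards")
control = percent_change(records, "vocabulary")
print(f"primary: {primary:+.1f}%, control: {control:+.1f}%")
```

If the control measure moves as much as the primary outcome, the change is more plausibly due to practice or motivation than to training.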
### Key confounds to control for
**Practice effects:** Repeated testing on the same measure will improve scores regardless of training. Use parallel forms of tests (different versions) or test only at pre and post (not weekly).
**Placebo effect:** If you believe training will help, you may try harder on post-tests. Use a placebo control condition (e.g., 4 weeks of a non-adaptive memory game or a cognitive training program that targets a different skill, like visual search).
**Motivation and fatigue:** Training is boring. Track your motivation daily (1–10 scale) and note if you skip sessions. Low motivation may reduce compliance and effects.
**Sleep and stress:** Both affect working memory. Track sleep quality (hours, subjective quality) and stress (daily 1–10 scale). If these change during the experiment, they could confound results.
**Caffeine and stimulants:** These improve working memory acutely. Keep caffeine intake consistent across pre- and post-testing sessions.
**Time of day:** Working memory peaks at different times for different people. Test at the same time of day (±1 hour) for all assessments.
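The confounds above can be checked mechanically by logging the conditions of each testing session and flagging any that differ between pre-test and post-test. In this sketch only the ±1 hour time-of-day tolerance comes from the text; the caffeine, sleep, and stress tolerances are arbitrary assumptions to adjust to taste, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TestSession:
    label: str
    hour_of_day: float    # 24-hour clock, e.g. 9.5 for 09:30
    caffeine_mg: float
    sleep_hours: float
    stress: int           # self-rated, 1-10

def confound_flags(pre, post, max_hour_gap=1.0, max_caffeine_gap=50.0,
                   max_sleep_gap=1.0, max_stress_gap=2):
    """List the conditions that differ too much between two sessions.
    Does not handle sessions that straddle midnight."""
    flags = []
    if abs(pre.hour_of_day - post.hour_of_day) > max_hour_gap:
        flags.append("time of day")
    if abs(pre.caffeine_mg - post.caffeine_mg) > max_caffeine_gap:
        flags.append("caffeine")
    if abs(pre.sleep_hours - post.sleep_hours) > max_sleep_gap:
        flags.append("sleep")
    if abs(pre.stress - post.stress) > max_stress_gap:
        flags.append("stress")
    return flags

# Hypothetical sessions, for illustration only.
pre = TestSession("pre", hour_of_day=9.0, caffeine_mg=100, sleep_hours=7.5, stress=3)
post = TestSession("post", hour_of_day=14.0, caffeine_mg=100, sleep_hours=7.0, stress=4)
print(confound_flags(pre, post))
```

An empty list means the two sessions were comparable on the logged conditions; any flag is a reason to repeat the assessment or interpret the change cautiously.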
### What a positive result would look like
**Near-transfer:** A 10–20% improvement on an untrained working memory task (e.g., digit span backwards increases from 5 to 6 items). This is a realistic and commonly observed effect.
**Far-transfer:** A 5–10% improvement on a fluid intelligence test (e