
Prevalence of depression during the COVID-19 outbreak: A meta-analysis of community-based studies

Authors: Juan Bueno‐Notivol, Patricia Gracia‐García, Beatriz Olaya, Isabel Lasheras, Raúl López‐Antón, Javier Santabárbara
Journal: International Journal of Clinical and Health Psychology
Year: 2020
DOI: 10.1016/j.ijchp.2020.07.007
Citations: 848

TL;DR

During the first months of the COVID-19 pandemic, roughly 1 in 4 people in community samples screened positive for depression, about 7 times the global baseline estimate of 3.44%. These figures describe populations, not individuals, but they suggest that if you track your own mood during a major stressor, it is worth watching for a rise in scores and planning proactive mental health routines.

What they tested

This was not an intervention study. The researchers tested the question: *What proportion of the general population (people living in communities, not in hospitals or clinics) reported depressive symptoms during the early phase of the COVID-19 pandemic (January to May 2020)?*

**Intervention:** None. This is an observational meta-analysis of cross-sectional surveys.

**Comparator:** The authors compared the pooled prevalence they found to a global baseline estimate of depression prevalence from 2017 (3.44%), calculated by the World Health Organization.

**Outcome measures:** The primary outcome was the *pooled prevalence of depression* — the percentage of people in the combined samples who scored above a cut-off on a validated depression screening tool. Secondary outcomes included exploring whether prevalence varied by country, continent, or measurement tool.

Who was studied

The meta-analysis included **12 community-based studies** conducted between January 1, 2020 and May 8, 2020. The total combined sample was not explicitly stated in the abstract, but based on the individual study sizes reported in the full paper, the pooled sample was approximately **50,000 to 60,000 participants** across multiple countries.

**Population:** General community samples — not clinical populations, not healthcare workers, not hospitalised patients. Participants were adults living in their homes during lockdown or social distancing measures.

**Countries represented:** Studies came from China (multiple provinces), Spain, Italy, Iran, Turkey, the United States, and Denmark. This gives geographic diversity but is heavily weighted toward China (which had the earliest and most severe lockdowns at that time).

**Setting:** Online surveys and telephone interviews conducted during the pandemic. All studies were cross-sectional — meaning they measured depression at a single point in time.

**Demographics:** The individual studies included both men and women, with age ranges typically 18 to 80+. Some studies oversampled younger adults (students) or women, which matters because depression prevalence is typically higher in women and younger adults.

How they measured it

Depression was measured using **validated self-report screening instruments**, not clinical interviews. Different studies used different tools, which is a key source of variability:

**Patient Health Questionnaire-9 (PHQ-9):** A 9-item scale (0–27, higher = more severe). Cut-off scores varied: some studies used ≥10 (moderate depression), others used ≥5 (mild depression). This is the most common tool used in the included studies.

**Center for Epidemiologic Studies Depression Scale (CES-D):** A 20-item scale (0–60, higher = more severe). Cut-off typically ≥16 for possible depression.

**Hospital Anxiety and Depression Scale (HADS-D):** A 7-item depression subscale (0–21, higher = more severe). Cut-off typically ≥8.

**General Health Questionnaire (GHQ-12):** A 12-item general mental health screener, with some studies using a depression-specific subscale.

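**Self-Rating Depression Scale (SDS):** A 20-item scale (raw score 20–80, higher = more severe). Raw scores are usually converted to a standard index (raw score × 1.25, range 25–100); the cut-off is typically an index of ≥50, and Chinese studies often use ≥53.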

**Why this matters for self-experimenters:** The PHQ-9 is the most practical tool for personal tracking — it’s free, takes 2 minutes, and has well-established cut-offs. If you want to replicate this kind of measurement in your own life, use the PHQ-9 weekly.
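A minimal sketch of PHQ-9 scoring for personal tracking, assuming responses are entered as nine integers from 0 ("not at all") to 3 ("nearly every day"). The severity bands are the standard published PHQ-9 bands; the function names `score_phq9` and `screens_positive` are hypothetical helpers, not part of any library.

```python
# Standard PHQ-9 severity bands for the 0-27 total score.
SEVERITY_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def score_phq9(items: list[int]) -> tuple[int, str]:
    """Return (total score, severity label) for nine PHQ-9 item responses."""
    if len(items) != 9:
        raise ValueError("PHQ-9 has exactly 9 items")
    if any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("each item must be scored 0, 1, 2 or 3")
    total = sum(items)
    for low, high, label in SEVERITY_BANDS:
        if low <= total <= high:
            return total, label
    raise AssertionError("unreachable: total is always 0-27")


def screens_positive(total: int, cutoff: int = 10) -> bool:
    """True if the total meets a screening cut-off (>=10 here; some studies used >=5)."""
    return total >= cutoff


# Example week (hypothetical answers). Item 9 asks about thoughts of self-harm,
# so any non-zero answer there is normally followed up with a clinician.
total, label = score_phq9([1, 1, 2, 1, 0, 1, 1, 0, 0])
print(total, label, screens_positive(total))  # 7 mild False
```

Keeping the cut-off as a parameter mirrors the point above that the included studies did not all use the same threshold.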

Methodology

### Study design

This is a **systematic review and meta-analysis** of cross-sectional, community-based studies. The authors searched PubMed and Web of Science from January 1, 2020 to May 8, 2020. They included only studies that:

1. Were published in English or Spanish

2. Reported prevalence of depression (or enough data to calculate it)

3. Used validated measurement tools

4. Sampled from the general community (not clinical or occupational groups)

### Statistical approach

They used a **random-effects model** to pool prevalence estimates. This is the appropriate choice when studies are expected to differ genuinely in their results (due to different populations, lockdown severities, and measurement tools). The model gives a weighted average whose weights reflect both within-study (sampling) variance and between-study variance.
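The random-effects pooling described above can be sketched as follows. This is a minimal DerSimonian-Laird implementation on logit-transformed proportions; both the estimator and the transformation are assumptions (the summary does not say which variant the authors used), and the three studies are hypothetical numbers, not data from the paper.

```python
import math


def pool_prevalence_dl(events, totals):
    """Random-effects pooled prevalence (DerSimonian-Laird) on the logit scale.

    events[i]: respondents above the cut-off in study i; totals[i]: study sample size.
    Requires 0 < events[i] < totals[i] for every study.
    Returns (pooled, ci_low, ci_high, tau2, q) with proportions on the 0-1 scale.
    """
    # Logit-transform each study's prevalence and approximate its sampling variance.
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1 / vi for vi in v]
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to every study's sampling variance.
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return (inv_logit(y_re), inv_logit(y_re - 1.96 * se),
            inv_logit(y_re + 1.96 * se), tau2, q)


# Three hypothetical studies (7.5%, 25% and 48% screening positive).
pooled, lo, hi, tau2, q = pool_prevalence_dl([75, 300, 480], [1000, 1200, 1000])
print(f"pooled {pooled:.1%} (95% CI {lo:.1%} to {hi:.1%}), tau^2 = {tau2:.3f}, Q = {q:.1f}")
```

When studies disagree strongly, tau² becomes large and the random-effects weights become nearly equal, which is why a large study cannot dominate the pooled figure the way it would under a fixed-effect model.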

They assessed heterogeneity using the **I² statistic** — a measure of how much of the variation between studies is due to real differences rather than random chance. An I² of 99.6% is extremely high, meaning the studies are very different from each other.
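For reference, the standard definition of I² (Higgins and Thompson) in terms of Cochran's Q is shown below. The notation is ours, not the paper's: $k$ studies, $y_i$ the (possibly transformed) prevalence of study $i$, $w_i = 1/v_i$ its inverse-variance weight, and $\hat{\theta}_{\text{FE}}$ the fixed-effect weighted mean.

$$
Q = \sum_{i=1}^{k} w_i \left(y_i - \hat{\theta}_{\text{FE}}\right)^2, \qquad
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
$$

As a worked check, with $k = 12$ studies ($k - 1 = 11$), an I² of 99.6% implies $Q \approx 11 / (1 - 0.996) = 2750$, roughly 250 times the value expected if every study shared the same true prevalence.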

### What this design can and cannot prove

**What it can prove:**

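- An estimate of the *average* prevalence of depressive symptoms across multiple community samples during the early pandemic (with wide uncertainty, given the heterogeneity discussed below).

- That this screening prevalence was substantially higher than the cited pre-pandemic global estimate, as a descriptive comparison rather than a controlled one.

- That there is enormous variability between populations (from 7.45% to 48.30%).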

**What it cannot prove:**

**Causation:** This design cannot prove that the pandemic *caused* the increase in depression. There is no control group of people who did not experience the pandemic. The comparison to the 2017 global estimate is suggestive but not definitive — different measurement methods, different populations, and different years could account for some of the difference.

**Longitudinal change:** Cross-sectional studies measure one point in time. They cannot show how depression changed *within individuals* over the course of the pandemic.

**Clinical diagnosis:** Self-report screening tools identify people who *may* have depression, not confirmed clinical diagnoses. The actual rate of clinical depression is likely lower than the screening prevalence.

**Generalisability:** The studies are heavily weighted toward China and Europe. Results may not apply to low-income countries, rural areas, or populations with different cultural attitudes toward mental health.

### Major methodological weaknesses

**High heterogeneity (I² = 99.6%):** The studies are so different that the pooled estimate of 25% may not be meaningful for any specific population. The range (7.45% to 48.30%) is more informative.

**Publication bias:** Studies finding high prevalence may have been more likely to be published quickly during an emergency. Studies finding low prevalence (e.g., little sign of distress) may not have been written up or submitted.

**Measurement inconsistency:** Different cut-offs on different scales produce very different prevalence rates. A PHQ-9 cut-off of ≥5 (mild depression) will capture many more people than a cut-off of ≥10 (moderate depression).

**Sampling bias:** Most studies used convenience sampling (online surveys shared via social media or email). People who are more distressed may be more likely to respond to a mental health survey, inflating prevalence.

**Timing:** Studies were conducted at different points in the pandemic (January to May 2020). Lockdown severity, infection rates, and economic disruption varied enormously by location and time.
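To make the measurement-inconsistency point concrete, here is a small sketch with 20 hypothetical PHQ-9 totals (invented for illustration, not drawn from any included study). The same respondents yield very different prevalence figures depending on whether the threshold is ≥5 or ≥10.

```python
# Hypothetical PHQ-9 totals for 20 survey respondents (illustrative only).
scores = [2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 13, 15, 17, 21]


def prevalence(scores, cutoff):
    """Share of respondents scoring at or above the cut-off."""
    return sum(s >= cutoff for s in scores) / len(scores)


for cutoff in (5, 10):
    print(f"PHQ-9 >= {cutoff}: {prevalence(scores, cutoff):.0%}")
# PHQ-9 >= 5: 85%
# PHQ-9 >= 10: 35%
```

Nothing about the respondents changes between the two lines of output; only the threshold does, which is why pooling studies with different cut-offs inflates heterogeneity.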

Key findings

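**Pooled prevalence of depression:** 25% (95% CI: 18% to 33%). Loosely, this means that in a typical included community sample, about 25 in every 100 respondents scored above the depression cut-off. Because this is a random-effects estimate, it is an average across studies rather than a head count from one combined sample.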

**Range across studies:** 7.45% to 48.30%. The lowest prevalence was found in a Danish study (7.45%), the highest in a study of Iranian university students (48.30%).

**Comparison to global baseline:** The authors cite a 2017 global depression prevalence estimate of 3.44% (from the World Health Organization). The pooled prevalence of 25% is approximately **7 times higher**.

**Heterogeneity:** I² = 99.6% (p < 0.001). This is extremely high, meaning the variation between studies is almost entirely due to real differences, not random chance.

**Subgroup analyses (from the full paper):**

- Prevalence was higher in studies using the PHQ-9 (approximately 27%) compared to the CES-D (approximately 22%).

- Prevalence was higher in studies from China (approximately 28%) compared to European studies (approximately 18%).

- Prevalence was higher in studies conducted during stricter lockdown periods.

- Prevalence was higher in samples with more women and younger participants.

**Note on statistical significance:** The authors do not report a formal statistical test comparing the 25% prevalence to the 3.44% baseline; the comparison is descriptive. The 95% CI (18% to 33%) lies far above 3.44%, so sampling error alone is an unlikely explanation for the gap. The larger uncertainty is comparability: the baseline comes from different populations, methods, and years, and no significance test can correct for that.

Effect magnitude

**Absolute increase:** The difference between the pandemic prevalence (25%) and the pre-pandemic baseline (3.44%) is 21.56 percentage points. This means that for every 100 people, roughly 22 more would screen positive for depression during the early pandemic compared to the pre-pandemic period.

**Relative increase:** The prevalence is about 7 times the baseline (25 / 3.44 ≈ 7.3). This is a dramatic relative increase, but because the baseline (3.44%) is low, even a 7-fold rise leaves 75% of people *not* screening positive for depression.

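**Practical translation:** Under normal conditions, roughly 3–4% of the global population is estimated to have depression at a given time; in these early-pandemic community samples, about 1 in 4 people scored above a screening cut-off. These are population figures, not an individual's weekly probability, and the paper reports prevalence rather than average score changes, so it cannot say how far any one person's scores will rise. For a self-experimenter, the practical reading is that a major stressor may push mood scores up, and any rise should be judged against your own baseline week-to-week variability.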
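The arithmetic behind the absolute and relative figures in this section can be reproduced in a few lines. This sketch treats the 3.44% baseline as an exact number (an assumption; it has its own uncertainty), so dividing the CI bounds by it gives only a rough range for the ratio.

```python
# Figures quoted in the summary (all in percent).
baseline = 3.44               # cited 2017 global prevalence estimate
pooled = 25.0                 # pooled early-pandemic screening prevalence
ci_low, ci_high = 18.0, 33.0  # 95% CI of the pooled estimate

absolute_pp = pooled - baseline   # percentage-point difference
ratio = pooled / baseline         # how many times the baseline

print(f"absolute difference: {absolute_pp:.2f} percentage points")
print(f"ratio: {ratio:.1f}x (CI bounds give {ci_low / baseline:.1f}x to {ci_high / baseline:.1f}x)")
print(f"below the cut-off: {100 - pooled:.0f}%")
# absolute difference: 21.56 percentage points
# ratio: 7.3x (CI bounds give 5.2x to 9.6x)
# below the cut-off: 75%
```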

Limitations

### What the authors acknowledge

- High heterogeneity between studies limits the precision of the pooled estimate.

- Most studies used convenience sampling, which may not be representative.

- Self-report screening tools overestimate clinical depression.

- The search was limited to two databases (PubMed and Web of Science) and two languages (English and Spanish), potentially missing relevant studies.

- The rapid publication timeline meant that many studies had not undergone rigorous peer review.

### Additional limitations a critical reader would note

**No quality assessment reported in the abstract:** The abstract does not state whether study quality was assessed (e.g., using the Newcastle-Ottawa Scale for cross-sectional studies). Low-quality studies could bias the pooled estimate.

**No adjustment for measurement tool:** Different screening tools have different sensitivities and specificities. The pooled estimate does not account for this.

**Non-comparable baseline:** The 2017 baseline comes from a different population, measured with different methods. The comparison is suggestive but not rigorous.

**Cultural confounding:** Depression is expressed and reported differently across cultures. Some populations may be more willing to endorse depressive symptoms on a survey.

**Timing confound:** Studies conducted in January 2020 (when COVID-19 was largely unknown outside China) are very different from studies conducted in May 2020 (when lockdowns were widespread). Pooling them together may obscure important temporal trends.

**No data on pre-pandemic mental health:** None of the studies measured depression in the same individuals before the pandemic. We cannot know whether the same people who screened positive during the pandemic were already depressed beforehand.

Practical takeaways

For someone running their own n=1 experiment to track mood during a major life stressor (pandemic, job loss, relationship breakdown, etc.):

### What to test

**The intervention:** Your own proactive mental health routine. Options include:

- Daily 20-minute outdoor walks (tested in multiple pandemic studies)

- 10-minute morning mindfulness meditation (evidence from meta-analyses shows ~0.3–0.5 SD reduction in depressive symptoms)

- Structured social connection (e.g., one 30-minute video call per day with a friend or family member)

- Sleep hygiene protocol (consistent bedtime, no screens 1 hour before sleep)

**The comparator:** Your own baseline mood before the stressor, or a "no-intervention" period during the stressor.

### Minimum meaningful duration

**At least 4 weeks** to see a reliable change in mood scores. The PHQ-9 asks about symptoms over the past 2 weeks, so scores taken less than 2 weeks apart partly cover the same days. A 4-week window gives you at least two non-overlapping measurements (one pre-intervention, one post-intervention).

**8 weeks is better** — this allows you to see whether any improvement is sustained or fades over time.

**Track daily** if possible, but analyse weekly averages to smooth out day-to-day noise.
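A minimal sketch of weekly averaging, assuming daily values are stored in a Python dict keyed by date. The 0–10 daily mood rating in the example is a hypothetical metric, not part of the PHQ-9 (which covers 2-week windows and is not designed for daily use), and `weekly_means` is an illustrative helper name.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def weekly_means(daily):
    """Average daily values by ISO calendar week.

    daily: dict mapping datetime.date -> numeric value (e.g. a 0-10 mood rating).
    Returns a dict mapping (ISO year, ISO week) -> mean of that week's values.
    """
    weeks = defaultdict(list)
    for day, value in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)].append(value)
    return {week: mean(values) for week, values in sorted(weeks.items())}


# Hypothetical two weeks of daily mood ratings starting Monday 1 January 2024.
start = date(2024, 1, 1)
ratings = [5, 6, 4, 5, 7, 6, 5, 6, 7, 7, 8, 6, 7, 8]
daily = {start + timedelta(days=i): r for i, r in enumerate(ratings)}

for week, avg in weekly_means(daily).items():
    print(week, round(avg, 2))
```

Grouping by ISO week keeps weeks aligned Monday to Sunday even if you miss a day, since each week's mean uses whatever days were recorded.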

### What to measure

**Primary metric:** PHQ-9 score (0–27). Take it weekly at the same time of day (e.g., Sunday evening). A change of 5 points or more is considered clinically meaningful.

**Secondary metrics:**

- Sleep quality (e.g., sleep onset latency in minutes, total sleep time in hours)

- Physical activity (steps per day or minutes of moderate exercise)

- Social contact (number of meaningful conversations per day)

- Work/productivity (hours of focused work per day)

**Context variables to track daily:**

- News exposure (minutes per day of COVID-19 or stressor-related news)

- Alcohol/caffeine intake

- Time spent outdoors

- Screen time (especially social media)

### Key confounds to control for

**Seasonal affective disorder:** If your experiment runs during winter, mood may decline due to reduced sunlight regardless of the stressor. Control by running the experiment in the same season as your baseline, or by tracking daylight exposure.

**Sleep disruption:** Poor sleep causes depressive symptoms. Track sleep quality separately and consider whether changes in mood are driven by sleep changes.

**Exercise:** Physical activity is a powerful antidepressant. If you increase exercise during your intervention, you won't know whether the mood improvement came from the exercise or the intervention itself. Keep exercise constant, or track it as a separate variable.

**Social support:** People who maintain strong social connections during stress have better mental health outcomes. Track your social contact separately.

**News consumption:** Some pandemic-era studies reported that people who consumed more COVID-19 news had higher depression scores. If you change your news habits during your experiment, this could confound your results.

### What a positive result would look like

**PHQ-9 score drops by 5+ points** (e.g., from 14 to 9, moving from the moderate range, 10–14, to the mild range, 5–9).

**Score drops below the common screening cut-off** (PHQ-9 < 10) if you started above it.

**The improvement is sustained for at least 2 consecutive weeks** (not just a single good day).

**The improvement is larger than your typical week-to-week variability.** If your PHQ-9 normally bounces between 8 and 12, a drop to 5 is meaningful. If it bounces between 2 and 8, a drop to 5 might just be normal fluctuation.

**Secondary metrics move in the same direction:** better sleep, more exercise, more social contact, less news consumption.
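The "larger than your typical week-to-week variability" check above can be sketched as follows. The 2-standard-deviation threshold is an assumed rule of thumb, not something from the paper; the 5-point minimum comes from the criteria listed above, and `judge_new_score` is a hypothetical helper name.

```python
from statistics import mean, stdev


def judge_new_score(baseline_weeks, new_score, k=2.0, min_drop=5):
    """Compare a new weekly PHQ-9 total against baseline weeks.

    Lower PHQ-9 = fewer symptoms, so improvement means a lower score.
    Returns (beyond_noise, big_enough):
      beyond_noise: new score is more than k baseline SDs below the baseline mean
                    (k = 2 is an assumed rule of thumb)
      big_enough:   drop from the baseline mean is at least min_drop points
    Needs at least two baseline weeks so an SD can be estimated.
    """
    m = mean(baseline_weeks)
    s = stdev(baseline_weeks)
    beyond_noise = new_score < m - k * s
    big_enough = (m - new_score) >= min_drop
    return beyond_noise, big_enough


# The two scenarios from the text: a drop to 5 after baselines of 8-12 vs 2-8.
print(judge_new_score([8, 10, 12, 9, 11], 5))  # (True, True)
print(judge_new_score([2, 8, 4, 6, 5], 5))     # (False, False)
```

Returning the two checks separately, rather than one verdict, keeps the statistical criterion and the clinical criterion visible side by side.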

### Specific recommendation for a self-experiment based on this paper

Given that the pandemic studies found depression screening prevalence about 7 times the 2017 global estimate, and that the highest rates were in younger adults, women, and people under strict lockdown, here is a concrete protocol:

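1. **Baseline (1 week, longer if possible):** Track sleep, exercise, social contact, and news consumption daily, and take the PHQ-9 at the end of the week (it covers the past 2 weeks, so daily administration does not fit the questionnaire). Do not change any habits. One week yields a single PHQ-9 score; 2 to 4 baseline weeks give a better picture of your normal week-to-week variability.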

2. **Intervention (4 weeks):** Implement one of the following:

- **Option A:** 20-minute outdoor walk every day, at the same time (e.g., 10:00 AM).

- **Option B:** 10-minute mindfulness meditation using a free app (e.g., Headspace or Insight Timer), immediately after waking.

- **Option C:** One 30-minute video call with a friend or family member every evening.

3. **Washout (1 week):** Return to baseline habits.

4. **Repeat** with a different intervention if desired.

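5. **Analyse:** Compare your average weekly PHQ-9 score during the intervention period to your baseline average. Use a simple t-test (bearing in mind that repeated scores from one person are not independent, so p-values will be optimistic) or look at the overlap in scores: because lower PHQ-9 scores mean fewer symptoms, if the highest intervention week is lower than the lowest baseline week, that's a strong signal.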
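A minimal sketch of the analysis step, assuming weekly PHQ-9 totals are kept in two lists, with at least two weeks per phase and some variation in at least one phase. It computes the mean change, the non-overlap check, and Welch's t statistic using only the standard library. If SciPy is available, `scipy.stats.ttest_ind(intervention, baseline, equal_var=False)` returns the same t statistic plus a p-value. The helper name and the example scores are hypothetical.

```python
import math
from statistics import mean, variance


def compare_phases(baseline, intervention):
    """Compare weekly PHQ-9 totals from a baseline and an intervention phase.

    Lower scores = fewer symptoms. Returns:
      change:     intervention mean minus baseline mean (negative = improvement)
      no_overlap: True if every intervention week scored below every baseline week
      welch_t:    Welch's t statistic for the difference in means
    """
    change = mean(intervention) - mean(baseline)
    no_overlap = max(intervention) < min(baseline)
    se = math.sqrt(variance(baseline) / len(baseline)
                   + variance(intervention) / len(intervention))
    welch_t = change / se
    return change, no_overlap, welch_t


# Hypothetical weekly scores: 3 baseline weeks, then 4 intervention weeks.
change, no_overlap, t = compare_phases([12, 11, 13], [9, 8, 10, 7])
print(f"change={change:+.1f}, no_overlap={no_overlap}, t={t:.2f}")
# change=-3.5, no_overlap=True, t=-4.04
```

Welch's version is used because the two phases can have different lengths and different spreads; it still assumes independent weeks, which a single person's repeated scores do not fully satisfy.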

**Watch out for:** The placebo effect. If you believe the intervention will help, you may rate your mood more favourably regardless of any real effect.
