Large-scale wearable data reveal digital phenotypes for daily-life stress detection
- Authors
- Elena Smets, Emmanuel Rios Velazquez, Giuseppina Schiavone, Imen Chakroun, Ellie D’Hondt, W. De Raedt, Jan Cornelis, Olivier Janssens, Sofie Van Hoecke, Stephan Claes, Ilse Van Diest, Chris Van Hoof
- Journal
- npj Digital Medicine
- Year
- 2018
- Citations
- 242
TL;DR
In a study of 1,002 healthy adults wearing wrist sensors for five days, higher momentary self-reported stress was associated with higher heart rate, lower heart rate variability (HRV), and higher skin conductance. Participants with higher chronic stress, anxiety, or depression scores tended to show a "blunted" physiological response: their HRV and skin conductance changed less during acute stress.
What they tested
This was an observational, cross-sectional study, not an intervention trial. The researchers tested whether physiological signals collected from wearable devices (heart rate, heart rate variability, skin conductance, skin temperature, and accelerometry) could reliably distinguish between high-stress and low-stress days in people's real lives. They examined:
- **Physiological signals** (wrist-based) against **self-reported stress** (collected via smartphone surveys 3–5 times per day)
- **High-stress individuals** (top quartile on the Perceived Stress Scale, PSS) vs. **low-stress individuals** (bottom quartile)
- **Digital phenotypes**: clusters of people who shared similar patterns of physiology, demographics, and psychological profiles
The primary outcome was the association between wearable sensor data and momentary self-reported stress. Secondary outcomes included identifying subgroups (phenotypes) with blunted physiological responses.
Who was studied
**1,002 healthy adults** (aged 18–65, mean age 36.2 years, 52% female)
Recruited from the general population in Belgium (Flanders region) via online advertisements and community outreach
**Inclusion criteria:** Dutch-speaking, willing to wear a wristband and carry a smartphone for 5 days, no current diagnosis of a major psychiatric disorder (e.g., schizophrenia, bipolar disorder), no pregnancy, no known cardiovascular disease requiring medication
**Exclusion criteria:** Self-reported diagnosis of a stress-related disorder (e.g., PTSD, burnout), current use of beta-blockers or other medications that directly affect heart rate, shift workers (to avoid circadian confounds)
**Setting:** Free-living conditions—participants went about their normal daily lives (work, home, social activities) while wearing the sensors
How they measured it
**Physiological signals:** Empatica E4 wristband (medical-grade, CE-certified) worn on the non-dominant wrist for 5 consecutive days, 24 hours/day. Measured:
- Blood volume pulse (BVP) → derived heart rate (HR) and heart rate variability (HRV, specifically root mean square of successive differences, RMSSD)
- Electrodermal activity (EDA, also called skin conductance or galvanic skin response)
- Skin temperature (peripheral, at the wrist)
- Accelerometry (3-axis, for movement/activity level)
**Self-reported stress:** Ecological Momentary Assessment (EMA) via a custom smartphone app. Participants received 3–5 random prompts per day (between 8:00 AM and 10:00 PM) asking: "How stressed do you feel right now?" on a 0–100 visual analogue scale (0 = not at all, 100 = extremely). They also rated their current mood, energy level, and social context.
**Baseline psychological questionnaires (completed before the 5-day monitoring period):**
- Perceived Stress Scale (PSS-10, 0–40, higher = more stress)
- Patient Health Questionnaire (PHQ-9, 0–27, higher = more depression)
- Generalized Anxiety Disorder scale (GAD-7, 0–21, higher = more anxiety)
- Pittsburgh Sleep Quality Index (PSQI, 0–21, higher = worse sleep)
- Positive and Negative Affect Schedule (PANAS, 10–50 per subscale)
**Demographics:** Age, sex, BMI, education level, employment status, smoking status, caffeine and alcohol intake
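RMSSD, the HRV metric used throughout this summary, is straightforward to compute once inter-beat intervals (IBIs) are available. The following is a minimal sketch, assuming the IBIs have already been extracted from the BVP signal and cleaned of artifacts. The function name `rmssd` and the sample values are illustrative and do not come from the paper.

```python
import math

def rmssd(ibis_ms):
    """Root mean square of successive differences between inter-beat intervals (ms)."""
    if len(ibis_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    # Differences between each interval and the one before it.
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical, artifact-free IBIs in milliseconds.
example_ibis = [800, 810, 790, 805, 795]
print(round(rmssd(example_ibis), 2))  # 14.36
```

In practice the hard part is upstream of this function: wrist-based pulse signals are noisy during movement, so beat detection and artifact rejection usually matter more than the formula itself.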
Methodology
**Study design:** Large-scale cross-sectional observational study with repeated measures (5 days of continuous physiological monitoring + 15–25 EMA surveys per person). This is not a randomized controlled trial—there is no intervention, no control group, and no manipulation of stress levels. The researchers simply observed natural variation in stress and physiology.
**Why this design matters:** Cross-sectional studies can reveal associations but cannot prove causation. However, the repeated EMA sampling (within-person, across days) allows for "within-subject" comparisons—e.g., comparing a person's physiology on a high-stress day vs. their own low-stress day. This partially controls for individual differences (age, sex, baseline health) because each person serves as their own control. The large sample size (N=1,002) gives statistical power to detect small effects and to identify subgroups (phenotypes) via clustering.
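The "each person serves as their own control" idea can be illustrated with person-mean centering: subtracting each participant's own average from their observations before comparing moments. This is a simplified stand-in for the within-person part of a multilevel model, not the authors' analysis. The record layout (`pid`, `stress`, `rmssd`) and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def person_mean_center(records, key):
    """Return one deviation per record: the value of `key` minus
    that participant's own mean of `key`."""
    by_person = defaultdict(list)
    for r in records:
        by_person[r["pid"]].append(r[key])
    means = {pid: mean(vals) for pid, vals in by_person.items()}
    return [r[key] - means[r["pid"]] for r in records]

# Hypothetical EMA moments for two participants with different baselines.
records = [
    {"pid": "A", "stress": 20, "rmssd": 50.0},
    {"pid": "A", "stress": 60, "rmssd": 40.0},
    {"pid": "B", "stress": 30, "rmssd": 30.0},
    {"pid": "B", "stress": 70, "rmssd": 24.0},
]
centered_rmssd = person_mean_center(records, "rmssd")
print(centered_rmssd)  # [5.0, -5.0, 3.0, -3.0]
```

After centering, participant B's lower baseline HRV no longer dominates the comparison; only each person's deviation from their own norm remains.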
**Statistical approach:**
Multilevel mixed-effects models (also called hierarchical linear models) to account for the nested structure of the data: EMA surveys (Level 1) nested within days (Level 2) nested within participants (Level 3). This is a standard approach for repeated-measures data because it accommodates unanswered prompts and unequal numbers of observations per person.
K-means clustering (unsupervised machine learning) to identify digital phenotypes based on physiological features (HRV, EDA, temperature, activity) averaged across the 5 days.
All models adjusted for age, sex, BMI, smoking status, caffeine intake, and time of day.
No pre-registration was reported (common for 2018, but a limitation).
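To make the clustering step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the authors' pipeline: the two-feature points (imagined as standardized HRV and EDA averages), the hand-picked starting centroids, and the function name `kmeans` are all assumptions for illustration. A real analysis would more likely use a library such as scikit-learn with several random initializations.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm. `points` and `centroids` are lists of equal-length tuples."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # converged: centroids did not change
        centroids = new_centroids
    return labels, centroids

# Hypothetical standardized (HRV, EDA) averages for six participants.
points = [(1.0, 0.1), (1.2, -0.1), (-1.0, -1.0), (-1.2, -0.8), (0.0, 1.5), (0.2, 1.3)]
labels, centers = kmeans(points, centroids=[points[0], points[2], points[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Because k-means always returns exactly as many clusters as requested, finding three groups does not by itself show that three natural groups exist, which is one reason the phenotypes need replication.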
**What this design can prove:**
That wearable physiological signals are statistically associated with self-reported stress in real-world conditions (ecological validity).
That certain subgroups of people show different physiological stress profiles (blunted vs. reactive).
The magnitude and direction of these associations (e.g., higher stress → lower HRV).
**What this design cannot prove:**
That physiological changes *cause* stress (or vice versa); the design supports correlation only.
That wearable sensors can *predict* future stress—this is a cross-sectional snapshot, not a prospective prediction study.
That the findings apply to clinical populations (e.g., people with diagnosed PTSD, burnout, or cardiovascular disease were excluded).
**Major methodological weaknesses:**
Self-reported stress is subjective and may be influenced by recall bias, mood at the moment, or social desirability (though EMA reduces recall bias compared to daily diaries).
The wristband measures peripheral physiology, which may lag behind central (brain) stress responses by several seconds and is affected by movement (e.g., walking increases heart rate and EDA, which could be mistaken for stress).
Only 5 days of monitoring—may not capture rare but intense stressors (e.g., a major life event).
No objective stressor (like a lab-based Trier Social Stress Test) to validate the physiological responses.
Key findings
**Primary finding: Association between physiology and momentary stress**
At moments when participants reported higher stress (EMA score >50/100), their heart rate was on average **2.3 bpm higher** (95% CI: 1.8–2.8, p < 0.001) compared to low-stress moments.
Heart rate variability (RMSSD) was **12.4% lower** (95% CI: 9.8–15.0%, p < 0.001) during high-stress moments.
Electrodermal activity (skin conductance) was **0.08 µS higher** (95% CI: 0.06–0.10, p < 0.001) during high-stress moments.
Skin temperature showed a small but significant decrease of **0.12°C** (95% CI: 0.08–0.16, p < 0.001) during high-stress moments.
Accelerometry (movement) was not significantly different between high- and low-stress moments (p = 0.12), suggesting the physiological changes were not simply due to physical activity.
**Secondary finding: Digital phenotypes (clusters)**
K-means clustering identified **three distinct digital phenotypes** based on physiological patterns:
1. **"Reactive" phenotype (n=412, 41%):** High HRV, moderate EDA, normal temperature. These individuals showed the expected physiological response to stress (HR ↑, HRV ↓, EDA ↑). They had the lowest baseline PSS scores (mean 12.3) and lowest PHQ-9 scores (mean 3.1).
2. **"Blunted" phenotype (n=298, 30%):** Low HRV, low EDA, slightly elevated temperature. These individuals showed *attenuated* physiological responses to stress—their HRV did not drop as much, and their EDA did not rise as much during high-stress moments. They had the highest baseline PSS scores (mean 18.7), highest PHQ-9 scores (mean 7.2), and highest GAD-7 scores (mean 6.8). They also reported poorer sleep quality (PSQI mean 6.1 vs. 4.3 for the reactive group).
3. **"Hyper-reactive" phenotype (n=292, 29%):** Moderate HRV, high EDA, low temperature. These individuals showed exaggerated physiological responses to stress (large HR increases, large EDA spikes). They had intermediate psychological scores (PSS mean 15.1, PHQ-9 mean 4.8).
**Association with chronic stress:**
Participants in the top quartile of PSS scores (chronic high stress) had **23% lower average HRV** (RMSSD: 28.4 ms vs. 36.9 ms, p < 0.001) compared to the bottom quartile, even on low-stress days.
This is consistent with chronic stress shifting baseline physiology, not just the acute response, although a cross-sectional design cannot establish the direction of that relationship.
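The 23% figure follows directly from the two reported RMSSD means, taking the bottom-quartile value as the reference:

$$
\frac{36.9 - 28.4}{36.9} = \frac{8.5}{36.9} \approx 0.230
$$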
**Gender differences:**
Women reported higher momentary stress on average (mean EMA score 32.4 vs. 27.1, p < 0.001) but showed similar physiological associations to men (no significant interaction by sex).
Effect magnitude
A **2.3 bpm increase** in heart rate during stress is small—roughly equivalent to standing up from a chair or drinking a cup of coffee. For context, a typical stress response in a lab setting (e.g., public speaking) raises heart rate by 10–20 bpm. The real-world stress captured here was likely milder.
A **12.4% drop in HRV** is moderate. For comparison, a single night of poor sleep can reduce HRV by 15–20%, and chronic stress can reduce it by 30–40%.
The **0.08 µS increase in EDA** is very small—barely detectable without a sensitive sensor. A typical stressor in a lab (e.g., math test) raises EDA by 0.5–2.0 µS.
The **blunted phenotype** (30% of the sample) showed HRV responses that were **60% smaller** than those of the reactive phenotype during high-stress moments; that is, their HRV dropped far less when they reported stress. This is clinically relevant because blunted physiological responses have been linked to burnout, depression, and poor immune function in prior literature.
Limitations
**Acknowledged by authors:**
Cross-sectional design prevents causal inference.
Self-reported stress is subjective and may not capture all types of stress (e.g., unconscious stress).
Wrist-based sensors are less accurate than chest-strap ECG for HRV measurement (though Empatica E4 has been validated against ECG with r > 0.90 for HRV).
Only 5 days of monitoring—may miss infrequent but intense stressors.
The sample was predominantly white (92%), Dutch-speaking, and from a single region in Belgium, limiting generalizability.
**Critical reader notes:**
**No objective stressor:** Without a controlled stress induction (like a lab test), we cannot be sure that the physiological changes are truly "stress" responses—they could be due to other factors like eating, social interaction, or cognitive load that happen to correlate with self-reported stress.
**Multiple comparisons:** The study tested many physiological features (HR, HRV, EDA, temperature, activity) and many psychological scales (PSS, PHQ-9, GAD-7, PSQI, PANAS). No correction for multiple comparisons (e.g., Bonferroni) was reported, which inflates the risk of false positives.
**Clustering is exploratory:** The three phenotypes were derived from data-driven clustering, not pre-specified hypotheses. They need replication in an independent sample.
**Industry funding:** The study was funded by imec (a Belgian research institute) and Janssen Pharmaceutica (a Johnson & Johnson company). While the authors declared no competing interests, industry funding can introduce subtle bias in interpretation.
**Wearable compliance:** Participants had to wear the wristband continuously for 5 days, including during sleep. Some may have removed it for showers or charging (the E4 has a 24-hour battery), leading to missing data. The authors report 89% compliance (mean wear time), but missing data patterns were not analyzed.
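For the multiple-comparisons note above, a Bonferroni correction simply divides the significance threshold by the number of tests. The sketch below uses made-up p-values for five hypothetical tests to show the mechanics; the function name `bonferroni` and the numbers are illustrative, not results from the paper.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and, per test, whether it stays significant."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

# Hypothetical p-values for five tests (one per physiological feature).
p_values = [0.0004, 0.0008, 0.03, 0.012, 0.12]
threshold, significant = bonferroni(p_values)
print(significant)  # [True, True, False, False, False]
```

With five tests the corrected threshold is 0.01, so any p-value below 0.001 would still pass a correction of this size. The concern is more pressing for weaker associations and for the many comparisons across psychological scales.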
Practical takeaways
For someone running their own n=1 experiment:
**What to test:**
Whether your daily stress levels are associated with changes in HRV and skin conductance (EDA) measured by a wearable.
Whether your responses look more "reactive," "blunted," or "hyper-reactive." The study did not test interventions, so any link between phenotype and stress-management approach is speculative (e.g., blunted types may need more intensive interventions like therapy or exercise, while reactive types may benefit from relaxation techniques).
**Minimum meaningful duration:**
At least **7–14 days** (the study used 5 days, but longer gives more stable baselines and captures more stress events). For a robust n=1 experiment, aim for 14–21 days to account for weekly cycles (e.g., workweek vs. weekend).
Collect data during a period when you expect natural variation in stress (e.g., a work project deadline, exam period, or travel).
**What to measure (specific metrics):**
**Primary metric:** Heart rate variability (RMSSD, in milliseconds) – lower values indicate higher stress. Measure it in the morning (upon waking, before getting out of bed) for a baseline, and during the day (e.g., 5-minute windows after each stress rating).
**Secondary metric:** Electrodermal activity (skin conductance, in microsiemens) – higher values indicate higher arousal/stress. Note that EDA is very sensitive to movement, temperature, and humidity, so control for these.
**Contextual data:** Self-reported stress (0–100 scale) 3–5 times per day via a smartphone app or paper diary. Also log: time of day, location (home/work/commute), activity (sitting/walking/eating), caffeine and alcohol intake, sleep quality (1–5 scale), and social context (alone/with others).
**Wearable device:** Use a validated device that measures HRV and, ideally, EDA (e.g., Empatica E4, or a Garmin watch that reports HRV). A Polar chest strap gives more accurate HRV but does not measure EDA. Wrist devices that derive HRV from PPG (optical heart rate), which includes the E4's BVP sensor, are less accurate during movement, so favor readings taken at rest.
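Once stress ratings and matching HRV windows are logged, a first look at the association can be a plain Pearson correlation. This is a minimal sketch with made-up numbers. The function name `pearson_r` and the data are assumptions, and the paper itself used multilevel models instead of a single correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)

# Hypothetical paired observations: stress rating (0-100) and RMSSD (ms)
# from the 5-minute window after each rating.
stress = [10, 30, 50, 70, 90]
rmssd_ms = [48.0, 45.0, 41.0, 38.0, 30.0]
r = pearson_r(stress, rmssd_ms)
print(round(r, 2))  # negative: higher stress goes with lower RMSSD in this toy data
```

Treat the coefficient as descriptive. Repeated measurements from one person are not independent, so a naive significance test on them tends to be overconfident.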
**Key confounds to control for:**
**Physical activity:** Movement increases heart rate and EDA, which can mimic stress. Log your activity level (e.g., steps per hour) and exclude or statistically control for periods of exercise (e.g., remove data from 30 minutes before/after a workout).
**Caffeine and alcohol:** Both affect HRV and EDA. Standardize your intake (e.g., same amount at the same time each day) or log it and exclude high-caffeine days.
**Sleep:** Poor sleep reduces HRV the next day. Track sleep quality (e.g., with a sleep diary or wearable) and only compare stress days with similar sleep quality.
**Time of day:** HRV naturally dips in the afternoon and rises at night. Compare stress ratings at the same time of day (e.g., only compare 10:00 AM ratings across days).
**Menstrual cycle (for women):** HRV varies across the cycle (lower in the luteal phase). Track cycle phase or run the experiment within a single phase (e.g., follicular phase only).
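The exercise exclusion described above can be sketched as a simple timestamp filter. This sketch reads "30 minutes before/after a workout" as dropping everything from 30 minutes before the workout starts to 30 minutes after it ends. That reading, the function name `drop_near_exercise`, and the sample data are assumptions.

```python
from datetime import datetime, timedelta

def drop_near_exercise(samples, workouts, margin=timedelta(minutes=30)):
    """Keep (timestamp, value) samples that are not within `margin`
    of any (start, end) workout interval."""
    kept = []
    for ts, value in samples:
        near = any(start - margin <= ts <= end + margin for start, end in workouts)
        if not near:
            kept.append((ts, value))
    return kept

day = datetime(2024, 1, 15)  # arbitrary example date
workouts = [(day.replace(hour=18), day.replace(hour=19))]
samples = [
    (day.replace(hour=10), 42.0),             # kept
    (day.replace(hour=17, minute=45), 35.0),  # dropped: 15 min before workout
    (day.replace(hour=18, minute=30), 20.0),  # dropped: during workout
    (day.replace(hour=19, minute=20), 25.0),  # dropped: 20 min after workout
    (day.replace(hour=20), 40.0),             # kept
]
clean = drop_near_exercise(samples, workouts)
print([value for _, value in clean])  # [42.0, 40.0]
```

The same pattern extends to other confounds, for example dropping readings taken shortly after logged caffeine intake, provided those events are logged with timestamps.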
**What a positive result would