Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial
- Authors
- Kathleen Kara Fitzpatrick, Alison Darcy, Molly Vierhile
- Journal
- JMIR Mental Health
- Year
- 2017
- Citations
- 2,430
TL;DR
Using a conversational AI chatbot (Woebot) for two weeks significantly reduced self-reported depression symptoms in young adults compared to reading an informational ebook, suggesting a promising, accessible tool for self-management of mental health.
What they tested
This study investigated whether a fully automated conversational agent, named Woebot, could effectively deliver self-help content based on Cognitive Behavioral Therapy (CBT) principles to young adults experiencing symptoms of depression and anxiety.
The **intervention** was daily interaction with **Woebot**, a text-based conversational agent delivered via an instant messenger app. Woebot was designed to provide CBT content through brief conversations, mood tracking, short videos, "word games" about cognitive distortions, and therapeutic process-oriented features like empathic responses, tailoring content to mood, goal setting, accountability check-ins, motivation, and weekly mood reflection charts. Participants in this group could engage with Woebot for up to 20 sessions over a two-week period.
The **comparator** was an **information-only control group**. Participants in this group were directed to a free publication from the National Institute of Mental Health (NIMH) titled "Depression in College Students." This ebook provides comprehensive, evidence-based information on depression, its signs, symptoms, and various treatment types. This group served as a baseline to see if the interactive, conversational nature of Woebot offered benefits beyond simply providing static information.
The **outcome measures** were self-reported symptoms of depression and anxiety, as well as general positive and negative affect. These were assessed using standardized, widely recognized questionnaires:
**Depression symptoms:** Measured by the 9-item Patient Health Questionnaire (PHQ-9). This scale ranges from 0 to 27, with higher scores indicating more severe depressive symptoms. Scores of 5, 10, 15, and 20 mark the lower thresholds for mild, moderate, moderately severe, and severe depression, respectively.
**Anxiety symptoms:** Measured by the 7-item Generalized Anxiety Disorder scale (GAD-7). This scale ranges from 0 to 21, with higher scores indicating more severe anxiety symptoms. Scores of 5, 10, and 15 mark the lower thresholds for mild, moderate, and severe anxiety, respectively.
**Positive and Negative Affect:** Measured by the Positive and Negative Affect Scale (PANAS). This scale assesses the extent to which a person has experienced positive and negative emotions over a specific period.
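The severity thresholds listed above can be read as cut points on the total score. The following is a minimal sketch of that mapping, not code from the study. The function names are hypothetical, and the label "minimal" for scores below 5 is an assumption, since the summary does not name that band.

```python
from bisect import bisect_right

# Lower thresholds for each band, as listed in the summary above.
PHQ9_CUTOFFS = [5, 10, 15, 20]
PHQ9_LABELS = ["minimal", "mild", "moderate", "moderately severe", "severe"]
GAD7_CUTOFFS = [5, 10, 15]
GAD7_LABELS = ["minimal", "mild", "moderate", "severe"]


def severity_band(score, cutoffs, labels, max_score):
    """Return the label of the band that a total score falls into."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    # bisect_right counts how many thresholds the score has reached.
    return labels[bisect_right(cutoffs, score)]


def phq9_severity(score):
    """Band for a PHQ-9 total (0 to 27)."""
    return severity_band(score, PHQ9_CUTOFFS, PHQ9_LABELS, 27)


def gad7_severity(score):
    """Band for a GAD-7 total (0 to 21)."""
    return severity_band(score, GAD7_CUTOFFS, GAD7_LABELS, 21)


if __name__ == "__main__":
    print(phq9_severity(14))
    print(gad7_severity(10))
```

A score equal to a threshold falls into the higher band, so a PHQ-9 total of 10 is classed as moderate rather than mild.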
The primary objective was to determine the feasibility, acceptability, and preliminary efficacy of Woebot in reducing symptoms of anxiety and depression. The researchers hypothesized that conversation with a therapeutic process-oriented conversational agent would lead to greater improvement in symptoms compared to the information control group, and that the conversational manner would be more acceptable.
Who was studied
The study included **70 individuals** aged 18-28 years, with an average age of 22.2 years (standard deviation, SD 2.33).
**Gender:** 67% were female (47 out of 70 participants).
**Ethnicity/Race:** Among the 58 participants for whom data were available, the majority were non-Hispanic (93%, 54 of 58) and Caucasian (79%, 46 of 58).
**Population:** Participants were recruited online from a university community social media site, specifically targeting college students who self-identified as experiencing symptoms of depression and anxiety. This means the study focused on a non-clinical population, i.e., individuals experiencing symptoms but not necessarily formally diagnosed or seeking professional treatment.
**Setting:** The study was conducted entirely online, with recruitment, intervention delivery, and data collection all occurring via web-based platforms.
**Inclusion Criteria:** Participants had to be 18 years or older and able to read English.
**Exclusion Criteria:** Not explicitly stated. Because participants self-identified their symptoms, it is unclear whether individuals in acute crisis or already in intensive therapy were screened out or simply chose not to enroll.
How they measured it
The researchers used standard, self-report questionnaires administered online to assess participants' mental health status at two time points: baseline (before the intervention) and 2-3 weeks later (T2).
**9-item Patient Health Questionnaire (PHQ-9):** This is a self-administered diagnostic and severity tool for depression. Participants respond to nine questions about how often they have been bothered by specific problems over the last two weeks (e.g., "Little interest or pleasure in doing things," "Feeling down, depressed, or hopeless"). Responses are scored from 0 (not at all) to 3 (nearly every day), yielding a total score from 0 to 27. Higher scores indicate more severe depression. A score of 10 or greater is often used as a cutoff for clinically significant depression.
**7-item Generalized Anxiety Disorder scale (GAD-7):** Similar to the PHQ-9, this is a self-administered screening tool for generalized anxiety disorder and a severity measure. Participants respond to seven questions about how often they have been bothered by specific problems over the last two weeks (e.g., "Feeling nervous, anxious, or on edge," "Not being able to stop or control worrying"). Responses are scored from 0 (not at all) to 3 (nearly every day), yielding a total score from 0 to 21. Higher scores indicate more severe anxiety. A score of 10 or greater is often used as a cutoff for clinically significant anxiety.
**Positive and Negative Affect Scale (PANAS):** This scale consists of two mood scales, one for positive affect (e.g., "interested," "excited," "strong") and one for negative affect (e.g., "distressed," "upset," "guilty"). Participants rate the extent to which they have felt each emotion over a specified period. The exact time frame used in this study is not confirmed here; the full scale was administered at baseline and T2, separately from Woebot's daily mood check-ins. Each item is rated on a 5-point Likert scale, typically from 1 (very slightly or not at all) to 5 (extremely). This provides separate scores for positive and negative affect.
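The scoring rules described above amount to validated sums of item responses. The sketch below illustrates them under stated assumptions: the function names are hypothetical, and the PANAS subscale length of 10 items each follows the standard 20-item instrument rather than anything stated in this summary.

```python
def sum_score(responses, n_items, low, high):
    """Sum item responses after checking item count and response range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(not low <= r <= high for r in responses):
        raise ValueError(f"each response must be between {low} and {high}")
    return sum(responses)


def score_phq9(responses):
    """PHQ-9: nine items scored 0 to 3, total 0 to 27."""
    return sum_score(responses, 9, 0, 3)


def score_gad7(responses):
    """GAD-7: seven items scored 0 to 3, total 0 to 21."""
    return sum_score(responses, 7, 0, 3)


def score_panas(positive_items, negative_items):
    """PANAS: separate totals for positive and negative affect.

    Assumes 10 items per subscale (standard form), each rated 1 to 5.
    """
    return {
        "positive": sum_score(positive_items, 10, 1, 5),
        "negative": sum_score(negative_items, 10, 1, 5),
    }


if __name__ == "__main__":
    phq9_total = score_phq9([1, 2, 1, 0, 3, 2, 1, 0, 1])
    # A total of 10 or greater is the commonly used cutoff noted above.
    print(phq9_total, phq9_total >= 10)
```

Rejecting incomplete or out-of-range responses, rather than silently summing them, keeps a partially completed questionnaire from being mistaken for a low score.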
All questionnaires were completed via web-based versions, ensuring consistency in administration across all participants and time points. Participants were offered a prorated incentive of US $10 per completed assessment (US $20 for completion of both assessments) to encourage participation and reduce attrition.
Methodology
This study employed a **Randomized Controlled Trial (RCT)** design, which is considered the gold standard for evaluating the effectiveness of interventions. In an RCT, participants are randomly assigned to either an intervention group or a control group, minimizing bias and allowing researchers to infer a cause-and-effect relationship between the intervention and any observed outcomes.
**Study Design:**
**Parallel-group design:** Participants were assigned to one of two groups (Woebot or information control) and remained in that group for the duration of the study. There were no crossover periods where participants switched groups.
**Duration:** The intervention period lasted for approximately **2 weeks**. Participants completed baseline questionnaires, then engaged with their assigned condition, and completed follow-up questionnaires 2-3 weeks after baseline.
**Recruitment:** Potential participants were recruited online from a university community social media site, targeting students who self-identified as having symptoms of depression and anxiety.
**Randomization:**
**Process:** Confirmed participants were randomized via a **computer algorithm** that automatically generated a number between 0 and 1. Participants with numbers ≤0.5 were allocated to the Woebot group (n=34), and those with numbers >0.5 were allocated to the NIMH ebook control group (n=36).
**Allocation Concealment:** Allocation was generated algorithmically at the point of assignment, so the researchers could not know or influence which group a participant would join. This helps prevent selection bias, in which researchers consciously or unconsciously steer certain participants into a particular group.
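The allocation rule described above can be sketched in a few lines. This is an illustration only, not the study's actual code: the function names and the seeded generator are assumptions, and Python's `random.random()` draws from [0.0, 1.0), which may differ from the generator the authors used.

```python
import random


def allocate_from_draw(draw):
    """Apply the rule from the summary: <=0.5 goes to Woebot, >0.5 to control."""
    return "woebot" if draw <= 0.5 else "control"


def randomize(n_participants, seed=None):
    """Allocate each participant independently from a uniform draw."""
    rng = random.Random(seed)  # seed is only for reproducible demonstrations
    return [allocate_from_draw(rng.random()) for _ in range(n_participants)]


if __name__ == "__main__":
    groups = randomize(70, seed=2017)
    print(groups.count("woebot"), groups.count("control"))
```

Because each participant is allocated by an independent draw, this kind of simple randomization does not guarantee equal group sizes, which is consistent with the 34 versus 36 split reported above.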
**Blinding:**
**Unblinded Trial:** Participants were aware of which condition they were in (they knew whether they were chatting with Woebot or reading an ebook). The "service providers" (Woebot Labs) were also not masked to the condition. This is a significant limitation, as participants' expectations about the effectiveness of their assigned intervention could influence their self-reported outcomes (known as the placebo effect or expectation bias). Blinding participants to an interactive chatbot versus a static ebook is inherently difficult.
**Intervention Details (Woebot Group):**
Woebot delivered CBT content in brief, daily conversations and included mood tracking.
It operated within an instant messenger app, accessible on desktop or mobile devices.
Interactions started with inquiries about context and mood, using word or emoji responses.
CBT concepts were presented via short videos or "word games" to teach about cognitive distortions.
The bot included an "onboarding" process, clarifying that it was an automated self-help tool, not a replacement for therapy, and that a psychologist was monitoring, though not in real time.
Computational methods involved a decision tree with suggested responses, accepting natural language inputs at specific points for routing conversations. The decision tree structure remained constant.
Therapeutic process-oriented features included:
* **Empathic responses:** Tailored to the participant's inputted mood.
* **Tailoring:** Specific content sent based on mood state (e.g., anxiety-specific assistance).
* **Goal setting:** Asked participants about personal goals for the 2-week period.
* **Accountability:** Set expectations for regular check-ins and followed up on goals/activities.
* **Motivation and engagement:** Sent personalized messages daily or every other day to initiate conversation, and used emojis and animated GIFs for positive reinforcement.
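The decision-tree design and mood-based tailoring described above can be illustrated with a toy example. Everything in this sketch is hypothetical: the node names, prompts, and keyword matching are invented for illustration and do not reproduce Woebot's actual content or routing logic. It shows only the general pattern of a fixed tree with suggested replies, plus free-text input accepted at specific nodes for routing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prompt: str
    # Suggested replies, each mapped to the id of the next node.
    options: Dict[str, str] = field(default_factory=dict)
    # Keywords used to route free-text input at nodes that accept it.
    keywords: Dict[str, str] = field(default_factory=dict)
    # Where to go when nothing matches; None marks the end of a branch.
    fallback: Optional[str] = None


# A fixed tree: the structure never changes, only the path taken through it.
TREE = {
    "mood_check": Node(
        prompt="How are you feeling right now?",
        options={"good": "praise", "anxious": "anxiety_help", "low": "low_help"},
        keywords={"worried": "anxiety_help", "nervous": "anxiety_help",
                  "sad": "low_help", "hopeless": "low_help"},
        fallback="clarify",
    ),
    "praise": Node(prompt="Glad to hear it! Want to set a goal for today?"),
    "anxiety_help": Node(prompt="That sounds stressful. Let's look at the worry together."),
    "low_help": Node(prompt="I'm sorry you're feeling low. Let's examine that thought."),
    "clarify": Node(prompt="Could you pick the closest option: good, anxious, or low?",
                    options={"good": "praise", "anxious": "anxiety_help", "low": "low_help"}),
}


def next_node(current_id, user_input):
    """Route to the next node id from a suggested reply or a keyword match."""
    node = TREE[current_id]
    text = user_input.strip().lower()
    if text in node.options:
        return node.options[text]
    for keyword, target in node.keywords.items():
        if keyword in text:
            return target
    return node.fallback


if __name__ == "__main__":
    target = next_node("mood_check", "I'm really worried about exams")
    print(TREE[target].prompt)
```

Keeping the tree as static data separate from the routing function mirrors the description of a decision tree whose structure remained constant, with tailoring arising from which branch a participant's mood input selects.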