Conversational Agents in Health Care: Scoping Review and Conceptual Analysis
- Authors
- Lorainne Tudor Car, Dhakshenya Ardhithy Dhinagaran, Bhone Myint Kyaw, Tobias Kowatsch, Shafiq Joty, Yin Leng Theng, Rifat Atun
- Journal
- Journal of Medical Internet Research
- Year
- 2020
- Citations
- 528
TL;DR
This scoping review found that research on conversational agents (chatbots) in healthcare is largely descriptive, focusing on treatment, monitoring, and health service support, with an urgent need for rigorous evaluation of their safety, acceptability, and effectiveness.
What they tested
This study was a scoping review, meaning it systematically mapped out the existing literature on conversational agents (also known as chatbots) in health care. It did not test a specific intervention itself, but rather synthesized what has been studied about these agents.
The review aimed to understand:
**Current applications:** How conversational agents are being used in health care (e.g., for treatment, monitoring, patient education, service support).
**Characteristics of agents:** What types of agents are being developed and studied (e.g., delivery platform like smartphone apps, input/output modalities like text or voice, underlying technology like rule-based or AI-driven).
**Gaps and challenges:** What areas of research are lacking and what methodological issues exist in the current literature.
**Recommendations:** Provide guidance for future research, design, and application of conversational agents in health care.
The review focused on primary research studies that evaluated a conversational agent implemented for a health care–specific purpose. It excluded proposals, opinion pieces, articles without primary research, and those with poorly reported evaluation data. It also specifically excluded "embodied conversational agents (ECAs), relational agents, animated conversational agents, or other conversational agents with a visual or animated component," focusing instead on text- or voice-based agents without a visual avatar.
Who was studied
As a scoping review, this study did not involve human participants directly. Instead, it reviewed existing research studies.
The review identified and analyzed **47 study reports**, which included:
**45 published articles**
**2 ongoing clinical trials**
Because the review's scope spanned all health care applications, the included studies covered a wide range of populations and health conditions. The review did not report the total number of participants across the included studies; it characterized the *types* of studies and the *features* of the conversational agents they described.
How they measured it
The authors conducted a comprehensive literature search across multiple databases and gray literature sources to identify relevant studies. Their measurement process involved:
**Search Strategy:** A broad literature search was performed in April 2019 across major medical and scientific databases:
* MEDLINE (Medical Literature Analysis and Retrieval System Online; Ovid)
* EMBASE (Excerpta Medica database; Ovid)
* PubMed
* Scopus
* Cochrane Central
* Gray literature sources: OCLC WorldCat database, ResearchGate, Google Scholar, OpenGrey, and the first 10 pages of Google search results.
* Reference lists of relevant articles and systematic reviews were also manually checked.
* An extensive list of **63 search terms** was used, including "conversational agents," "conversational AI," "chatbots," and associated synonyms.
**Inclusion Criteria:** Primary research studies that had conducted an evaluation and reported findings on a conversational agent implemented for a health care–specific purpose.
**Exclusion Criteria:**
* Articles that only presented a proposal for conversational agent development.
* Articles that mentioned conversational agents briefly or as an insignificant part of a review.
* Opinion pieces and articles where primary research was not conducted or discussed.
* Articles with poorly reported data on chatbot assessments (minimal or no evaluation data).
* Articles concerning embodied conversational agents (ECAs), relational agents, animated conversational agents, or other conversational agents with a visual or animated component.
**Screening and Data Extraction:** These processes were performed in parallel by **2 independent reviewers** to minimize bias and ensure accuracy. This means two researchers separately assessed each identified article against the inclusion/exclusion criteria and then extracted relevant information, resolving any discrepancies through discussion.
**Data Analysis:** The included evidence was analyzed narratively using the principles of **thematic analysis**. This qualitative method involves identifying, analyzing, and reporting patterns (themes) within the data. In this case, the themes related to the applications, characteristics, and evaluation of conversational agents in health care.
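The dual-reviewer screening step described above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the record IDs, the decisions, and the `reconcile` helper are all hypothetical, and the sketch only shows the general idea of comparing two independent sets of include/exclude decisions and flagging disagreements for discussion.

```python
from typing import Dict, List, Tuple

# Hypothetical screening decisions (not data from the review):
# record ID -> True (include) or False (exclude).
reviewer_a: Dict[str, bool] = {"rec1": True, "rec2": False, "rec3": True, "rec4": False}
reviewer_b: Dict[str, bool] = {"rec1": True, "rec2": True, "rec3": True, "rec4": False}


def reconcile(
    a: Dict[str, bool], b: Dict[str, bool]
) -> Tuple[List[str], List[str], List[str]]:
    """Split records into agreed includes, agreed excludes, and conflicts."""
    if a.keys() != b.keys():
        raise ValueError("Both reviewers must screen the same set of records")
    included: List[str] = []
    excluded: List[str] = []
    conflicts: List[str] = []
    for rec_id in sorted(a):
        if a[rec_id] != b[rec_id]:
            conflicts.append(rec_id)  # disagreement: resolve by discussion
        elif a[rec_id]:
            included.append(rec_id)
        else:
            excluded.append(rec_id)
    return included, excluded, conflicts


included, excluded, conflicts = reconcile(reviewer_a, reviewer_b)
print("Agreed includes:", included)
print("Agreed excludes:", excluded)
print("Needs discussion:", conflicts)
```

Keeping the two decision sets separate until this comparison is what makes the reviewers independent: neither sees the other's judgment before recording their own.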
Methodology
This study employed a **scoping review** methodology, a type of knowledge synthesis designed to map the breadth of evidence on a topic and to identify key concepts, types of evidence, and research gaps. It differs from a systematic review in that it aims for a broad overview rather than a focused answer to a specific question, and it typically does not critically appraise the quality of individual studies.
**How they ran the study:**
The authors followed an updated version of the Arksey and O’Malley framework, a widely recognized guideline for conducting scoping reviews. This involved:
1. **Identifying the research question:** To review current applications, gaps, and challenges of conversational agents in health care.
2. **Identifying relevant studies:** A search was executed across multiple databases and gray literature sources, using a broad range of search terms so that a nascent and evolving field could be mapped as completely as possible.
3. **Study selection:** Two independent reviewers screened titles, abstracts, and full texts against predefined inclusion and exclusion criteria. This parallel screening process helps to reduce reviewer bias and ensures consistency in study selection.
4. **Charting the data:** Relevant data were extracted from the included studies. This would typically involve information about the conversational agent itself (e.g., technology, delivery platform, modality), its application in health care, the study design, and reported outcomes.
5. **Collating, summarizing, and reporting the results:** The extracted data were analyzed thematically to identify patterns and trends, which were then reported narratively.
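The charting and collating steps above can be sketched in Python as a fixed record per study followed by simple tallies. This is an illustrative assumption, not the authors' extraction form: the `ChartedStudy` fields mirror the kinds of information listed in step 4, and the three example records are invented placeholders rather than studies from the review.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChartedStudy:
    """One row of a hypothetical data-charting form."""
    study_id: str
    study_design: str       # e.g. "case study", "RCT"
    application: str        # e.g. "treatment", "monitoring"
    delivery_platform: str  # e.g. "smartphone app"
    modality: str           # e.g. "text", "voice"
    technology: str         # e.g. "rule-based", "AI-driven"


def collate(studies: List[ChartedStudy], field: str) -> Counter:
    """Count how often each value of one charted field occurs."""
    return Counter(getattr(study, field) for study in studies)


# Invented placeholder records, used only to exercise the functions.
charted = [
    ChartedStudy("s1", "case study", "treatment", "smartphone app", "text", "rule-based"),
    ChartedStudy("s2", "RCT", "monitoring", "smartphone app", "text", "AI-driven"),
    ChartedStudy("s3", "case study", "service support", "web", "voice", "rule-based"),
]

design_counts = collate(charted, "study_design")
platform_counts = collate(charted, "delivery_platform")
print(design_counts.most_common())
print(platform_counts.most_common())
```

Tallies like these only describe what was studied and how often; the thematic interpretation in step 5 remains a qualitative judgment.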
**Why this design matters:**
**Mapping a broad field:** A scoping review is ideal for emerging fields like conversational agents in health care, where the research landscape is still developing and diverse. It helps to understand what has been done, how it's been done, and where the gaps are.
**Identifying research gaps:** By providing an overview, this design effectively highlights areas where more rigorous research is needed, such as the call for robust evaluation of effectiveness, acceptability, and safety.
**Informing future research:** The recommendations derived from a scoping review can guide researchers, developers, and policymakers on where to focus their efforts.
**What this design can and cannot show:**
**Can show:** This design can describe the *state* of research on conversational agents in health care, including the prevalence of particular applications, technologies, and study types. It can identify trends, common approaches, and significant gaps in the literature, and it can highlight the types of evidence available (e.g., mostly descriptive studies, few RCTs).
**Cannot show:** A scoping review **cannot prove the effectiveness or safety of any specific conversational agent or intervention**. It does not synthesize quantitative data into effect sizes or make direct recommendations for clinical practice. It also typically does not critically appraise the methodological quality of the included studies, so it does not assess whether individual studies were well designed or free from bias. While it tells us *what* has been studied, it does not tell us *how good* that evidence is as proof of efficacy.
**Major methodological weaknesses (inherent to scoping reviews or specific to this one):**
**Lack of quality appraisal:** As noted, scoping reviews generally do not assess the methodological quality or risk of bias of the included studies. This means the review might include studies with weak designs, and their findings are presented without a critical assessment of their reliability.
**Broadness over depth:** While a strength for mapping, the broad scope means that detailed insights into specific interventions or populations might be limited.
**Exclusion of ECAs:** The decision to exclude embodied conversational agents (those with a visual or animated component) means the review provides an incomplete picture of the broader "virtual agent" landscape in health care, potentially missing relevant insights from studies that included visual elements.
**Reliance on published literature:** While gray literature was searched, there's always a risk of publication bias, where studies with positive or significant findings are more likely to be published.
Key findings
The scoping review identified 47 study reports (45 articles and 2 ongoing clinical trials) that met its inclusion criteria. The key findings on the characteristics of these studies and of the conversational agents they described are as follows:
**Prevalence of Study Types:**
* **Case studies describing chatbot development** were the most prevalent type of research, accounting for **18 out of 47** (approximately 38.3%) of the identified reports. This indicates a strong focus on the initial development and description of agents rather than rigorous evaluation.
* Only **11 randomized controlled trials (RCTs)** were identified among the 47 reports (approximately 23.4%). RCTs are considered the gold standard for evaluating intervention effectiveness, highlighting a significant gap in robust efficacy research.
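The percentages quoted for study types follow directly from the counts. A short Python check, using only the figures stated above (the variable and function names are illustrative):

```python
# Counts reported in the summary above: 47 study reports in total.
total_reports = 47
counts = {"case studies": 18, "RCTs": 11}


def pct(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: pct(n, total_reports) for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {counts[label]}/{total_reports} = {share}%")
# case studies: 18/47 = 38.3%
# RCTs: 11/47 = 23.4%
```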
**Delivery Modality of Conversational Agents:**
* The identified conversational agents were largely delivered via