Automated Diagnosis of Plus Disease in Retinopathy of Prematurity Using Deep Convolutional Neural Networks

Authors: James M. Brown, J. Peter Campbell, Andrew Beers, Ken Chang, Susan Ostmo, R.V. Paul Chan, Jennifer Dy, Deniz Erdoğmuş, Stratis Ioannidis, Jayashree Kalpathy-Cramer, Michael F. Chiang
Journal: JAMA Ophthalmology
Year: 2018
DOI: 10.1001/jamaophthalmol.2018.1934
Citations: 640

TL;DR

A deep learning algorithm was developed to diagnose "plus disease," a severe sign of Retinopathy of Prematurity (ROP) in premature babies, from retinal images. Its accuracy was comparable to or better than that of experienced human ophthalmologists, which shows the potential of artificial intelligence to automate or assist in complex medical diagnostic tasks. A self-experimenter could explore the same idea by comparing AI and human performance on image-based diagnostic challenges.

What they tested

This study tested the ability of a specialized artificial intelligence (AI) algorithm, specifically a deep convolutional neural network, to accurately diagnose "plus disease" in Retinopathy of Prematurity (ROP) from retinal photographs. Plus disease is characterized by the abnormal dilation (widening) and tortuosity (twisting) of blood vessels in the retina, and its presence is the primary indicator for treating ROP to prevent childhood blindness.

The AI algorithm's performance was compared against two main benchmarks:

1. **Reference Standard Diagnosis (RSD):** This was considered the "ground truth" for each image, established by a consensus of three expert image graders and one clinical expert ophthalmologist. The RSD categorized images into three groups: normal, pre-plus disease, or plus disease.

2. **Human ROP Experts:** The algorithm's diagnostic accuracy was also directly compared to the performance of eight independent, highly experienced ROP experts (ophthalmologists with over 10 years of clinical experience and multiple peer-reviewed publications on ROP).

The primary outcome measures were the algorithm's accuracy in classifying images, quantified by:

**Area Under the Receiver Operating Characteristic Curve (AUC):** A measure of how well the algorithm can distinguish between different diagnostic categories (e.g., plus disease vs. not plus disease). An AUC of 1.0 indicates perfect discrimination, while 0.5 indicates performance no better than random chance.

**Sensitivity:** The proportion of actual positive cases (e.g., images with plus disease) that the algorithm correctly identified. A high sensitivity means fewer false negatives.

**Specificity:** The proportion of actual negative cases (e.g., images without plus disease) that the algorithm correctly identified. A high specificity means fewer false positives.

**Quadratic-weighted κ (Kappa) coefficient:** A statistical measure of inter-rater agreement, used here to assess how well the algorithm's diagnoses agreed with the RSD and with the diagnoses of the human experts, accounting for agreement that would occur by chance. A Kappa value of 1.0 indicates perfect agreement, while 0 indicates agreement equivalent to chance.
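The four measures above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the study's code: the function names, the integer encoding of the categories (0 = normal, 1 = pre-plus, 2 = plus), and the toy inputs in the usage note are all assumptions made for the example.

```python
from collections import Counter


def sensitivity_specificity(y_true, y_pred):
    """Binary case: True means 'plus disease', False means 'not plus'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the chance that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def quadratic_weighted_kappa(rater_a, rater_b, n_classes=3):
    """Agreement between two raters on ordinal categories 0..n_classes-1.
    Disagreements are penalized by the squared distance between categories."""
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))
    hist_a, hist_b = Counter(rater_a), Counter(rater_b)
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            disagreement_observed += weight * observed[(i, j)] / n
            # Expected by chance: product of each rater's marginal frequency.
            disagreement_expected += weight * (hist_a[i] / n) * (hist_b[j] / n)
    return 1.0 - disagreement_observed / disagreement_expected
```

As a usage example with made-up inputs, `sensitivity_specificity([True, True, False, False], [True, False, False, False])` gives a sensitivity of 0.5 (one of two positive cases found) and a specificity of 1.0 (no false positives). The quadratic weighting means that confusing normal with plus disease costs four times as much as confusing normal with pre-plus, which suits a three-level ordinal scale like the one used for the RSD.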

Who was studied

The "subjects" of this study were **5511 de-identified retinal photographs** of infants at risk for Retinopathy of Prematurity (ROP). These images were used to train the deep learning algorithm. An additional, separate set of **100 independent retinal photographs** was used to rigorously test the algorithm's performance after it had been trained.

All images were collected from infants participating in the Imaging and Informatics in ROP
