Diagnostic Performance of Artificial Intelligence and Deep Learning for Diabetic Retinopathy Screening: A Systematic Review and Meta-analysis

<p>Napa Suebsaiphrom*; Thisarin Takkametha; Sittikorn Laojaroenwanit; Nawinda Vanichakulthada; Ussawin Vongkancom</p>

Open Access

Journal of Optometry and Ophthalmology Research

Volume 2, Issue 1

ABSTRACT

Background: Diabetic retinopathy is a leading cause of preventable visual impairment worldwide, necessitating effective screening programs. Artificial intelligence (AI) and deep learning-based systems have emerged as promising tools for screening using retinal fundus photographs. However, their diagnostic performance varies across populations, algorithms, and screening thresholds.

Objectives: To comprehensively evaluate and quantitatively synthesize the diagnostic performance of AI and deep learning-based systems for diabetic retinopathy screening using retinal fundus photographs.

Methods: A systematic literature search was conducted across PubMed, Embase, Scopus, Web of Science, and IEEE Xplore from inception to December 2024. Studies evaluating AI or deep learning algorithms for detecting diabetic retinopathy using fundus photography with human expert grading as the reference standard were included. Quality assessment was performed using QUADAS-2. Meta-analysis employed bivariate random-effects models.
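As a rough illustration of random-effects pooling on the logit scale, the sketch below pools a single proportion (for example, sensitivity) with the DerSimonian-Laird estimator. This is a simplified univariate stand-in, not the bivariate model used in the review, which pools sensitivity and specificity jointly and accounts for their correlation. The function names and the study counts are hypothetical and are not taken from the included studies.

```python
import math


def inv_logit(x):
    """Map a logit value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events[i] / totals[i] is one study's proportion, e.g. true positives over
    all diseased eyes for sensitivity. Returns (estimate, ci_low, ci_high, tau2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite when e == 0 or e == n.
        a, b = e + 0.5, (n - e) + 0.5
        y.append(math.log(a / b))      # per-study logit proportion
        v.append(1.0 / a + 1.0 / b)    # approximate within-study variance

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), tau2


# Hypothetical per-study counts: true positives and total diseased eyes.
true_positives = [88, 171, 46, 260]
diseased_totals = [100, 200, 50, 280]
sens, sens_lo, sens_hi, tau2 = pool_logit_random_effects(true_positives, diseased_totals)
print(f"pooled sensitivity {sens:.3f} (95% CI {sens_lo:.3f} to {sens_hi:.3f}), tau^2 {tau2:.3f}")
```

Pooling on the logit scale keeps the back-transformed confidence limits inside the 0 to 1 range, which matters when proportions sit close to 1, as screening sensitivities often do.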

Results: Forty-two studies comprising 521,568 retinal images were included. For detecting any diabetic retinopathy, pooled sensitivity was 87.5% (95% CI: 85.0–90.0%) and specificity was 84.2% (95% CI: 80.5–87.8%), with an AUROC of 0.92 (95% CI: 0.90–0.94). For referable diabetic retinopathy, pooled sensitivity was 91.8% (95% CI: 89.2–94.3%) and specificity was 87.5% (95% CI: 84.3–90.7%), with an AUROC of 0.95 (95% CI: 0.93–0.97). Performance was lower in external validation than in internal validation (AUROC 0.90 vs 0.94). Convolutional neural networks showed the highest diagnostic accuracy among the AI architectures evaluated.
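To give a sense of what pooled sensitivity and specificity imply in practice, the sketch below converts the referable diabetic retinopathy estimates (sensitivity 91.8%, specificity 87.5%) into likelihood ratios and predictive values. The 10% prevalence is a hypothetical value chosen only for illustration. It is not reported in the review, and the derived figures are arithmetic on the pooled point estimates rather than results of the meta-analysis.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sens / (1.0 - spec)
    lr_neg = (1.0 - sens) / spec
    return lr_pos, lr_neg


def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) via Bayes' rule for a given disease prevalence."""
    tp = sens * prevalence                    # true-positive fraction of the population
    fp = (1.0 - spec) * (1.0 - prevalence)    # false-positive fraction
    fn = (1.0 - sens) * prevalence            # false-negative fraction
    tn = spec * (1.0 - prevalence)            # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Pooled point estimates for referable diabetic retinopathy, from the Results.
SENS_REFERABLE = 0.918
SPEC_REFERABLE = 0.875

# Hypothetical prevalence of referable disease in a screened population (assumption).
ASSUMED_PREVALENCE = 0.10

lr_pos, lr_neg = likelihood_ratios(SENS_REFERABLE, SPEC_REFERABLE)
ppv, npv = predictive_values(SENS_REFERABLE, SPEC_REFERABLE, ASSUMED_PREVALENCE)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.3f}, PPV {ppv:.3f}, NPV {npv:.3f}")
```

Predictive values depend strongly on prevalence, so the same pooled accuracy can produce quite different referral burdens across screening populations.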

Conclusions: AI and deep learning systems demonstrate high diagnostic accuracy for diabetic retinopathy screening, approaching human expert performance. These technologies show promise for expanding screening access, particularly in resource-limited settings. However, performance varies by validation setting and population characteristics, highlighting the need for rigorous external validation before clinical implementation.

OASK Publishers Ltd
