Uncertainty-aware deep learning in healthcare: A scoping review

Tyler J Loftus; Benjamin Shickel; Matthew M Ruppert; Jeremy A Balch; Tezcan Ozrazgat-Baslanti; Patrick J Tighe; Philip A Efron; William R Hogan; Parisa Rashidi; Gilbert R Upchurch Jr; Azra Bihorac

doi:10.1371/journal.pdig.0000085

PLOS Digit Health. 2022;1(8):e0000085. doi: 10.1371/journal.pdig.0000085. Epub 2022 Aug 10.

Authors

Tyler J Loftus¹,², Benjamin Shickel³, Matthew M Ruppert²,⁴, Jeremy A Balch¹, Tezcan Ozrazgat-Baslanti²,⁴, Patrick J Tighe⁵, Philip A Efron¹,², William R Hogan⁶, Parisa Rashidi²,⁷, Gilbert R Upchurch Jr¹, Azra Bihorac²,⁴

Affiliations

¹ Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America.
² Intelligent Critical Care Center, University of Florida, Gainesville, Florida, United States of America.
³ Department of Biomedical Engineering, University of Florida, Gainesville, Florida, United States of America.
⁴ Department of Medicine, University of Florida Health, Gainesville, Florida, United States of America.
⁵ Departments of Anesthesiology, Orthopedics, and Information Systems/Operations Management, University of Florida Health, Gainesville, Florida, United States of America.
⁶ Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States of America.
⁷ Departments of Biomedical Engineering, Computer and Information Science and Engineering, and Electrical and Computer Engineering, University of Florida, Gainesville, Florida, United States of America.

Abstract

Mistrust is a major barrier to implementing deep learning in healthcare settings. Entrustment could be earned by conveying model certainty, or the probability that a given model output is accurate, but the use of uncertainty estimation for deep learning entrustment is largely unexplored, and there is no consensus regarding optimal methods for quantifying uncertainty. Our purpose is to critically evaluate methods for quantifying uncertainty in deep learning for healthcare applications and to propose a conceptual framework for specifying certainty of deep learning predictions. We searched Embase, MEDLINE, and PubMed databases for articles relevant to study objectives, complying with PRISMA guidelines, rated study quality using validated tools, and extracted data according to modified CHARMS criteria. Among 30 included studies, 24 described medical imaging applications. All imaging model architectures used convolutional neural networks or a variation thereof. The predominant method for quantifying uncertainty was Monte Carlo dropout, which produces predictions from multiple networks in which different neurons have been dropped out and measures variance across the distribution of resulting predictions. Conformal prediction offered similarly strong performance in estimating uncertainty, along with ease of interpretation and applicability not only to deep learning but also to other machine learning approaches. Among the six articles describing non-imaging applications, model architectures and uncertainty estimation methods were heterogeneous, but predictive performance was generally strong, and uncertainty estimation was effective in comparing modeling methods. Overall, the use of model learning curves to quantify epistemic uncertainty (attributable to model parameters) was sparse. Heterogeneity in reporting methods precluded the performance of a meta-analysis. Uncertainty estimation methods have the potential to identify rare but important misclassifications made by deep learning models and to compare modeling methods, which could build patient and clinician trust in deep learning applications in healthcare. Efficient maturation of this field will require standardized guidelines for reporting performance and uncertainty metrics.
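
As an illustration of the Monte Carlo dropout procedure described in the abstract, the following is a minimal sketch and is not taken from any of the reviewed studies. It assumes a toy one-hidden-layer network with fixed random weights standing in for a trained model; the function names (make_network, forward_with_dropout, mc_dropout_predict), the dropout rate, and the number of passes are all hypothetical choices made for this example. Each stochastic forward pass drops a different random subset of hidden neurons, and the variance across passes serves as the uncertainty estimate.

```python
import math
import random
import statistics


def make_network(n_inputs, n_hidden, seed=0):
    """Toy network with fixed random weights (a stand-in for a trained model)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    return w1, w2


def forward_with_dropout(x, network, drop_rate, rng):
    """One stochastic forward pass; each hidden neuron is dropped with
    probability drop_rate, so every pass uses a different sub-network."""
    w1, w2 = network
    keep = 1.0 - drop_rate
    logit = 0.0
    for weights, out_w in zip(w1, w2):
        if rng.random() < drop_rate:
            continue  # neuron dropped for this pass
        hidden = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # ReLU
        logit += out_w * hidden / keep  # inverted-dropout rescaling
    logit = max(-50.0, min(50.0, logit))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> probability


def mc_dropout_predict(x, network, n_passes=200, drop_rate=0.5, seed=1):
    """Mean prediction and sample variance across stochastic passes."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, network, drop_rate, rng)
               for _ in range(n_passes)]
    return statistics.mean(samples), statistics.variance(samples)


if __name__ == "__main__":
    net = make_network(n_inputs=3, n_hidden=16)
    mean, var = mc_dropout_predict([0.5, -1.2, 2.0], net)
    print(f"predicted probability: {mean:.3f}, variance: {var:.4f}")
```

In this sketch, a larger variance means the sub-networks disagree more about the input, which is the signal that could flag a prediction for clinician review. Setting drop_rate to 0 makes every pass identical, so the variance collapses to zero.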

Grants and funding