Equitable machine learning counteracts ancestral bias in precision medicine, improving outcomes for all

Res Sq [Preprint]. 2023 Jul 27:rs.3.rs-3168446. doi: 10.21203/rs.3.rs-3168446/v1.

Abstract

Gold-standard genomic datasets severely under-represent non-European populations, leading to inequities and a limited understanding of human disease [1-8]. Potential therapeutics and insights into patient outcomes remain hidden because we lack the knowledge that analysis of ancestry-unbiased genomic data could provide. To address this gap, we present PhyloFrame, the first machine learning method for equitable genomic precision medicine. PhyloFrame corrects for ancestral bias by integrating large-scale, tissue-specific functional interaction networks with global population variation data and disease-relevant transcriptomic data. Applied to breast, thyroid, and uterine cancers, PhyloFrame shows marked improvements in predictive power across all ancestries, reduced model overfitting, and a higher likelihood of identifying known cancer-related genes. In particular, it substantially improves the accuracy of predictions for underrepresented groups. These results demonstrate how AI can mitigate ancestral bias in training data and contribute to equitable representation in medical research.

Keywords: ancestry; artificial intelligence; cancer; equitable AI; genomics.

Publication types

  • Preprint