Testing the Type 2 Diabetes Risk Prediction Efficacy of a Synthetically Trained Machine Learning Model

Session Number

CMPS 09

Advisor(s)

Dr. Ravi Madduri

Dr. Alexis Rodriguez, Argonne National Laboratory

Discipline

Computer Science

Start Date

17-4-2024 11:05 AM

End Date

17-4-2024 11:20 AM

Abstract

Several machine learning models trained on electronic health record (EHR) data predict Type 2 Diabetes risk accurately, but the risk-prediction efficacy of models trained on synthetic genotype data has not been tested extensively. Using results from Genome-Wide Association Studies (GWAS), we identified several genes associated with Type 2 Diabetes, each containing hundreds of single nucleotide polymorphisms (SNPs). We are currently generating synthetic genotype data from the GWAS summary statistics and using it to train a supervised machine learning model. We will compare the accuracy of our synthetically trained model's risk predictions with those of PrimeT2D, a model trained on EHR data. Because synthetic training data is more accessible than EHR data, synthetically trained models could assist or even replace existing Type 2 Diabetes risk models if they are consistently found to be more accurate.
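
The pipeline described in the abstract (summary statistics, then synthetic genotypes, then a supervised model) can be illustrated with a minimal sketch. It is not the authors' implementation. The allele frequencies, log odds ratios, and baseline log odds below are hypothetical placeholders, SNPs are assumed to be independent (no linkage disequilibrium), and a plain logistic regression stands in for whichever supervised model the study uses.

```python
import math
import random

# Hypothetical GWAS summary statistics: (risk-allele frequency, log odds ratio).
# Illustrative placeholders only, not values from the study.
SUMMARY_STATS = [(0.30, 0.25), (0.15, 0.40), (0.45, 0.10), (0.22, 0.30)]
BASELINE_LOG_ODDS = -1.5  # assumed intercept; sets the overall case rate


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_individual(rng):
    """Draw one synthetic genotype (0, 1, or 2 risk alleles per SNP) and a case label."""
    # Two independent allele draws per SNP, i.e. Hardy-Weinberg proportions.
    genotype = [sum(rng.random() < freq for _ in range(2)) for freq, _ in SUMMARY_STATS]
    log_odds = BASELINE_LOG_ODDS + sum(
        g * beta for g, (_, beta) in zip(genotype, SUMMARY_STATS)
    )
    label = 1 if rng.random() < sigmoid(log_odds) else 0
    return genotype, label


def simulate_cohort(n, seed=0):
    """Generate n synthetic (genotype, label) pairs, reproducible for a given seed."""
    rng = random.Random(seed)
    return [simulate_individual(rng) for _ in range(n)]


def train_logistic_regression(cohort, epochs=200, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(cohort[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for genotype, label in cohort:
            score = bias + sum(w * g for w, g in zip(weights, genotype))
            error = sigmoid(score) - label
            grad_b += error
            for j, g in enumerate(genotype):
                grad_w[j] += error * g
        bias -= lr * grad_b / len(cohort)
        weights = [w - lr * gw / len(cohort) for w, gw in zip(weights, grad_w)]
    return weights, bias


def predict_risk(weights, bias, genotype):
    """Return the model's predicted probability of being a case."""
    return sigmoid(bias + sum(w * g for w, g in zip(weights, genotype)))


if __name__ == "__main__":
    cohort = simulate_cohort(2000, seed=42)
    weights, bias = train_logistic_regression(cohort)
    print(predict_risk(weights, bias, [1, 0, 2, 1]))
```

A comparison against an EHR-trained model such as PrimeT2D would additionally require a shared held-out set of real patients, which this sketch does not attempt to model.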
