Testing the Type 2 Diabetes Risk Prediction Efficacy of a Synthetically Trained Machine Learning Model
Session Number
CMPS 09
Advisor(s)
Dr. Ravi Madduri
Dr. Alexis Rodriguez, Argonne National Laboratory
Discipline
Computer Science
Start Date
17-4-2024 11:05 AM
End Date
17-4-2024 11:20 AM
Abstract
Several machine learning models trained on electronic health record (EHR) data can accurately predict the risk of Type 2 Diabetes, but the predictive efficacy of models trained on synthetic genotype data has not been tested extensively. Using results from genome-wide association studies (GWAS), we identified several genes associated with Type 2 Diabetes, each containing hundreds of single nucleotide polymorphisms (SNPs). We are currently generating synthetic genotype data from the GWAS summary statistics and using it to train a supervised machine learning model. We will then compare the accuracy of our synthetically trained model's risk predictions with those of PrimeT2D, a model trained on EHR data. Because synthetic training data is more accessible than EHR data, synthetically trained models could assist or even replace existing models for predicting Type 2 Diabetes risk in patients, provided they are consistently found to be more accurate.
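The pipeline described above (synthetic genotypes from GWAS summary statistics, then a supervised model) can be illustrated with a minimal sketch. This is not the authors' implementation: the SNP names, allele frequencies, effect sizes, the Hardy-Weinberg sampling step, the additive logistic model for case/control labels, and the choice of logistic regression are all assumptions made for illustration, and the evaluation here uses a second synthetic cohort rather than real patient data or a comparison with PrimeT2D.

```python
import math
import random

# Hypothetical GWAS summary statistics: (SNP id, risk-allele frequency, log odds ratio).
# These values are invented for illustration and are not taken from the study.
SUMMARY_STATS = [
    ("snp_1", 0.30, 0.35),
    ("snp_2", 0.15, 0.50),
    ("snp_3", 0.45, 0.20),
    ("snp_4", 0.25, -0.30),
]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate_cohort(stats, n, rng, baseline_log_odds=-1.0):
    """Draw genotypes under Hardy-Weinberg equilibrium (an assumption) and
    case/control labels from an additive logistic model on the effect sizes."""
    genotypes, labels = [], []
    for _ in range(n):
        # Each genotype is the count (0, 1, or 2) of risk alleles at one SNP.
        g = [sum(rng.random() < freq for _ in range(2)) for _, freq, _ in stats]
        log_odds = baseline_log_odds + sum(beta * x for (_, _, beta), x in zip(stats, g))
        labels.append(1 if rng.random() < sigmoid(log_odds) else 0)
        genotypes.append(g)
    return genotypes, labels

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def predict_risk(w, b, x):
    # Predicted probability of being a case for one genotype vector.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = simulate_cohort(SUMMARY_STATS, 1000, rng)
X_test, y_test = simulate_cohort(SUMMARY_STATS, 500, rng)
w, b = fit_logistic(X_train, y_train)
test_auc = auc([predict_risk(w, b, x) for x in X_test], y_test)
print(f"AUC on held-out synthetic cohort: {test_auc:.3f}")
```

In a real comparison, the held-out synthetic cohort would be replaced by genotyped patients with known outcomes, and the same metric would be computed for the EHR-trained baseline on the same patients.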