Machine learning prediction of glioblastoma patient one-year survival using clinical and genomic data

Session Number

Project ID: MEDH 32

Advisor(s)

Dr. Warren McGee; Northwestern University Feinberg School of Medicine, Department of Neurology

Dr. Jane Wu; Northwestern University Feinberg School of Medicine, Department of Neurology

Discipline

Medical and Health Sciences

Start Date

22-4-2020 11:30 AM

End Date

22-4-2020 11:55 AM

Abstract

This study aimed to use machine learning to predict one-year survival for primary glioblastoma (pGBM) patients using data (n = 175) from the Chinese Glioma Genome Atlas (CGGA). Logistic regression (LR), support vector machine (SVM), random forest (RF), and ensemble models were used to select predictors of overall survival (OS) and to classify patients into two groups: those surviving less than one year and those surviving one year or longer.
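Before any classifier can be trained, each patient's OS has to be turned into a binary one-year label. The sketch below is a minimal illustration under stated assumptions: OS is recorded in days together with a death/censoring flag, one year is taken as 365 days, and patients censored (lost to follow-up) before one year are excluded because their true class is unknown. The source does not describe how censoring was handled; the function name `one_year_label` and the patient records are hypothetical.

```python
from typing import Optional

ONE_YEAR_DAYS = 365  # assumed threshold; the source only says "one year"


def one_year_label(os_days: float, died: bool) -> Optional[int]:
    """Hypothetical labeling rule for the two survival classes.

    Returns 1 if the patient survived one year or longer, 0 if the patient
    died within one year, or None if the patient was censored before one
    year (true class unknown, so the record is excluded).
    """
    if os_days >= ONE_YEAR_DAYS:
        return 1
    if died:
        return 0
    return None  # censored before one year


# Hypothetical records: (patient id, OS in days, died?)
patients = [
    ("P1", 200, True),   # died at 200 days -> class 0
    ("P2", 500, True),   # died at 500 days -> class 1
    ("P3", 120, False),  # censored at 120 days -> unknown, excluded
    ("P4", 400, False),  # censored at 400 days, alive at one year -> class 1
]
labels = {pid: one_year_label(days, died) for pid, days, died in patients}
labeled = {pid: y for pid, y in labels.items() if y is not None}
print(labeled)  # {'P1': 0, 'P2': 1, 'P4': 1}
```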

With respect to OS, statistically significant (p < 0.05) correlations were found with age (negative), radiotherapy (positive), and chemotherapy (positive). IDH1 mutation and 1p/19q codeletion were not significantly correlated with OS. However, IDH1 mutation was significantly negatively correlated with age; because age is itself negatively correlated with OS, further study may reveal long-term prognostic value for IDH1 mutation.

Correlation analysis was performed on mRNA-seq FPKM data to select statistically significant gene-expression features. LR, SVM, and RF classifiers were compared and then combined in a weighted soft-voting ensemble classifier with weights of 0.125, 0.125, and 0.750, respectively. The ensemble model performed best (AUC = 0.654, F1 = 0.799). LR and SVM appeared to underfit the data, while RF appeared to overfit it. In the ensemble, the overfitting tendency of RF appeared to be counteracted by the underfitting tendencies of LR and SVM while the highest overall performance among the compared models was maintained.
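Weighted soft voting averages each model's predicted class probability using fixed weights and then thresholds the average. The plain-Python sketch below applies the reported weights (0.125 for LR, 0.125 for SVM, 0.750 for RF) to hypothetical per-patient probabilities of surviving one year or longer, assumes a 0.5 decision threshold, and computes F1 and ROC AUC (the latter as the probability that a randomly chosen positive is scored above a randomly chosen negative, ties counting one half). The probabilities are made up for illustration and are not from the study. In scikit-learn, the same combination is available as `VotingClassifier(voting="soft", weights=[...])`.

```python
def soft_vote(prob_lists, weights):
    """Weighted average of each model's P(class 1), computed per patient."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, probs)) / total
            for probs in zip(*prob_lists)]


def f1_score(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    """AUC via pairwise ranking: P(random positive outscores random negative)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical probabilities of surviving >= 1 year for five patients.
y_true = [1, 0, 1, 1, 0]
p_lr = [0.60, 0.50, 0.55, 0.60, 0.45]
p_svm = [0.60, 0.55, 0.50, 0.60, 0.50]
p_rf = [0.90, 0.20, 0.80, 0.70, 0.40]

ens = soft_vote([p_lr, p_svm, p_rf], [0.125, 0.125, 0.750])
y_pred = [1 if p >= 0.5 else 0 for p in ens]  # assumed 0.5 threshold
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, ens)
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")  # F1 = 1.000, AUC = 1.000
```

Because RF carries three-quarters of the total weight, LR and SVM together can shift each averaged probability by at most 0.25 relative to RF alone, which is one way to see how they can temper, but not override, RF's predictions.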
