Session Number
F02
Advisor(s)
Namrata Pandya, Illinois Mathematics and Science Academy
Location
B-206 Lecture Hall
Start Date
28-4-2016 9:50 AM
End Date
28-4-2016 10:15 AM
Abstract
Using genome data to predict cancer type is an increasingly relevant practice because it provides a direct, noninvasive strategy for analyzing genetic predisposition to malignant cancers. More specifically, analysis of gene expression data across the genome can provide insight into the underlying gene interactions that drive tumor progression. A database containing expression levels for 16,063 genes was split into disjoint training and testing sets; these were subjected to a variety of machine learning methods and statistical analyses, including multinomial logistic regression across cancer phenotypes, k-means clustering analysis, optimization of a predictive support vector machine, and rooted random forest sampling with hidden neural networks. A predictive network was built from these models and applied to the testing dataset. Primary results indicate a surprising ability of these algorithms to classify cancers accurately. Accuracy reached as high as 98.8%, with few misclassifications. Furthermore, an analysis was conducted to identify the genes with the most potential to indicate tumor location, as well as the corresponding probabilities of tumorigenic mutations. The results of this investigation demonstrate that machine learning algorithms with random sampling of genes can serve as extraordinarily accurate methods to classify and predict resultant cancers.
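The evaluation protocol described in the abstract (a disjoint training/testing split, a model fit on the training set, accuracy measured on the held-out set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the 16,063-gene database, the function names are hypothetical, and a simple nearest-centroid classifier stands in for the logistic regression, support vector machine, and random forest models used in the project.

```python
import random


def make_synthetic_expression(n_per_class=30, n_genes=50, n_classes=3, seed=0):
    """Hypothetical stand-in data: each class over-expresses its own 5 genes."""
    rng = random.Random(seed)
    samples, labels = [], []
    for c in range(n_classes):
        marker_genes = range(5 * c, 5 * c + 5)
        for _ in range(n_per_class):
            # Baseline expression is unit Gaussian noise for every gene.
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in marker_genes:
                row[g] += 3.0  # class-specific shift on the marker genes
            samples.append(row)
            labels.append(c)
    return samples, labels


def split_disjoint(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and cut them into non-overlapping train/test lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def fit_centroids(samples, labels, train_idx):
    """Compute the mean expression profile of each class from training rows only."""
    sums, counts = {}, {}
    for i in train_idx:
        c = labels[i]
        if c not in sums:
            sums[c] = [0.0] * len(samples[i])
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], samples[i])]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((x - m) ** 2 for x, m in zip(row, center))
    return min(centroids, key=lambda c: sq_dist(centroids[c]))


samples, labels = make_synthetic_expression()
train_idx, test_idx = split_disjoint(len(samples))
centroids = fit_centroids(samples, labels, train_idx)

# Accuracy is measured only on samples the model never saw during fitting.
correct = sum(predict(centroids, samples[i]) == labels[i] for i in test_idx)
accuracy = correct / len(test_idx)
print(f"held-out accuracy: {accuracy:.3f}")
```

Keeping the two index lists disjoint is what makes the reported accuracy an estimate of performance on unseen tumors rather than a measure of how well the model memorized its training samples.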
Neural Networks and Machine Learning Applied to Classification of Cancer