Session Number

F02

Advisor(s)

Namrata Pandya, Illinois Mathematics and Science Academy

Location

B-206 Lecture Hall

Start Date

28-4-2016 9:50 AM

End Date

28-4-2016 10:15 AM

Abstract

Using genome data to predict cancer type is an increasingly relevant practice, as it provides a direct, noninvasive strategy for analyzing genetic predisposition to malignant cancer types. More specifically, analysis of gene expression data across the genome can provide insight into the underlying gene interactions that drive tumor progression. A database containing expression levels for 16,063 genes was split into disjoint training and testing sets. These sets were subjected to a variety of machine learning methods and statistical analyses, including multinomial logistic regression across cancer phenotypes, k-means clustering analysis, optimization of a predictive support vector machine, and rooted random forest sampling with hidden neural networks. A predictive network was created from these models and applied to the testing dataset. Primary results indicate a surprising ability of these algorithms to classify cancers accurately: accuracy reached as high as 98.8%, with few misclassifications. Furthermore, an analysis was conducted to determine the genes with the most potential to indicate tumor location, as well as the corresponding probabilities for tumorigenic mutations. The results of this investigation demonstrate that machine learning algorithms with random sampling of genes can serve as highly accurate methods to classify and predict resultant cancers.
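As a rough illustration of the evaluation protocol described in the abstract (a disjoint training/testing split followed by an accuracy measurement on held-out samples), the following is a minimal sketch only. It does not reproduce the presenters' data or models: the expression values are synthetic, the "marker gene" shift is invented, and a simple nearest-centroid classifier stands in for the logistic regression, support vector machine, random forest, and neural network methods named above. All function names and parameters are hypothetical.

```python
import random


def make_synthetic(n_per_class=40, n_genes=20, n_classes=3, seed=1):
    """Build toy 'expression' rows; NOT real data, purely an assumption."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[label] += 5.0  # invented marker gene that is elevated in this class
            X.append(row)
            y.append(label)
    return X, y


def split_disjoint(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and return disjoint (train, test) index lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, int(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def fit_centroids(X, y):
    """Compute the mean expression vector of each class (stand-in model)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return correct / len(y)


X, y = make_synthetic()
train_idx, test_idx = split_disjoint(len(X))

# Fit only on training samples; score only on samples the model never saw.
centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
test_accuracy = accuracy(centroids, [X[i] for i in test_idx], [y[i] for i in test_idx])
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Keeping the two index lists disjoint is what makes the reported accuracy an estimate of performance on unseen samples rather than a measure of how well the model memorized its training data.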


Neural Networks and Machine Learning Applied to Classification of Cancer
