A Machine Learning Approach to Predict Schizophrenia from SNP-Array Based Genomic Data
Advisor(s)
Dr. Jubao Duan; NorthShore University HealthSystem, Research Institute
Discipline
Computer Science
Start Date
21-4-2021 9:30 AM
End Date
21-4-2021 9:45 AM
Abstract
Although the use of machine learning for disease detection has increased sharply over the past several years, diagnostic methods for mental illnesses such as schizophrenia remain largely qualitative. This project aims to introduce a data-driven diagnostic approach that uses genome-wide SNP-array data to predict schizophrenia. Several machine learning models were built in Python and TensorFlow and run on a dataset, provided by NorthShore University HealthSystem, of genotypes at 17,262 loci from 5,334 subjects. A linear dimensional analysis run on the raw data revealed that the variables were collinear. Several support vector machine kernels were also tested; the radial basis function kernel achieved an average accuracy of 72.97%. A convolutional neural network, structured as a five-layer sequential model for binary image classification and trained with the adaptive moment estimation (Adam) optimizer, is being modified to improve accuracy further. A recurrent neural network is also being built to understand the efficiency and use of general neural networks. Because the target accuracy is above 95%, future steps include trying different parameters and data formats to improve the machine learning pipeline. The future of quantitative mental illness detection remains promising, but more data and a more intricate pipeline are necessary for better results.
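As a rough illustration of the radial basis function kernel mentioned in the abstract, the sketch below computes the kernel value exp(-gamma * ||x - y||^2) between pairs of genotype vectors. It is not the project's code: the function name `rbf_kernel`, the 0/1/2 minor-allele-count encoding, the five example loci, and the choice of `gamma` are all assumptions made for illustration only.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Return exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical genotypes: minor-allele counts (0, 1, or 2) at five loci.
# The real dataset has 17,262 loci per subject; these values are made up.
subject_a = [0, 1, 2, 0, 1]
subject_b = [0, 1, 2, 0, 1]
subject_c = [2, 0, 0, 1, 1]

# Illustrative choice (1 / number of features), not a value from the study.
gamma = 1.0 / len(subject_a)

k_same = rbf_kernel(subject_a, subject_b, gamma)  # identical genotypes
k_diff = rbf_kernel(subject_a, subject_c, gamma)  # dissimilar genotypes
print(k_same, k_diff)
```

Identical genotype vectors give a kernel value of 1, and the value decays toward 0 as the vectors diverge. A support vector machine with this kernel uses that similarity measure to separate cases from controls.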