A Machine Learning Approach to Predict Schizophrenia from SNP-Array Based Genomic Data

Advisor(s)

Dr. Jubao Duan; NorthShore University HealthSystem, Research Institute

Discipline

Computer Science

Start Date

21-4-2021 9:30 AM

End Date

21-4-2021 9:45 AM

Abstract

Although the use of machine learning for disease detection has increased sharply over the past several years, diagnostic methods for mental illnesses such as schizophrenia remain largely qualitative. This project aims to introduce a data-driven diagnosis by using genome-wide array data to predict schizophrenia. Several machine learning models, implemented in Python and TensorFlow, were run on a dataset provided by NorthShore University HealthSystem covering 17,262 loci from the genomes of 5,334 subjects. A linear dimensional analysis run on the raw data revealed that the variables were collinear. Several support vector machine tests were also conducted, and the radial basis function kernel achieved an average accuracy of 72.97%. A convolutional neural network, structured as a five-layer sequential model for binary image classification with the adaptive moment estimation (Adam) optimizer, is being modified to further improve accuracy. A recurrent neural network is also being built to better understand the efficiency and use of general neural networks. Because the target accuracy is above 95%, future steps include trying different parameters and data formats to improve the machine learning pipeline. The future of quantitative mental illness detection remains promising, but more data and a more intricate pipeline are necessary for better results.
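
As a rough illustration of the support vector machine step described above, the following minimal sketch shows how a radial basis function kernel scores the similarity between two subjects' genotype vectors, and how an average accuracy could be computed across evaluation folds. It is not the project's code: the function names `rbf_kernel` and `mean_accuracy`, the coding of genotypes as minor-allele counts (0, 1, or 2), the `gamma` value, and the use of folds are all assumptions made for this example.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("genotype vectors must cover the same loci")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mean_accuracy(fold_results):
    """Average accuracy over folds.

    Each fold is a list of (predicted_label, actual_label) pairs.
    """
    accuracies = [
        sum(1 for predicted, actual in fold if predicted == actual) / len(fold)
        for fold in fold_results
    ]
    return sum(accuracies) / len(accuracies)


# Hypothetical genotypes at four loci, coded as minor-allele counts (assumed).
subject_a = [0, 1, 2, 0]
subject_b = [0, 1, 1, 0]
gamma = 1.0 / len(subject_a)  # illustrative choice, not a value from the study

similarity = rbf_kernel(subject_a, subject_b, gamma)
```

The kernel value approaches 1 as two genotype vectors become identical and decays toward 0 as they diverge, which is what lets a support vector machine draw a nonlinear boundary between cases and controls. In practice a library implementation would likely be used instead of hand-written functions.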
