Assessing the Performance of Automated Human Phenotype Ontology (HPO) Term Extraction for Deep Phenotyping of Patients Receiving Whole Genome/Whole Exome Sequencing in a Clinical Diagnostic Laboratory

Session Number

Project ID: MEDH 10

Advisor(s)

Dr. Kai Lee Yap; Ann & Robert H. Lurie Children's Outpatient Center

Discipline

Medical and Health Sciences

Start Date

19-4-2023 8:50 AM

End Date

19-4-2023 9:05 AM

Abstract

A comprehensive list of Human Phenotype Ontology (HPO) terms capturing a patient’s phenotypic features is essential for the creation of a prioritized gene list for whole exome and whole genome sequencing (WES/WGS) analysis. However, the conversion of a patient’s clinical notes into HPO terms is inefficient, requires human intervention, and introduces subjectivity. In this study, we evaluated the performance of various Natural Language Processing (NLP) and gene-ranking algorithms that partially automate the identification of HPO terms from rich narrative notes and accomplish deep phenotyping. We manually curated a set of 50 patients who underwent WES with resultant disease-causing variants. Their clinical notes were processed in the EHR-Phenolyzer pipeline, which uses HPO terms extracted by MetaMap to generate a ranked gene list with the Phenolyzer algorithm. The accuracy of the extracted HPO terms was compared to that of provider-submitted terms. A cross comparison was also made with the ClinPhen-Phen2Gene workflow. The EHR-Phenolyzer pipeline ranked genes with disease-causing variants at positions <250th in 38% of the WES cases (19/50), as compared to 30% of cases (15/50) using MetaMap-Phen2Gene. The ClinPhen-Phen2Gene workflow demonstrated comparable performance. Adoption of NLP-assisted deep phenotyping and gene-ranking is critical to minimize the variable effects of human recall bias.
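The headline metric reported above (the share of cases in which the gene carrying the disease-causing variant is ranked better than 250th) can be illustrated with a minimal Python sketch. This is not the study's code: the function name `top_k_hit_rate`, the convention of using `None` for a causal gene that is absent from the ranked list, and the toy ranks are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def top_k_hit_rate(ranks: Sequence[Optional[int]], k: int = 250) -> float:
    """Fraction of cases whose causal gene is ranked strictly better than k.

    ranks: one entry per case, holding the 1-based rank of the causal gene
           in the pipeline's output, or None if the gene was not ranked
           (assumed convention, not taken from the study).
    """
    if not ranks:
        raise ValueError("at least one case is required")
    # A case counts as a hit only when the gene was ranked and its rank is < k,
    # mirroring the "<250th" threshold quoted in the abstract.
    hits = sum(1 for r in ranks if r is not None and r < k)
    return hits / len(ranks)


# Toy ranks for six hypothetical cases (not the study's data).
toy_ranks = [3, 120, 249, 250, 1800, None]
print(top_k_hit_rate(toy_ranks))  # 0.5
```

Under the same definition, 19 hits out of 50 cases gives 19/50 = 0.38 and 15 hits gives 15/50 = 0.30, matching the percentages quoted above.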
