Knowledge Graph Assisted Large Language Models
Session Number
Project ID: CMPS 07
Advisor(s)
Dr. Tarak Nath Nandi, Argonne National Laboratory
Discipline
Computer Science
Start Date
17-4-2024 8:35 AM
End Date
17-4-2024 8:50 AM
Abstract
Transformer-based large language models (LLMs) have gained prominence over the last few years for their ability to generate human-like content. One of the biggest issues with LLMs is “hallucination,” in which they generate factually incorrect output in response to queries that have little support in the data used to train the model. Previous methods for mitigating hallucinations, such as retrieval-augmented generation (RAG), provide direct relationships between entities but leave out high-level connections. Graph RAG (GRAG) is a technique that uses knowledge graphs (KGs) to incorporate information from large corpora in a structured format, enabling context-based LLM responses. For this work, we are using SPOKE, a KG created by researchers at the University of California, San Francisco, consisting of 41 million nodes (entities) and 148 million edges (relationships) that represent interconnected pathways relevant to human biology (e.g., connecting genes, diseases, and drugs). The goal of my project is to use SPOKE to provide structured relationships for context-aware, fact-based LLM responses to user queries such as “What are some diseases that involve the BRCA1 gene, and what proteins are affected?” or “What are some drugs for someone suffering from Crohn’s disease?” We anticipate having results to present by IMSAloquium.
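As a rough illustration of the retrieval step in Graph RAG, the sketch below matches entity names in a user query against a small set of (subject, relation, object) triples and formats the matching edges as context for an LLM prompt. The triples, relation names, and the functions `build_index`, `retrieve_context`, and `build_prompt` are hypothetical: they are hand-written for illustration, not SPOKE data or this project's pipeline. A real system would use proper entity linking and graph queries rather than naive substring matching.

```python
from collections import defaultdict

# Toy, hand-written triples for illustration only; NOT taken from SPOKE.
# Each triple is (subject, relation, object); relation names are hypothetical.
TRIPLES = [
    ("BRCA1", "associated_with", "breast cancer"),
    ("BRCA1", "associated_with", "ovarian cancer"),
    ("BRCA1", "encodes", "BRCA1 protein"),
    ("infliximab", "treats", "Crohn's disease"),
    ("adalimumab", "treats", "Crohn's disease"),
]


def build_index(triples):
    """Map each entity to every triple it appears in, at either end of the edge."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
        index[o].append((s, r, o))
    return index


def retrieve_context(query, index):
    """Return triples touching any entity whose name appears in the query.

    Uses case-insensitive substring matching as a stand-in for entity linking.
    """
    q = query.lower()
    hits = []
    for entity, edges in index.items():
        if entity.lower() in q:
            for triple in edges:
                if triple not in hits:  # keep first-seen order, drop duplicates
                    hits.append(triple)
    return hits


def build_prompt(query, triples):
    """Turn retrieved triples into a plain-text context block ahead of the question."""
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in triples)
    return f"Use only these facts from the knowledge graph:\n{facts}\n\nQuestion: {query}"


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    question = "What are some drugs for someone suffering from Crohn's disease?"
    print(build_prompt(question, retrieve_context(question, idx)))
```

The LLM call itself is omitted; the resulting prompt string would be passed to whichever model the pipeline uses, so that its answer is grounded in the retrieved edges rather than only in its training data.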