Knowledge Graph Assisted Large Language Models

Session Number

Project ID: CMPS 07

Advisor(s)

Dr. Tarak Nath Nandi, Argonne National Laboratory

Discipline

Computer Science

Start Date

17-4-2024 8:35 AM

End Date

17-4-2024 8:50 AM

Abstract

Transformer-based large language models (LLMs) have gained prominence over the last few years for their ability to generate human-like content. One of their biggest issues is “hallucination,” where the model produces factually incorrect output in response to queries that are poorly supported by its training data. Previous mitigation methods, such as retrieval-augmented generation (RAG), retrieve only direct relationships between entities, leaving out higher-level connections. Graph RAG (GRAG) is a technique that uses knowledge graphs (KGs) to incorporate information from large corpora in a structured format, enabling context-grounded LLM responses. For this work, we are using SPOKE, a KG created by researchers at the University of California, San Francisco, comprising 41 million nodes (entities) and 148 million edges (relationships) that represent interconnected pathways relevant to human biology (e.g., connecting genes, diseases, and drugs). The goal of my project is to use SPOKE to provide structured relationships for context-aware, fact-based LLM responses to user queries such as “What are some diseases that involve the BRCA1 gene, and what proteins are affected?” or “What are some drugs for someone suffering from Crohn’s disease?”. We anticipate having results to present by IMSAloquium.
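To make the Graph RAG pattern concrete, below is a minimal, hypothetical sketch of the retrieve-then-prompt flow described above. It uses a tiny in-memory graph (built with networkx) standing in for SPOKE, a hard-coded entity match standing in for real entity linking, and a printed prompt rather than an actual LLM call; the node names, edge types, and helper functions are illustrative assumptions, not SPOKE's actual schema or API.

```python
# Minimal Graph RAG sketch. The toy graph below stands in for SPOKE;
# its node names and edge types are illustrative, not SPOKE's real schema.
import networkx as nx


def build_toy_kg() -> nx.MultiDiGraph:
    """Build a toy biomedical knowledge graph (a stand-in for SPOKE)."""
    kg = nx.MultiDiGraph()
    kg.add_edge("BRCA1", "Breast cancer", relation="ASSOCIATES_WITH")
    kg.add_edge("BRCA1", "Ovarian cancer", relation="ASSOCIATES_WITH")
    kg.add_edge("BRCA1", "BRCA1 protein", relation="ENCODES")
    kg.add_edge("Olaparib", "Breast cancer", relation="TREATS")
    return kg


def retrieve_context(kg: nx.MultiDiGraph, entity: str, hops: int = 1) -> list[str]:
    """Collect (subject, relation, object) triples within `hops` of the entity."""
    # Use an undirected view so neighbors on incoming edges are also included.
    nearby = nx.ego_graph(kg.to_undirected(as_view=False), entity, radius=hops).nodes
    triples = []
    for u, v, data in kg.edges(data=True):
        if u in nearby and v in nearby:
            triples.append(f"({u}) -[{data['relation']}]-> ({v})")
    return triples


def build_prompt(question: str, triples: list[str]) -> str:
    """Prepend the retrieved KG triples to the user question as grounding context."""
    context = "\n".join(triples)
    return (
        "Answer using only the knowledge-graph facts below.\n"
        f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    kg = build_toy_kg()
    question = "What are some diseases that involve the BRCA1 gene?"
    # A real system would perform entity linking; here the entity is hard-coded.
    prompt = build_prompt(question, retrieve_context(kg, "BRCA1"))
    print(prompt)  # This grounded prompt would then be sent to the LLM.
```

In the actual project, the triples would come from queries against the full SPOKE graph rather than a toy networkx graph, and the assembled prompt would be passed to the LLM to generate the context-aware answer.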
