Applying System Safety Engineering and Risk Management to LLM-based applications
Session Number
IND STUDY 12
Advisor(s)
Nathan Butters
Discipline
Independent Study
Start Date
17-4-2025 11:25 AM
End Date
17-4-2025 11:40 AM
Abstract
The Math Helper is a Large Language Model (LLM) app that uses a chat interface to help students identify errors in their math work. It takes as input a picture of an attempted math problem or equation, together with the student's question. The project applies the basics of systems engineering in a safety context to explore how they can be used with Artificial Intelligence (AI) applications such as the Math Helper.
We used the System Theoretic Process Analysis (STPA) methodology, a system safety engineering process, to identify loss scenarios that could result from unsafe control actions and hazards. We then turned these scenarios into project requirements against which the Math Helper's safety and performance can be measured.
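The STPA chain described above (losses, hazards, unsafe control actions, loss scenarios, requirements) can be kept as a small traceability model, so that every loss scenario is checked for at least one requirement addressing it. The sketch below is a minimal illustration under that assumption: the four `UCAType` categories are the standard STPA categories of unsafe control action, but the class names, the `untraced_scenarios` helper, and all example entries are hypothetical and are not taken from this study.

```python
from dataclasses import dataclass, field
from enum import Enum


class UCAType(Enum):
    # The four standard STPA categories of unsafe control action
    NOT_PROVIDED = "not providing the control action causes a hazard"
    PROVIDED = "providing the control action causes a hazard"
    WRONG_TIMING = "provided too early, too late, or in the wrong order"
    WRONG_DURATION = "stopped too soon or applied too long"


@dataclass
class Loss:
    id: str
    description: str


@dataclass
class Hazard:
    id: str
    description: str
    losses: list[Loss]  # losses this hazard can lead to


@dataclass
class UnsafeControlAction:
    id: str
    controller: str
    control_action: str
    uca_type: UCAType
    context: str
    hazards: list[Hazard]  # hazards this UCA can cause


@dataclass
class LossScenario:
    id: str
    uca: UnsafeControlAction
    causal_factors: str
    requirements: list[str] = field(default_factory=list)


def untraced_scenarios(scenarios: list[LossScenario]) -> list[str]:
    """Return the IDs of loss scenarios that no requirement addresses yet."""
    return [s.id for s in scenarios if not s.requirements]


# Hypothetical example entries (illustrative only, not results of the study)
loss1 = Loss("L-1", "Student learns an incorrect math method")
hazard1 = Hazard("H-1", "Helper flags correct work as wrong", [loss1])
uca1 = UnsafeControlAction(
    "UCA-1", "LLM", "report an error in the student's work",
    UCAType.PROVIDED, "when the submitted work is actually correct", [hazard1],
)
s1 = LossScenario(
    "S-1", uca1, "image of handwritten work is misread",
    requirements=["R-1: Ask the student to confirm the transcribed problem"],
)
s2 = LossScenario("S-2", uca1, "model invents a rule the student broke")

print(untraced_scenarios([s1, s2]))  # ['S-2']
```

Keeping object references rather than bare ID strings means each scenario can be walked back through its UCA and hazards to the losses it threatens.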
We are writing tests to evaluate the performance of the Math Helper. These include a Smoke Test, which verifies the Math Helper's basic functionality and the stability of its responses, and a Pass^k Test, which measures how consistently the LLM responds correctly across multiple attempts. We intend to use the results to determine whether the Math Helper can be safely released to the public; this is the next step for the project. In the future, this STPA methodology could be adapted to work with other LLMs and AI agents.
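If the consistency test follows the pass^k metric (the probability that all k independent attempts at the same task succeed), it can be estimated from repeated runs of each test prompt. The sketch below assumes that interpretation and uses the per-task estimate C(c, k) / C(n, k), where n is the number of attempts and c the number of correct ones, averaged over tasks. The function name `pass_hat_k` and the example results are hypothetical, not data from the project.

```python
from math import comb


def pass_hat_k(trials: list[list[bool]], k: int) -> float:
    """Estimate pass^k: the chance that all k attempts at a task succeed,
    averaged over tasks.

    trials[i] holds the pass/fail outcome of every attempt at task i.
    For a task with n attempts and c successes, the estimate is
    C(c, k) / C(n, k); comb returns 0 when c < k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not trials:
        raise ValueError("need at least one task")
    scores = []
    for outcomes in trials:
        n, c = len(outcomes), sum(outcomes)
        if n < k:
            raise ValueError("each task needs at least k attempts")
        scores.append(comb(c, k) / comb(n, k))
    return sum(scores) / len(scores)


# Hypothetical results: 3 test prompts, each sent to the helper 4 times
results = [
    [True, True, True, True],      # always correct
    [True, True, False, True],     # correct 3 of 4 times
    [False, False, False, False],  # never correct
]
for k in (1, 2, 4):
    print(k, round(pass_hat_k(results, k), 3))
# prints:
# 1 0.583
# 2 0.5
# 4 0.333
```

Unlike pass@k, which rewards a model for succeeding at least once, pass^k falls as k grows unless every attempt succeeds, which is why it suits a consistency requirement for a public release.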