Applying System Safety Engineering and Risk Management to LLM-based applications

Session Number

IND STUDY 12

Advisor(s)

Nathan Butters

Discipline

Independent Study

Start Date

17-4-2025 11:25 AM

End Date

17-4-2025 11:40 AM

Abstract

The Math Helper is a Large Language Model (LLM) app that uses a chat interface to help students identify errors in their math work. It takes as input a picture of an attempted math problem or equation together with the student's question. The project applies the basics of system engineering, in a safety context, to explore how they carry over to Artificial Intelligence (AI) applications.

We used the System-Theoretic Process Analysis (STPA) methodology, a system safety engineering process, to identify loss scenarios that could result from unsafe control actions and hazards. We turned these scenarios into requirements against which the project's safety and performance can be measured.

We are writing tests to determine the performance of the Math Helper. These tests include a Smoke Test, which verifies the Math Helper's basic functionality and the stability of its responses, and a Pass^k Test, which measures consistency by determining how reliable the LLM's responses are across multiple attempts. We intend to use the results to determine whether the Math Helper can be safely released to the public. Those are the next steps for this project; in the future, this STPA methodology can be adapted to work with other LLMs and AI agents.
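
As an illustration only, a consistency check across multiple attempts could be scored along the following lines. The sketch assumes the multi-attempt test reports a pass^k-style estimate: the chance that k attempts, drawn without replacement from the recorded trials, are all correct. The names `pass_hat_k`, `run_trials`, and `make_stub_model`, and the stubbed model responses, are hypothetical and are not taken from the Math Helper code.

```python
from math import comb
from typing import Callable, Iterable


def pass_hat_k(num_trials: int, num_correct: int, k: int) -> float:
    """Estimate the chance that k attempts, drawn without replacement
    from the recorded trials, are all correct."""
    if not 1 <= k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    if not 0 <= num_correct <= num_trials:
        raise ValueError("num_correct must be between 0 and num_trials")
    # comb(num_correct, k) is 0 when fewer than k trials were correct.
    return comb(num_correct, k) / comb(num_trials, k)


def run_trials(
    ask: Callable[[str], str],
    prompt: str,
    is_correct: Callable[[str], bool],
    num_trials: int,
) -> int:
    """Send the same prompt num_trials times and count correct responses."""
    return sum(1 for _ in range(num_trials) if is_correct(ask(prompt)))


def make_stub_model(responses: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for the LLM: replays canned responses in order."""
    replay = iter(responses)
    return lambda prompt: next(replay)


# Assumed example data: four attempts at one question, three of them correct.
stub = make_stub_model(["x = 2", "x = 2", "x = 3", "x = 2"])
correct = run_trials(stub, "Where is the error in my work?", lambda r: r == "x = 2", 4)
score = pass_hat_k(num_trials=4, num_correct=correct, k=2)
```

A score near 1 for larger k would indicate that the app answers the same question correctly on repeated attempts, while a sharp drop as k grows would indicate inconsistency even when single-attempt accuracy looks acceptable.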

