Session 1B: Automated Modeling of Item Difficulty on Critical Reading Tests

Session Number

Session 1B: 2nd Presentation

Advisor(s)

Kirk Becker, Pearson VUE

Location

Academic Pit

Start Date

April 28, 2017, 8:30 AM

End Date

April 28, 2017, 9:45 AM

Abstract

Previous studies show that several factors, often related to semantic or syntactic properties, may affect the difficulty of reading comprehension questions. However, accurate prediction of item difficulty remains a challenge for the testing community. With more accurate models, test developers could produce items more efficiently, which would reduce costs, improve the accuracy of skills measurement, and free human labor for more valuable work. This study used natural language processing (NLP) to extract linguistic variables from the passages and items of the Law National Aptitude Test (LNAT), and constructed multiple linear regression models and mixed-effects models to analyze the significance of these variables. It found that models using only subject matter expert (SME) ratings performed as well as models using only NLP-derived variables, although neither was very accurate. Combining the two did not improve the model’s accuracy. Nevertheless, this result is promising, because it shows that computational modeling of difficulty is approaching the performance of models built on SME ratings. It has implications for the testing field, because it suggests that item difficulty prediction could eventually be automated, which would improve the efficiency of the item development process.
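The abstract describes the modeling approach only at a high level: NLP-derived linguistic features serve as predictors of item difficulty in multiple linear regression and mixed-effects models. The sketch below illustrates one way such a setup might look in Python with pandas and statsmodels; the data file, feature names, and grouping variable (e.g., items nested within passages) are hypothetical, since the abstract does not report the actual variables or software used.

```python
# Illustrative sketch only: feature names, data file, and grouping structure are
# hypothetical placeholders, not the study's actual variables.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical item-level data: one row per item, with NLP-derived features
# and an empirical difficulty estimate for each item.
items = pd.read_csv("lnat_items.csv")

# Multiple linear regression: item difficulty predicted from NLP features alone.
ols_nlp = smf.ols(
    "difficulty ~ word_count + sentence_length + type_token_ratio",
    data=items,
).fit()

# Mixed-effects model: same fixed effects, with a random intercept per passage
# to account for items being nested within reading passages.
mixed = smf.mixedlm(
    "difficulty ~ word_count + sentence_length + type_token_ratio",
    data=items,
    groups=items["passage_id"],
).fit()

print(ols_nlp.summary())
print(mixed.summary())
```

Comparing a model with only SME ratings as predictors against the NLP-feature model, and then a combined model, would mirror the comparison reported in the abstract.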
