Session 1B: Automated Modeling of Item Difficulty on Critical Reading Tests
Session Number
Session 1B: 2nd Presentation
Advisor(s)
Kirk Becker, Pearson VUE
Location
Academic Pit
Start Date
28-4-2017 8:30 AM
End Date
28-4-2017 9:45 AM
Abstract
Previous studies show that several factors may affect the difficulty of reading comprehension questions, often related to their semantic or syntactic properties. However, accurate prediction of item difficulty remains a challenge for the testing community. With more accurate models, test developers could produce items more efficiently, which would reduce costs, improve the accuracy of skills measurement, and free up human labor for more valuable purposes. This study used natural language processing to extract linguistic variables from passages and items of the Law National Aptitude Test (LNAT) and constructed multiple linear regression and mixed effects models to assess the significance of these variables. It found that models using only subject matter expert ratings performed as well as models using only variables obtained through natural language processing, although neither is very accurate. A combination of the two did not improve the model's accuracy. Nevertheless, this is promising, because it shows that computer modeling of difficulty is reaching performance similar to that of models based on subject matter expert ratings. This has implications for the testing field, as it suggests that item difficulty prediction could eventually be automated, which would improve the efficiency of the item development process.
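The sketch below is an illustration, not the study's actual pipeline: it shows how NLP-derived linguistic variables (here replaced by synthetic word-count and sentence-length features) could be combined with a mixed effects model that treats items as nested within passages. All variable names, feature definitions, and data are hypothetical assumptions.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic data: 10 passages with 6 items each; difficulty depends on two
# hypothetical linguistic features plus a passage-level random effect and noise.
rows = []
for p in range(10):
    passage_effect = rng.normal(0, 0.2)
    for i in range(6):
        word_count = int(rng.integers(20, 120))
        avg_sentence_length = float(rng.uniform(8, 30))
        difficulty = (
            0.3
            + 0.002 * word_count
            + 0.01 * avg_sentence_length
            + passage_effect
            + rng.normal(0, 0.1)
        )
        rows.append({
            "passage_id": f"P{p}",
            "word_count": word_count,
            "avg_sentence_length": avg_sentence_length,
            "difficulty": difficulty,
        })
data = pd.DataFrame(rows)

# Mixed effects model: fixed effects for the NLP-derived variables and a
# random intercept for passage, since items sharing a passage are not
# independent observations.
model = smf.mixedlm(
    "difficulty ~ word_count + avg_sentence_length",
    data=data,
    groups=data["passage_id"],
)
result = model.fit()
print(result.summary())

In practice, the synthetic features above would be replaced by variables extracted from the actual passage and item text (e.g., word counts, sentence lengths, readability indices), and subject matter expert ratings could be added as additional fixed effects for comparison.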