Seminar in Psychometrics


Data-Driven Item Difficulty Estimation in Knowledge Tests Using Text Analysis and Machine Learning

Date and time: Monday, March 27, 2023 (4:00 PM CET)
Place: ICS CAS, room 318, Pod Vodárenskou věží 2, Prague 8, and on Zoom.

Abstract: Test developers face challenges when estimating item difficulty in knowledge tests, particularly when pretest data are not available. In this study, we employ text analysis and machine learning techniques to predict item difficulty in tests of English as a foreign language. Our approach uses natural language processing to extract relevant features from item text, and statistical and machine-learning models to estimate item difficulty from those features. We show that this approach can help to refine estimates made by content experts. The performance of individual features depends on item type and on the ability level of the test takers. Machine-learning algorithms vary in how accurately they predict item difficulty, but their predictions are similar to those of domain experts. Our findings have important implications for test development and design, and highlight both the potential and the limitations of data-driven approaches for predicting item difficulty in knowledge tests.
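As a rough illustration of the two-step idea in the abstract (text features first, then a model), the following minimal Python sketch may help. It is not the speakers' method: the three surface features, the function names `extract_features` and `fit_line`, the example items, and their difficulty values are all invented for illustration, and a one-predictor least-squares line stands in for the statistical and machine-learning models used in the study.

```python
import re


def extract_features(item_text):
    """Return a few simple surface features of an item's text (hypothetical choice)."""
    words = re.findall(r"[A-Za-z']+", item_text.lower())
    if not words:
        raise ValueError("item text contains no words")
    n = len(words)
    return {
        "n_words": n,
        "mean_word_len": sum(len(w) for w in words) / n,
        "type_token_ratio": len(set(words)) / n,
    }


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up gap-fill items with made-up difficulty values (illustration only).
items = [
    ("She ___ to school every day.", -1.2),
    ("They have ___ finished their homework.", -0.4),
    ("Had the committee anticipated the objections, it ___ differently.", 0.9),
    ("Notwithstanding considerable institutional resistance, the proposal ___ ratified.", 1.5),
]

# Step 1: turn each item's text into a numeric feature.
xs = [extract_features(text)["mean_word_len"] for text, _ in items]
ys = [difficulty for _, difficulty in items]

# Step 2: fit a model that maps the feature to difficulty.
slope, intercept = fit_line(xs, ys)

# Step 3: predict difficulty for a new item that has no pretest data.
new_item = "The results were ___ consistent with earlier observations."
predicted = intercept + slope * extract_features(new_item)["mean_word_len"]
print(f"slope={slope:.2f}, predicted difficulty={predicted:.2f}")
```

In practice, one would expect a much richer feature set and a model evaluated on held-out items; the sketch only shows how item text can be turned into numbers that a model then maps to a difficulty estimate.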

References.
Settles, B., LaFlair, G. T., & Hagiwara, M. (2020). Machine learning–driven language assessment. Transactions of the Association for Computational Linguistics, 8, 247–263. https://doi.org/10.1162/tacl_a_00310
Alkhuzaey, A., & Tendeiro, J. N. (2020). A systematic review of data-driven approaches to item difficulty prediction. Journal of Educational Measurement, 57(2), 263–280. https://doi.org/10.1111/jedm.12236
Belov, D. I. (2022). Predicting item characteristic curve (ICC) using a softmax classifier. In M. Wiberg, D. Molenaar, J. González, J.-S. Kim, & H. Hwang (Eds.), Quantitative Psychology. IMPS 2021. Springer Proceedings in Mathematics & Statistics, vol. 393. Springer, Cham. https://doi.org/10.1007/978-3-031-04572-1_13

Jana Dlouhá
ICS CAS & Charles University, Prague

Jana Dlouhá is a doctoral fellow at the Institute of Computer Science of the Czech Academy of Sciences and a PhD student at the Faculty of Arts, Charles University. Jana's research focuses on computerized adaptive testing and computational psychometrics, combining her backgrounds in psychology and ICT. With this interdisciplinary skill set, Jana applies machine learning methods to psychometric problems at the intersection of these fields. In addition to her research, Jana teaches statistics, psychometrics, and R courses in bachelor's and master's degree programs. She also participates in the working group for student selection for the psychology program.

Lubomír Štěpánek
ICS CAS, Prague

Lubomír Štěpánek is a doctoral fellow at the Institute of Computer Science of the Czech Academy of Sciences and an assistant lecturer and PhD student at the First Faculty of Medicine, Charles University, and at the Faculty of Informatics and Statistics, Prague University of Economics and Business. His main interests are non-conventional approaches to survival analysis, enrichment of traditional survival models by machine learning, computational psychometrics, and nonparametric statistics in general. For his research in survival analysis, he was awarded the prestigious Hlávka award for talented young scientists. Lubomír is an R and LaTeX enthusiast, and he teaches statistics, introductory mathematics, and R programming courses in bachelor's and master's degree programs.