# Seminar in Psychometrics

The seminar of the COMputational PSychometrics group features local and visiting scholars presenting current research on computational aspects of psychometrics. The talks are approximately 60 minutes long, followed by a discussion. In the spring semester, the seminar is jointly held as the course NMST571 at Charles University, which usually takes place on Tuesdays from 3:40 PM CET. The seminar is co-hosted by Patrícia Martinková and Jiří Lukavský. If you want to participate and/or be added to the mailing list, please send an e-mail to martinkovaATcs.cas.cz.

### Future sessions

##### December 20, 2022 (2:00 PM CET). David Kaplan (University of Wisconsin – Madison): Probabilistic Forecasting with International Large-Scale Assessments: Applications to the UN Sustainable Development Goals

**Note.** Plenary room (room 318, second floor), Institute of Computer Science, Pod Vodárenskou věží 2, Prague 8, also on Zoom.

**Abstract.** In 2015, the Member States of the United Nations (UN) adopted the Sustainable Development Goals. With regard to education, the UN identified equitable, high-quality education, including the achievement of literacy and numeracy by all youth and a substantial proportion of adults, both men and women, as one of its global SDGs to be attained by 2030. To analyze education policies such as these, it is critically important to monitor trends in educational outcomes over time. Indeed, as educational systems around the world face new challenges due to the COVID-19 pandemic, monitoring trends in educational outcomes could help identify the long-run impact of this unprecedented health crisis on global education. To this end, international large-scale assessment programs such as PISA are uniquely situated to provide population-level trend data on literacy and numeracy outcomes. The purpose of this talk is to describe a new project in collaboration with the University of Heidelberg and funded by the US Institute of Education Sciences. This project proposes a methodology applicable to international large-scale assessments, and PISA in particular, to monitor and forecast changes in gender equity and to relate changes over time in gender equity to policy-relevant predictors and exogenous events such as the coronavirus pandemic. We utilize a Bayesian workflow to account for uncertainty in all steps in the modeling process, including uncertainty in the parameters of the model as well as model uncertainty in the choice of policy-relevant predictors. A proof-of-concept using data from the United States NAEP program provides a demonstration of the ideas.

**References.**

Kaplan, D., & Huang, M. (2021). Bayesian probabilistic forecasting with large-scale educational trend data: A case study using NAEP. Large-scale Assessments in Education, 9(1), 1-31. https://doi.org/10.1186/s40536-021-00108-2

Kaplan, D., & Jude, N. (2021). Trend analysis with international large-scale assessments: Past practice, current issues, and future directions. In International Handbook of Comparative Large-Scale Studies in Education: Perspectives, Methods and Findings (pp. 1-14). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-88178-8_57

### Past sessions

##### September 27, 2022 (3:40 PM CET). Gabriel Wallin (London School of Economics and Political Science): A Flexible IRT Framework For Latent DIF Detection

**Note.** Plenary room (room 318, second floor), Institute of Computer Science, Pod Vodárenskou věží 2, Prague 8, also on Zoom.

**Abstract.** The measurement validity of instruments like a questionnaire or a test is established by ascertaining that it is
measurement invariant across the items. For this purpose, it is standard procedure to assess the presence of
differential item functioning (DIF), which evaluates measurement invariance on item level. When DIF detection is
not based on manifest groups but on latent groups the problem is typically referred to as latent DIF detection,
which will be the focus of this talk. To that end, I will present a flexible modeling framework that combines a
general latent factor model with a latent class model to capture both normal response behavior under no DIF, and
deviant behavior due to DIF. In the model, a sparse DIF effect parameter is introduced that is allowed to vary
between the latent classes identified by the model. Each item response distribution is consequently modeled as a
function of a latent variable measuring the underlying construct of the questionnaire or test, and of group
membership. No prior knowledge of DIF-free items is required; instead, they are identified through an L_1
penalty on the DIF effect parameter in the marginal likelihood function. An EM algorithm for model estimation is
proposed, where the maximization step is carried out using a quasi-Newton proximal algorithm. Results based on
both simulated and empirical data together with theoretical results will be presented.
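
As a rough illustration of why an L_1 penalty yields sparse DIF effects, the sketch below shows the soft-thresholding operator, which is the proximal operator of the L_1 penalty. It uses a plain proximal gradient step as a simpler stand-in for the quasi-Newton proximal algorithm mentioned in the abstract; the function names, the step size, and the penalty weight `lam` are illustrative assumptions, not the speaker's implementation.

```python
def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0  # small effects are set exactly to zero


def proximal_gradient_step(delta, grad, step, lam):
    """One proximal gradient update for a vector of DIF effect parameters.

    delta: current DIF effects (hypothetical), grad: gradient of the
    unpenalized objective at delta, step: step size, lam: penalty weight.
    """
    return [soft_threshold(d - step * g, step * lam) for d, g in zip(delta, grad)]


# Items whose updated effect falls below the threshold are treated as DIF-free.
print(proximal_gradient_step([0.2, 2.0], [0.0, 0.0], step=1.0, lam=0.5))
```

Because the operator maps small values exactly to zero, items with negligible DIF effects drop out of the model without having to be specified as anchor items in advance.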

##### May 10, 2022 (3:40 PM CET). Irini Moustaki (London School of Economics and Political Science): Detection of two-way outliers in multivariate data and application to cheating detection in educational tests

**Note.** K4 at MFF UK, Sokolovská 83, Prague 8, also on Zoom.

**Abstract.** In the talk we will discuss a latent variable model for the simultaneous (two-way) detection of outlying
individuals and items for item-response-type data. The proposed model is a synergy between a factor model for
binary responses and continuous response times that captures normal item response behaviour and a latent class
model that captures the outlying individuals and items. Covariates are also added to enhance the classification
power of the model. A statistical decision framework is developed under the proposed model that provides
compound decision rules for controlling local false discovery/nondiscovery rates of outlier detection.
Statistical inference is carried out under a Bayesian framework for which a Markov chain Monte Carlo algorithm
is developed. The proposed method is applied to the detection of cheating in educational tests, due to item
leakage, using a case study of a computer-based nonadaptive licensure assessment. The performance of the
proposed method is evaluated by simulation studies.

**References.**

Yunxiao Chen, Yan Lu, & Irini Moustaki. Detection of two-way outliers in multivariate data and application to cheating detection
in educational tests. Annals of Applied Statistics (In press). arXiv preprint 1911.09408

##### May 3, 2022 (3:40 PM CET). Yves Rosseel (Ghent University): The structural-after-measurement (SAM) approach to structural equation modeling

**Note.** On Zoom and in K4 at MFF UK.

**Abstract.** In structural equation modeling (SEM), the measurement and structural parts of the model are usually estimated
simultaneously. In this presentation, I will revisit the long-standing idea that we should first estimate the
measurement part, and then estimate the structural part. We call this the 'Structural-After-Measurement' (SAM)
approach to SEM. I will describe a formal framework for the SAM approach under settings where the latent
variables and their indicators are continuous. I will also discuss earlier SAM methods and establish how they
are specific instances of the SAM framework. Simulation results will be presented showing several advantages of
the SAM approach: 1) estimates exhibit smaller finite sample biases under correctly specified models, 2)
estimation routines are less vulnerable to convergence issues in small samples, and 3) estimates are more robust
against local model misspecifications. The SAM framework includes two-step corrected standard errors, and
permits computing both local and global fit measures. Finally, for a large class of models, non-iterative
estimators can be used in both stages.

**References.**

Rosseel, Y. & Loh, W. W. (2021). A structural-after-measurement (SAM) approach to SEM. OSF preprint https://osf.io/pekbm/.

##### April 19, 2022 (3:40 PM CET). David Magis (IQVIA Belux): Computerized adaptive and multistage testing: overview, challenges and applications

**Note.** Remotely on Zoom, projected to K4 at MFF UK.

**Abstract.** Computerized adaptive testing (CAT) and multistage testing (MST) are two closely connected fields of theory and
applications of psychometrics. They are both a source of intense scientific research and wide areas of
applications for educational measurement and assessment. Though conceptually simple (the core of CAT and MST can be
explained in a few sentences), they require a strong underlying measurement theory, an accurate algorithmic
process and a suitable platform for test administration and evaluation. During this talk, I will (a) introduce
the concepts and aspects of CAT and MST; (b) highlight assets, drawbacks and challenges; (c) overview the
current resources for CAT and MST deployment. Some demonstrations of CAT and MST using the R
software will also be presented.
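
To make the "few sentences" concrete, here is a minimal sketch of one common CAT item selection rule: pick the not-yet-administered item with maximum Fisher information at the current ability estimate. It assumes a two-parameter logistic (2PL) model; the item bank and function names are hypothetical and are not taken from catR or mstR.

```python
import math


def prob_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, items, administered):
    """Index of the unused item that is most informative at theta."""
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))


# Hypothetical item bank: (discrimination a, difficulty b) pairs.
bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 2.0)]
print(select_next_item(0.1, bank, administered=set()))
```

In a full CAT loop, the ability estimate would be updated after each response and the selection repeated until a stopping rule is met; those steps are omitted here.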

**References.**

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized adaptive and multistage testing with R (using packages catR and
mstR). UseR! series. New York: Springer. doi:10.1007/978-3-319-69218-0

##### March 31, 2022 (2:30 PM CET). Michela Battauz (University of Udine): Equating and DIF detection in the IRT framework

**Note.** Remotely on Zoom.

**Abstract.** Differential Item Functioning (DIF) occurs when the probability of a positive response for people at the same
ability level varies in different groups of individuals. The detection of DIF is very important since it
constitutes a violation of the invariance assumption of Item Response Theory (IRT) models. One approach for the
detection of DIF is based on the comparison of the item parameter estimates obtained in different groups of the
population. However, when the item parameters are estimated separately, they are expressed on different
measurement scales, due to identifiability issues. So, it is first necessary to convert the item parameter
estimates to a common scale. This transformation involves two unknown constants, called equating coefficients,
which are estimated from the data. The item parameter estimates converted to a common metric are then compared
through a statistical test. The test implemented in the R package equateIRT takes into account the variability
introduced by the estimation of the equating coefficients, thus improving the properties of the test. In this
talk, the methods for the estimation of the equating coefficients will be reviewed and a test for the detection
of DIF will be presented. The methods will be illustrated using the equateIRT package.
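
The conversion to a common scale can be sketched as follows. This example uses the classical mean-sigma method to estimate the two equating coefficients, here called `A` and `B`, from the difficulty estimates of the same items in two groups. It is written in Python for illustration only and is not the equateIRT code; the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma(b_x, b_y):
    """Equating coefficients mapping scale X onto scale Y (mean-sigma method).

    b_x, b_y: difficulty estimates of the same items in groups X and Y.
    """
    A = pstdev(b_y) / pstdev(b_x)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def to_common_scale(a_x, b_x, A, B):
    """Convert 2PL discriminations and difficulties from scale X to scale Y."""
    return [a / A for a in a_x], [A * b + B for b in b_x]


# Hypothetical estimates: scale Y is scale X stretched by 2 and shifted by 0.5.
b_x = [-1.0, 0.0, 1.0]
b_y = [-1.5, 0.5, 2.5]
A, B = mean_sigma(b_x, b_y)
print(A, B)
```

Once the parameters are on a common scale, the remaining differences between groups can be tested item by item; a test that ignores the sampling variability of `A` and `B` would understate the uncertainty, which is the issue the abstract addresses.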

**References.**

Battauz, M. (2019). On Wald tests for differential item functioning detection. Statistical Methods & Applications, 28(1),
103-118.
doi:10.1007/s10260-018-00442-w

Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68(1), 1-22.
doi:10.18637/jss.v068.i07

##### March 8, 2022 (3:40 PM CET). Dakota Cintron (UC San Francisco): A Latent Dirichlet Allocation Model of Action Patterns

**Note.** Remotely on Zoom, projected to K4, MFF UK.

**Abstract.** Action pattern data are process data often recorded in a computer-based large-scale testing setting and
extracted from log files. The action pattern data portray different actions that test takers use to solve a
given item. This research uses unsupervised and supervised latent Dirichlet allocation (LDA) topic modeling on
action pattern data from a large-scale assessment. Topic modeling, which includes the LDA model, is a machine
learning framework to rapidly discover latent topics from large quantities of open-ended qualitative textual
data quantitatively. In this research, action pattern data from a large-scale assessment are treated as
qualitative textual data to be analyzed with LDA. These latent topics amount to thematic annotations of a
collection of documents referred to as a corpus. For the qualitative action pattern data, the LDA model treats
documents (here a student’s set of action patterns on an item) as being represented by a random mixture over
latent topics where a distribution over words represents each latent topic. As the results of this study
demonstrate, the latent topics derived from the action pattern data can provide helpful insight into different
cognitive processes and key actions that lead to item success or failure. For instance, this research provides
evidence of classes of problem-solving strategies derived from topic distributions of action pattern data and
how these strategies are predictive of item success or failure.

**References.**

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.
doi:10.1145/2133806.2133826

Tang, X., et al. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85(2), 378-397.
doi:10.1007/s11336-020-09708-3

Cintron, D. W., & Montrosse-Moorhead, B. Integrating Big Data Into Evaluation: R Code for Topic Identification and Modeling.
American Journal of Evaluation (2021).
doi:10.1177/10982140211031640

##### January 27, 2022 (1:30 PM CET). Ai Ye (UNC Chapel Hill): Path and Directionality Discovery in Individual Dynamic Models: A Regularized Structural Equation Modeling Approach

**Note.** Jointly held as a seminar of ISCB - ČR, remotely on Zoom.

**Abstract.** Recent decades have witnessed a surge of psychological and neurological research at an individual level. One
goal in such endeavors is to construct person-specific dynamic assessments using time series data. Within the
psychometric field, researchers have developed psychometric modeling frameworks to estimate time series models.
However, these methods are often limited in the dynamic representations as well as the model selection regimes.
My dissertation research aims to evaluate (Chapter I), reconcile (Chapter II), and extend upon (Chapter III) the
limitations in current practices. In this talk, I will focus on Chapter II, where I proposed a novel modeling
approach that uses regularization under the unified Structural Equation Modeling (uSEM) framework to estimate a
more flexible model, called regularized hybrid uSEM. My simulation study has shown that the proposed approach is
more reliable and accurate than alternative methods in recovering hybrid types of dynamic relations and in
eliminating spurious ones. The present work, to my knowledge, is the first application of the recent regularized
SEM to the estimation of a type of time series SEM, which points to a promising future for statistical learning
in psychometric models.

**References.**

Gates, K. M. & Molenaar, P. C. M. Group search algorithm recovers effective connectivity maps for individuals in homogeneous and
heterogeneous samples. Neuroimage 63, 310–319 (2012).
doi:10.1016/j.neuroimage.2012.06.026

Epskamp, S., Waldorp, L. J., Mõttus, R. & Borsboom, D. The Gaussian Graphical Model in Cross-Sectional and Time-Series Data.
Multivariate Behavioral Research 53, 1–28 (2018).
doi:10.1080/00273171.2018.1454823

Ye, A., Gates, K. M., Henry, T. R. & Luo, L. Path and Directionality Discovery in Individual Dynamic Models: A Regularized
Unified Structural Equation Modeling Approach for Hybrid Vector Autoregression. Psychometrika 86, 404–441 (2021).
doi:10.1007/s11336-021-09753-6

##### May 11, 2021. Ed Merkle (University of Missouri): Recent progress on Bayesian structural equation models

**Abstract.** The talk will be about research and developments surrounding the R package blavaan. Specific topics include
strategies for speeding up model estimation, methods for computing model information criteria, and extensions to
complex models. I will try to discuss the research in the context of open science and reproducibility, which has
been a theme of the software development. I will also provide some demonstrations along the way to illustrate
the functionality of blavaan.

##### May 4, 2021. Jiří Lukavský (Institute of Psychology CAS & Charles University): Bayesian psychometrics

##### April 27, 2021. Gabriel Wallin (Université Côte d'Azur & Inria): Equating nonequivalent test groups using propensity scores

**Abstract.** For standardized assessment tests, scores from different test administrations are comparable only after the
statistical process of equating. In this talk I will discuss equating of test scores when the test groups differ
in their ability distributions. The equating procedures, constructed to only adjust scores due to differences in
difficulty level of the test forms, thus risk also adjusting for the ability differences. The gold standard for
this situation is to utilize a set of common items in the equating procedure. However, not all testing programs
have common items available. This presentation considers this setting. In the absence of common items,
background information about the test-takers will be gathered in a scalar function known as the propensity
score, and the test forms will be equated with respect to this score. This method will be demonstrated using
both empirical and simulated data.

##### April 20, 2021. Marie Wiberg (Umeå University): How to evaluate different equating methods

**Abstract.** Test score equating is used to make scores from one scale comparable with the scores from another scale. There
are a large number of equating methods available depending on how data is collected and what assumptions are
made. The talk starts with a brief overview of available equating methods. As there are a large number of
equating methods developed for different situations and different tests we need tools to evaluate and compare
the different equating transformations. There are a large number of methods and measures proposed to evaluate an
equating transformation. In general they can be divided into two groups: equating-specific measures and
statistical measures. In this talk I will discuss several methods and illustrate them with some examples in R.

##### April 13, 2021. Michela Battauz (University of Udine): Item Response Theory Equating Methods for Multiple Forms

**Abstract.** Many testing programs use multiple forms of a test to deal with the security issues connected to test
disclosure. However, since each form is composed of different items, the test scores are not comparable. To
overcome this issue, it is possible to apply the statistical procedure of equating. This talk focuses on Item
Response Theory (IRT) equating methods for the common-item nonequivalent group design. Under this design, the
forms have a set of items in common and they are administered to different groups of examinees. The equating
process consists in the conversion of the item parameter estimates to a common scale using a linear
transformation, and the determination of comparable test scores. The coefficients of this linear function are
called equating coefficients. Although many testing programs use several forms of a test, the equating methods
proposed in the literature mainly consider only two test forms. In this talk, the equating methods for two test
forms will be reviewed and some newer methods for equating multiple test forms will be presented. The methods
will be illustrated using the R packages equateIRT and equateMultiple.
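
One simple way to see how more than two forms can be linked is to compose the linear transformations along a chain of forms that share items pairwise. The sketch below is an illustrative assumption about chain linking, written in Python rather than R, and does not reproduce the newer multiple-form methods presented in the talk; the function name and coefficient values are hypothetical.

```python
def chain(coefs):
    """Compose equating coefficients along a chain of linked forms.

    coefs: list of (A, B) pairs, where each pair maps the scale of one
    form onto the scale of the next form via x -> A * x + B.
    Returns the (A, B) pair mapping the first scale onto the last.
    """
    A_total, B_total = 1.0, 0.0
    for A, B in coefs:
        # new scale = A * (A_total * x + B_total) + B
        A_total, B_total = A * A_total, A * B_total + B
    return A_total, B_total


# Hypothetical links: form 1 -> form 2, then form 2 -> form 3.
print(chain([(2.0, 1.0), (0.5, -1.0)]))
```

When several paths connect the same two forms, each path generally gives slightly different coefficients, which is one motivation for methods that treat all forms jointly.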

##### March 30, 2021. František Bartoš (University of Amsterdam & ICS CAS): Robust Bayesian meta-analysis: A framework for addressing publication bias with model-averaging

**Abstract.** Publication bias poses a significant threat to meta-analyses, the gold standard of evidence. To alleviate the
problem, a variety of publication bias adjustment methods have been suggested. However, it is nearly impossible to
select the correct method when the data generating process is unknown, which is usually the case, since no
existing method performs well in a wide range of conditions. To address this issue, we developed a Robust
Bayesian meta-analysis (RoBMA) framework. RoBMA allows us to combine different publication bias adjustment
models in a coherent Bayesian way. Apart from obtaining the model-averaged estimates, RoBMA provides Bayes
factor tests for presence or absence of the meta-analytic effect, heterogeneity, and publication bias. In this
talk, I provide a conceptual introduction to Bayesian model-averaging in the context of meta-analyses,
illustrate the RoBMA framework on an applied example, and demonstrate the performance of the method on real and
simulated datasets.
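
The core arithmetic of Bayesian model-averaging can be sketched in a few lines. The example below assumes that each candidate model already comes with a prior probability, a marginal likelihood, and an effect estimate; all numbers and function names are hypothetical, and this is not the RoBMA implementation.

```python
def posterior_model_probs(prior_probs, marginal_likelihoods):
    """Posterior model probabilities via Bayes' rule."""
    weights = [p * m for p, m in zip(prior_probs, marginal_likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


def model_averaged_estimate(post_probs, estimates):
    """Average the per-model estimates, weighted by posterior probability."""
    return sum(p * e for p, e in zip(post_probs, estimates))


def inclusion_bayes_factor(prior_probs, post_probs, includes):
    """Posterior odds over prior odds for the models that include a component."""
    prior_in = sum(p for p, inc in zip(prior_probs, includes) if inc)
    post_in = sum(p for p, inc in zip(post_probs, includes) if inc)
    return (post_in / (1.0 - post_in)) / (prior_in / (1.0 - prior_in))


# Two hypothetical models: a null model and a model with an effect.
priors = [0.5, 0.5]
post = posterior_model_probs(priors, marginal_likelihoods=[1.0, 3.0])
print(post)
print(model_averaged_estimate(post, estimates=[0.0, 0.4]))
print(inclusion_bayes_factor(priors, post, includes=[False, True]))
```

The same inclusion Bayes factor can be computed for any model component, such as heterogeneity or publication bias, by changing which models are flagged in `includes`.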