COMPS Group Seminar


Rethinking Simulation Studies: Living Synthetic Benchmarks for Cumulative Methodological Research

Date and time: Friday, November 21, 2025 (14:00 CET)
Place: On Zoom (jointly held as a Seminar of ISCB - ČR; see Zoom link).

Abstract:

Simulation studies are widely used to evaluate the performance of statistical methods using synthetic data sets generated from a known ground truth. However, the current methodological research paradigm requires researchers to develop and evaluate new methods at the same time. This creates misaligned incentives, such as the need to demonstrate the superiority of new methods, potentially compromising the neutrality of simulation studies. Furthermore, results of simulation studies are often difficult to compare due to differences in data-generating mechanisms, included methods, and performance measures. This fragmentation can lead to conflicting conclusions, hinder cumulative methodological progress, and delay the adoption of effective methods. To address these challenges, we introduce the concept of living synthetic benchmarks. The key idea is to disentangle method and data-generating mechanism development and continuously update the benchmark whenever a new data-generating mechanism, method, or performance measure becomes available. Such segregation improves the neutrality of method evaluation, puts more focus on the development of both methods and data-generating mechanisms, and makes it possible to compare all methods across all data-generating mechanisms and using all performance measures. In this paper, we (i) outline a blueprint for building and maintaining such benchmarks, (ii) discuss technical and organizational challenges of implementation, and (iii) demonstrate feasibility with a prototype benchmark for publication bias adjustment methods, including an open-source R package. We conclude that living synthetic benchmarks have the potential to foster neutral, reproducible, and cumulative evaluation of methods, benefiting both method developers and users.
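The key idea in the abstract (separate registries for data-generating mechanisms, methods, and performance measures, with the benchmark updated whenever any of them gains a new entry) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the registry names, the toy mean-estimation task, and the update_benchmark helper are hypothetical and are not taken from the paper or its R package.

```python
import random
import statistics

# Hypothetical registries (illustrative only, not the paper's or the R package's API).
dgms = {}      # name -> function(rng, n) returning (data, true_value)
methods = {}   # name -> function(data) returning a point estimate
measures = {}  # name -> function(estimates, truths) returning a score
results = {}   # (dgm, method, measure) -> score, filled in incrementally


def dgm_normal(rng, n):
    """n draws from N(1, 1); the true location is 1."""
    return [rng.gauss(1.0, 1.0) for _ in range(n)], 1.0


def dgm_contaminated(rng, n):
    """N(0, 1) with roughly 10% outliers from N(0, 10); the true location is 0."""
    data = [rng.gauss(0.0, 10.0 if rng.random() < 0.1 else 1.0) for _ in range(n)]
    return data, 0.0


def bias(estimates, truths):
    return statistics.fmean([e - t for e, t in zip(estimates, truths)])


def rmse(estimates, truths):
    return statistics.fmean([(e - t) ** 2 for e, t in zip(estimates, truths)]) ** 0.5


def update_benchmark(reps=200, n=30, seed=1):
    """Compute every missing (dgm, method, measure) cell; return how many were added."""
    added = 0
    for d_name, dgm in dgms.items():
        for m_name, method in methods.items():
            missing = [k for k in measures if (d_name, m_name, k) not in results]
            if not missing:
                continue  # this cell is already part of the benchmark
            # Seed per DGM so every method is evaluated on the same data sets.
            rng = random.Random(f"{seed}-{d_name}")
            estimates, truths = [], []
            for _ in range(reps):
                data, truth = dgm(rng, n)
                estimates.append(method(data))
                truths.append(truth)
            for k in missing:
                results[(d_name, m_name, k)] = measures[k](estimates, truths)
                added += 1
    return added


# Initial benchmark: two DGMs, one method, two measures.
dgms["normal"] = dgm_normal
dgms["contaminated"] = dgm_contaminated
methods["mean"] = statistics.fmean
measures["bias"] = bias
measures["rmse"] = rmse
first_update = update_benchmark()

# A new method is contributed later; only its cells are computed.
methods["median"] = statistics.median
second_update = update_benchmark()
```

In this sketch, a newly registered method is evaluated on the data-generating mechanisms that already exist, under the measures that already exist, so its developer does not choose the conditions it is tested under. Earlier results are reused rather than recomputed.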

References:

Bartoš, F., Pawel, S., & Siepe, B. S. (2025). Living Synthetic Benchmarks: A Neutral and Cumulative Framework for Simulation Studies. arXiv:2510.19489

František Bartoš
University of Amsterdam & ICS CAS

https://www.frantisek-bartos.info/