COMPS Group Seminar



Rethinking Simulation Studies: Living Synthetic Benchmarks for Cumulative Methodological Research

Date and time: Friday, November 21, 2025, 14:00 CET
Place: On Zoom (jointly held as a seminar of ISCB – ČR; Zoom Meeting ID: 990 6218 4007, Passcode: 183786).

Abstract:

Simulation studies are widely used to evaluate the performance of statistical methods using synthetic data sets generated from a known ground truth. However, the current methodological research paradigm requires researchers to develop and evaluate new methods at the same time. This creates misaligned incentives, such as the need to demonstrate the superiority of new methods, potentially compromising the neutrality of simulation studies. Furthermore, results of simulation studies are often difficult to compare due to differences in data-generating mechanisms, included methods, and performance measures. This fragmentation can lead to conflicting conclusions, hinder cumulative methodological progress, and delay the adoption of effective methods. To address these challenges, we introduce the concept of living synthetic benchmarks. The key idea is to disentangle method and data-generating mechanism development and continuously update the benchmark whenever a new data-generating mechanism, method, or performance measure becomes available. Such segregation improves the neutrality of method evaluation, puts more focus on the development of both methods and data-generating mechanisms, and makes it possible to compare all methods across all data-generating mechanisms and using all performance measures. In this paper, we (i) outline a blueprint for building and maintaining such benchmarks, (ii) discuss technical and organizational challenges of implementation, and (iii) demonstrate feasibility with a prototype benchmark for publication bias adjustment methods, including an open-source R package. We conclude that living synthetic benchmarks have the potential to foster neutral, reproducible, and cumulative evaluation of methods, benefiting both method developers and users.
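The core design described above, keeping data-generating mechanisms (DGMs), methods, and performance measures in separate registries and re-evaluating the full cross whenever any registry gains a new entry, can be sketched as follows. This is a minimal illustrative sketch in Python; all names are hypothetical and do not reflect the API of the authors' open-source R package.

```python
import random
import statistics

def normal_mean_dgm(rng):
    """Example DGM: return a synthetic data set and its known ground truth."""
    return [rng.gauss(0.5, 1.0) for _ in range(50)], 0.5

# Separate registries: adding a new DGM, method, or measure to the
# corresponding dict automatically extends the whole benchmark.
DGMS = {"normal_mean": normal_mean_dgm}

METHODS = {
    # Each method maps a data set to an estimate of the ground truth.
    "sample_mean": statistics.mean,
    "sample_median": statistics.median,
}

MEASURES = {
    # Each measure compares per-repetition estimates to the truth.
    "bias": lambda est, truth: statistics.mean(est) - truth,
    "mse": lambda est, truth: statistics.mean((e - truth) ** 2 for e in est),
}

def run_benchmark(n_rep=200, seed=1):
    """Evaluate every method on every DGM with every performance measure."""
    rng = random.Random(seed)
    results = {}
    for dgm_name, dgm in DGMS.items():
        draws = [dgm(rng) for _ in range(n_rep)]
        truth = draws[0][1]  # ground truth is fixed within a DGM
        for m_name, method in METHODS.items():
            estimates = [method(data) for data, _ in draws]
            for p_name, measure in MEASURES.items():
                results[(dgm_name, m_name, p_name)] = measure(estimates, truth)
    return results
```

Because evaluation iterates over the registries rather than over a fixed study design, every method is compared on every DGM using every measure, which is the comparability property the abstract emphasizes.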

References:

Bartoš, F., Pawel, S., & Siepe, B.S. (2025). Living Synthetic Benchmarks: A Neutral and Cumulative Framework for Simulation Studies. arXiv:2510.19489

Pawel, S., Kook, L., & Reeve, K. (2024). Pitfalls and potentials in simulation studies: Questionable research practices in comparative simulation studies allow for spurious claims of superiority of any method. Biometrical Journal, 66(1), 2200091.

Nießl, C., Herrmann, M., Wiedemann, C., Casalicchio, G., & Boulesteix, A. L. (2022). Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(2), e1441.

Heinze, G., Boulesteix, A. L., Kammer, M., Morris, T. P., White, I. R., & Simulation Panel of the STRATOS Initiative. (2024). Phases of methodological research in biostatistics—building the evidence base for new methods. Biometrical Journal, 66(1), 2200222.

František Bartoš
University of Amsterdam & ICS CAS

https://www.frantisek-bartos.info/

František Bartoš is a PhD candidate in Psychological Methods at the University of Amsterdam. His goal is to improve statistical procedures and enable researchers to draw better inferences from data. He is interested in Bayesian statistics, meta-analysis, publication bias, and replicability. (He has also flipped some coins a couple of times.)