Girmaw Abebe Tadesse, Celia Cintas, et al.
ICML 2020
In real-world sequential decision problems, exploration is expensive, and the risk of expert decision policies must be evaluated from limited data. In this setting, Monte Carlo (MC) risk estimators are typically used to estimate the risk of decision policies. Unfortunately, while these estimators have the desired low-bias property, they often suffer from large variance. In this paper, we consider the problem of minimizing the asymptotic mean squared error, and hence the variance, of MC risk estimators. We show that by carefully choosing the data sampling policy (behavior policy), we can obtain low-variance estimates of the risk of any given decision policy.
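The effect of the behavior policy on estimator variance can be illustrated with a minimal sketch. This is not the paper's method: it assumes a one-step problem with two actions, deterministic costs, risk defined as expected cost, and a standard importance-sampling estimator. The names `exact_mean_and_variance`, `mc_estimate`, `costs`, `target`, and `tuned` are hypothetical and chosen only for illustration.

```python
import random


def exact_mean_and_variance(target, behavior, costs):
    """Exact mean and variance of one importance-weighted cost sample.

    An action a is drawn from `behavior`, and the sample is
    (target[a] / behavior[a]) * costs[a].
    """
    mean = 0.0
    second_moment = 0.0
    for p, b, c in zip(target, behavior, costs):
        weighted_cost = (p / b) * c  # importance weight times cost
        mean += b * weighted_cost
        second_moment += b * weighted_cost ** 2
    return mean, second_moment - mean ** 2


def mc_estimate(target, behavior, costs, n_samples, seed=0):
    """Monte Carlo estimate of the target policy's expected cost,
    using actions sampled from the behavior policy."""
    rng = random.Random(seed)
    actions = rng.choices(range(len(costs)), weights=behavior, k=n_samples)
    total = sum((target[a] / behavior[a]) * costs[a] for a in actions)
    return total / n_samples


# Toy problem (assumed values): action 1 is rare under the target policy
# but expensive, so on-policy sampling sees it too seldom.
costs = [1.0, 10.0]
target = [0.9, 0.1]

# Behavior policy proportional to target probability times cost.
# For deterministic, nonnegative costs this makes every sample identical.
normalizer = sum(p * c for p, c in zip(target, costs))
tuned = [p * c / normalizer for p, c in zip(target, costs)]

on_policy_mean, on_policy_var = exact_mean_and_variance(target, target, costs)
tuned_mean, tuned_var = exact_mean_and_variance(target, tuned, costs)
```

In this toy setting both sampling schemes are unbiased for the expected cost of 1.9, but sampling from the target policy itself gives a per-sample variance of 7.29, while the tuned behavior policy gives zero variance up to floating-point rounding. With stochastic costs or multi-step trajectories the variance would generally not vanish, and the appropriate behavior policy would have to be derived for that setting.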
Jihun Yun, Aurelie Lozano, et al.
NeurIPS 2021
Raúl Fernández Díaz, Lam Thanh Hoang, et al.
IRB-AI-DD 2025
Federico Zipoli, Carlo Baldassari, et al.
npj Computational Materials