Monte-Carlo Tree Search (MCTS) combined with the Multi-Armed Bandit (MAB) framework has had limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai 2023) showed that UCB1, designed for bounded rewards, does not perform well when applied to the cost-to-go estimates of classical planning, which are unbounded in ℝ, and improved performance by using a Gaussian reward MAB instead. We further sharpen our understanding of ideal bandits for planning tasks by resolving three issues: First, Gaussian MABs under-specify the support of cost-to-go estimates as [−∞, ∞]. Second, the Full-Bellman backup, which backpropagates the max/min of samples, lacks theoretical justification. Third, removing dead-ends lacks justification under Monte-Carlo backup. We use Extreme Value Theory Type 2 to resolve them at once, propose two bandits (UCB1-Uniform/Power), and apply them to MCTS for classical planning. We formally prove their regret bounds and empirically demonstrate their performance in classical planning.
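For readers less familiar with the setup the abstract critiques, the following is a minimal sketch (not the paper's implementation, and not the proposed UCB1-Uniform/Power bandits) of standard UCB1 child selection in an MCTS node over cost-to-go estimates, together with the two backup rules mentioned above: Monte-Carlo backup (a running mean of sampled costs) and a Full-Bellman-style backup (the min over samples). The Node class, the exploration constant C, and the negation of cost to mimic a reward are all illustrative assumptions.

import math
import random
from dataclasses import dataclass, field

C = math.sqrt(2)  # standard UCB1 exploration constant (an assumed, not tuned, value)

@dataclass
class Node:
    children: list = field(default_factory=list)
    visits: int = 0
    mean_cost: float = 0.0        # Monte-Carlo backup: running mean of sampled cost-to-go
    best_cost: float = math.inf   # Full-Bellman-style backup: min over samples seen so far

def ucb1_select(parent):
    # UCB1 assumes rewards bounded in [0, 1]; cost-to-go estimates are unbounded,
    # which is the mismatch the abstract points out. Here cost is simply negated
    # so that lower cost looks like higher reward.
    def score(child):
        if child.visits == 0:
            return math.inf   # visit every child at least once
        exploit = -child.mean_cost
        explore = C * math.sqrt(math.log(parent.visits) / child.visits)
        return exploit + explore
    return max(parent.children, key=score)

def backup(node, sampled_cost):
    node.visits += 1
    node.mean_cost += (sampled_cost - node.mean_cost) / node.visits  # Monte-Carlo backup
    node.best_cost = min(node.best_cost, sampled_cost)               # Full-Bellman-style backup

if __name__ == "__main__":
    root = Node(children=[Node(), Node(), Node()])
    for _ in range(100):
        root.visits += 1
        child = ucb1_select(root)
        backup(child, sampled_cost=random.gauss(10.0, 3.0))  # stand-in for a heuristic sample
    print([round(c.mean_cost, 2) for c in root.children])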