Custom-Design of FDR Encodings: The Case of Red-Black Planning
Daniel Fišer, Daniel Gnad, et al.
IJCAI 2021
Width-based planning has shown promising results on Atari 2600 games using pixel input, while using substantially fewer environment interactions than reinforcement learning. Recent width-based approaches have computed feature vectors for each screen using a hand-designed feature set (Rollout-IW) or a variational autoencoder trained on game screens (VAE-IW), and pruned screens that do not have novel features during the search. We propose Olive (Online-VAE-IW), which updates the VAE features online using active learning to maximize the utility of screens observed during planning. Experimental results across 55 Atari games demonstrate that it outperforms Rollout-IW by 42-to-11 and VAE-IW by 32-to-20. Moreover, Olive outperforms existing work based on policy learning (π-IW, DQN) trained with 100 times the training budget by 30-to-22 and 31-to-17, respectively. It also outperforms a state-of-the-art data-efficient reinforcement learning method (EfficientZero), trained with the same training budget and run with 1.8 times the planning budget, by 18-to-7 on the Atari 100k benchmark, without any policy learning. The source code and the appendix are available at github.com/ibm/atari-active-learning and arxiv.org/abs/2109.15310.
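The novelty-based pruning mentioned in the abstract can be illustrated with a minimal sketch. It assumes each screen has already been encoded as a tuple of binary features and applies a plain width-1 novelty test: a screen is kept only if it sets some (feature index, value) pair not seen earlier in the search. The names `NoveltyTable`, `is_novel`, and `prune` are hypothetical and are not taken from the Olive code base; the actual planners use more elaborate variants (for example, learned VAE features), so this shows only the basic idea.

```python
from typing import Iterable, List, Set, Tuple

Feature = Tuple[int, int]  # (feature index, feature value)


class NoveltyTable:
    """Tracks every (index, value) pair seen so far (width-1 novelty)."""

    def __init__(self) -> None:
        self.seen: Set[Feature] = set()

    def is_novel(self, features: Iterable[int]) -> bool:
        """Return True if any (index, value) pair is new, then record all pairs."""
        pairs = {(i, v) for i, v in enumerate(features)}
        new_pairs = pairs - self.seen
        self.seen |= pairs
        return bool(new_pairs)


def prune(screens: List[Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Keep only screens whose feature vector is novel, in search order."""
    table = NoveltyTable()
    return [s for s in screens if table.is_novel(s)]


# Hypothetical 3-bit feature vectors for four screens.
kept = prune([(0, 0, 1), (0, 0, 1), (1, 0, 1), (0, 0, 1)])
print(kept)
```

In this toy run the second and fourth screens are pruned because they repeat features already seen, while the third is kept because its first feature takes a new value.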