Mean field equilibria of multiarmed bandit games
Abstract
Much of the classical work on algorithms for multiarmed bandits focuses on rewards that are stationary over time. By contrast, we study multiarmed bandit (MAB) games, where the rewards obtained by an agent also depend on how many other agents choose the same arm (as might be the case in many competitive or cooperative scenarios). Such systems are naturally nonstationary due to the interdependent evolution of agents, and in general MAB games can be intractable to analyze using standard equilibrium concepts such as perfect Bayesian equilibrium. We introduce a general model of multiarmed bandit games and study a notion of equilibrium inspired by a large-system approximation known as mean field equilibrium (MFE). In such an equilibrium, the proportion of agents playing the various arms, called the population profile, is assumed stationary over time; the equilibrium requires a consistency check that this stationary profile arises from the policies chosen by the agents. We establish three main results. First, we show that an MFE exists under general conditions. Second, we show that, under a contraction condition, the MFE is unique and the population profile converges to it from any initial state. Finally, we show that under the same contraction condition, the MFE is a good approximation to the behavior of finite systems with many agents. The contraction condition requires that the agent population is sufficiently mixing and that the sensitivity of the reward function to the population profile is sufficiently low. Through numerical experiments, we find that our main insights appear to hold even when the contraction condition is violated.
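
The consistency check described above can be illustrated with a small fixed-point computation. The sketch below is a toy, not the paper's model: the congestion-style reward, the smoothed (logit) response map, and all names and parameters (`reward`, `beta`, `n_arms`) are illustrative assumptions; in the paper, agents learn through bandit policies rather than best-responding to known rewards.

```python
import numpy as np

def mfe_fixed_point(reward, n_arms=3, beta=2.0, tol=1e-10, max_iter=10_000):
    """Toy fixed-point iteration for a mean-field consistency check.

    reward(a, mass) is the per-agent payoff of arm a when a fraction `mass`
    of the population plays it. The map below uses a smoothed best response;
    this stands in for the induced behavior of the agents' policies.
    """
    f = np.full(n_arms, 1.0 / n_arms)            # initial population profile
    for _ in range(max_iter):
        r = np.array([reward(a, f[a]) for a in range(n_arms)])
        g = np.exp(beta * (r - r.max()))         # agents favor high-reward arms
        f_next = g / g.sum()
        if np.max(np.abs(f_next - f)) < tol:     # profile reproduces itself: consistency holds
            return f_next
        f = f_next
    return f

# Example: rewards that fall as more of the population shares an arm.
profile = mfe_fixed_point(lambda a, mass: (a + 1.0) - 2.0 * mass)
print(profile)
```

In this toy, the iteration converges when the response map is a contraction, i.e., when the reward is not too sensitive to the population profile and the smoothing is strong enough; this loosely mirrors the role of the contraction condition stated in the abstract.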