Conference paper
Relating reinforcement learning performance to classification performance
Abstract
We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any unobservable assumptions (no assumption of independence, small mixing time, fully observable states, or even hidden states), and the resulting statement is independent of the number of states or actions. The statement depends critically on the magnitude of the rewards and on the prediction performance of the created classifiers. We also provide general guidelines for obtaining good classification performance on the created subproblems. In particular, we discuss possible methods for generating training examples for a classifier learning algorithm.
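To make the reduction concrete, the following is a minimal sketch (not the paper's exact construction) of one way to generate weighted binary classification examples from a reinforcement learning problem with two actions. It assumes a Gym-style environment with step(); the helpers env.set_state() and env.sample_state(), the base_policy, and the Monte Carlo return estimate are all hypothetical conveniences introduced for illustration.

```python
def rollout_return(env, state, first_action, base_policy, horizon):
    """Monte Carlo estimate of the return from taking `first_action`
    at `state` and then following `base_policy` for `horizon` steps."""
    env.set_state(state)  # hypothetical helper: restore the environment to `state`
    obs, reward, done, _ = env.step(first_action)
    total = reward
    for _ in range(horizon):
        if done:
            break
        obs, reward, done, _ = env.step(base_policy(obs))
        total += reward
    return total

def make_classification_examples(env, base_policy, horizon, n_states):
    """Create weighted binary examples (state, better_action, weight).

    The weight is the estimated gap between the two actions' returns,
    reflecting the idea that a classification error on a high-gap state
    costs more reward than an error on a low-gap state."""
    examples = []
    for _ in range(n_states):
        state = env.sample_state()  # hypothetical helper for illustration
        returns = [rollout_return(env, state, a, base_policy, horizon)
                   for a in (0, 1)]
        better = max((0, 1), key=lambda a: returns[a])
        weight = abs(returns[0] - returns[1])
        examples.append((state, better, weight))
    return examples
```

The resulting (state, label, weight) triples can be passed to any importance-weighted binary classifier; the weighting is what lets classification error bound the loss in the expected sum of rewards.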