file.pdf (1.35 MB)

Online Bellman Residual Algorithms with Predictive Error Guarantees

Download (1.35 MB)
journal contribution
posted on 01.07.2015, 00:00 by Wen Sun, J. Andrew Bagnell

We establish a connection between optimizing the Bellman Residual and worst case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials with the goal of predicting a future discounted sum of rewards. Our analysis shows that, to- gether with a stability assumption, any no-regret online learning algorithm that minimizes Bellman error ensures small prediction error. No statistical assumptions are made on the sequence of observations, which could be non-Markovian or even adversarial. Moreover, the analysis is independent of the particular form of function approximation and the particular (stable) no-regret approach taken. Our approach thus establishes a broad new family of provably sound algorithms for Bellman Residual-based learning and provides a generalization of previous worst-case result for minimizing predictive error. We investigate the potential advantages of some of this family both theoretically and empirically on benchmark problems

History

Date

01/07/2015

Exports

Exports