Carnegie Mellon University

Policy Search by Dynamic Programming

Journal contribution, posted on 2004-01-01 by J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng, and Jeff Schneider.
We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
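The backward-pass idea the abstract describes can be sketched concretely. The toy chain MDP, the uniform baseline distributions, and the restricted class of constant-action policies below are all illustrative assumptions, not details from the paper; the sketch only shows the core PSDP step of scoring each candidate policy by its value weighted under the baseline distribution for that time step.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 5-state chain,
# actions 0 = "left" and 1 = "right", with reward only for finishing
# in the rightmost state after a horizon of T steps.
n_states, n_actions, T = 5, 2, 4
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0               # action 0 moves left
    P[1, s, min(s + 1, n_states - 1)] = 1.0    # action 1 moves right
R_final = np.zeros(n_states)
R_final[-1] = 1.0                              # terminal reward at the goal

# Baseline distributions mu_t: a rough guess of where a good policy is at
# time t. Uniform here, purely for simplicity.
mu = [np.full(n_states, 1.0 / n_states) for _ in range(T)]

# A restricted policy class (constant-action policies), so the baseline
# weighting genuinely selects between candidates.
policy_class = [np.full(n_states, a, dtype=int) for a in range(n_actions)]

def lookahead(pi, V):
    """Value of following pi for one step, then collecting value-to-go V."""
    return np.array([P[pi[s], s] @ V for s in range(n_states)])

def psdp(mu, policy_class, T, V_final):
    """Backward pass: at each step t, keep the candidate policy whose
    one-step lookahead value is largest when weighted by mu_t."""
    V = V_final.copy()
    pis = [None] * T
    for t in reversed(range(T)):
        scores = [mu[t] @ lookahead(pi, V) for pi in policy_class]
        pis[t] = policy_class[int(np.argmax(scores))]
        V = lookahead(pis[t], V)   # value-to-go of the policies chosen so far
    return pis

pis = psdp(mu, policy_class, T, R_final)
```

Because each step solves a single one-step maximization over the policy class, the loop terminates after exactly T such choices, which is the finite-step termination property the abstract mentions. In this toy chain, every step selects the "move right" policy.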
