Posted on 2003-01-01, 00:00. Authored by J. Andrew Bagnell, Jeff Schneider.
We investigate the problem of non-covariant behavior
of policy gradient reinforcement learning algorithms.
The policy gradient approach is amenable
to analysis by information-geometric methods. This
leads us to propose a natural metric on controller
parameterization that results from considering the
manifold of probability distributions over paths induced
by a stochastic controller. Investigation
of this approach leads to a covariant gradient ascent
rule. Interesting properties of this rule are
discussed, including its relation to actor-critic-style
reinforcement learning algorithms. The algorithms
discussed here are computationally quite
efficient and on some interesting problems lead
to dramatic performance improvements over non-covariant
rules.
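
The covariant rule in question is the natural gradient ascent update theta <- theta + alpha * F(theta)^(-1) * grad J(theta), where F(theta) = E[ grad log p(xi; theta) grad log p(xi; theta)^T ] is the Fisher information of the distribution over paths xi induced by the stochastic controller. The following is a minimal illustrative sketch in Python/NumPy, not the paper's implementation: it uses a hypothetical one-step problem in which "paths" are single actions drawn from a Gaussian policy with reward peaked at 2.0, and the step size, sample count, and ridge term are assumptions chosen for the demo.

    import numpy as np

    rng = np.random.default_rng(0)

    def reward(a):
        # Hypothetical one-step reward, peaked at action = 2.0.
        return -(a - 2.0) ** 2

    theta = np.array([0.0, 0.0])  # (mean, log_std) of the Gaussian policy
    alpha, n = 0.1, 2000          # assumed step size and samples per update

    for _ in range(200):
        mean, log_std = theta
        std = np.exp(log_std)
        a = mean + std * rng.standard_normal(n)  # sample n "paths" (actions)
        z = (a - mean) / std
        # Score function grad_theta log p(a; theta) for each sample, shape (n, 2).
        s = np.stack([z / std, z ** 2 - 1.0], axis=1)
        # Vanilla (non-covariant) REINFORCE gradient estimate.
        g = (s * reward(a)[:, None]).mean(axis=0)
        # Empirical Fisher information of the path distribution,
        # E[score score^T], with a small ridge for numerical stability.
        F = s.T @ s / n + 1e-3 * np.eye(2)
        # Covariant (natural) gradient ascent step.
        theta = theta + alpha * np.linalg.solve(F, g)

    print(theta)  # the mean parameter approaches the optimum at 2.0

Because the Fisher metric is intrinsic to the path distribution rather than to any particular parameterization, this update is invariant to smooth reparameterizations of theta, which is the covariance property the abstract refers to.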