Maximum Entropy Inverse Reinforcement Learning
Journal contribution, posted on 2008-01-01. Authored by Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey.

Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
We develop our technique in the context of modeling real-world navigation and driving behaviors, where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
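The destination inference mentioned above can be sketched with Bayes' rule: under a maximum-entropy path model, the posterior over destinations given an observed route prefix is proportional to the prior times the total exponentiated reward of all completions consistent with that prefix. The destinations, feature counts, weights, and uniform prior below are all assumptions for illustration, not the paper's experimental setup.

```python
import math

# Assumed reward weights and the feature counts of the observed prefix.
theta = [-1.0, 0.3]
f_partial = [1.0, 0.0]

# Hypothetical remaining-segment feature counts for each candidate
# destination, and a uniform prior over destinations.
completions = {
    "airport":  [[2.0, 1.0], [3.0, 2.0]],
    "downtown": [[1.0, 0.0]],
}
prior = {"airport": 0.5, "downtown": 0.5}

def reward(f):
    """Linear reward theta . f for a full path's feature counts."""
    return sum(w * x for w, x in zip(theta, f))

# Under the MaxEnt model P(path) ∝ exp(theta . f_path), so
# P(dest | prefix) ∝ prior(dest) * Σ_completions exp(theta . (f_partial + f_rest)).
unnorm = {}
for dest, rests in completions.items():
    total = sum(math.exp(reward([a + b for a, b in zip(f_partial, r)]))
                for r in rests)
    unnorm[dest] = prior[dest] * total
Z = sum(unnorm.values())
posterior = {dest: v / Z for dest, v in unnorm.items()}
print(posterior)
```

In the paper this sum over completions is computed efficiently by dynamic programming over the road network rather than by enumeration; the enumeration here only makes the probabilistic structure explicit.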