Inverse Optimal Heuristic Control for Imitation Learning

Ratliff, Nathan; Ziebart, Brian; Peterson, Kevin; Bagnell, J. Andrew; Hebert, Martial; Dey, Anind K.; Srinivasa, Siddhartha

doi:10.1184/R1/6555227.v1



journal contribution

posted on 2009-01-01, 00:00 authored by Nathan Ratliff, Brian Ziebart, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Anind K. Dey, Siddhartha Srinivasa

Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision making and behavior by observation. Two basic approaches are common. The first, which we here term \emph{behavioral cloning} (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats the imitation learning problem as a straightforward one of supervised learning (e.g., classification) where the goal is to map observations to controls. The second, \emph{inverse optimal control} (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence for modeling such decision-making behavior because it allows for learned decision making that reasons sequentially and over a long horizon. Unfortunately, such inverse optimal control methods rely upon the ability to efficiently solve a planning problem and suffer from the usual ``curse of dimensionality'' when the state space gets large. This paper presents a novel approach to imitation learning that we call \emph{Inverse Optimal Heuristic Control} (IOHC), which capitalizes on the strengths of both paradigms by allowing long-horizon, planning-style reasoning in a low-dimensional space while enabling an additional high-dimensional set of features to guide overall action selection. We frame this combined problem as one of optimization, and although the resulting objective function is non-convex, we are able to provide convex upper and lower bounds to optimize as surrogates. Further, these bounds, as well as our empirical results, show that the objective function is nearly convex and leads to improved performance on a set of imitation learning problems, including turn prediction of drivers and predicting the likely paths taken by pedestrians in an office environment.
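The abstract does not give the IOHC objective, so the following is only a rough sketch of the general idea it describes: a long-horizon cost-to-go computed by planning in a small state space is added to a feature-based score for each action, and the sum is turned into a distribution over actions. Everything here is an assumption made for illustration, not the paper's formulation: the one-dimensional corridor, the names `cost_to_go` and `action_distribution`, the `features` callback, the weight vector `theta`, and the softmax form.

```python
import math

# Hypothetical 1-D corridor; an action moves one cell left (-1) or right (+1).
ACTIONS = (-1, +1)


def cost_to_go(step_cost, goal):
    """Value iteration: minimum cumulative cost of reaching `goal` from each cell.

    step_cost[s] is the (assumed) cost paid on entering cell s.
    """
    n = len(step_cost)
    V = [0.0 if s == goal else math.inf for s in range(n)]
    for _ in range(n):  # n sweeps are enough for a corridor of n cells
        for s in range(n):
            if s == goal:
                continue
            for a in ACTIONS:
                nxt = s + a
                if 0 <= nxt < n:
                    V[s] = min(V[s], step_cost[nxt] + V[nxt])
    return V


def action_distribution(s, V, step_cost, features, theta):
    """Softmax over actions of -(feature score + planned cost through the next cell).

    `features(s, a)` stands in for the high-dimensional observations that a
    behavioral-cloning style classifier would use; `theta` weights them.
    """
    energies = {}
    for a in ACTIONS:
        nxt = s + a
        if not 0 <= nxt < len(V):
            continue  # action would leave the corridor
        local = sum(w * f for w, f in zip(theta, features(s, a)))
        energies[a] = local + step_cost[nxt] + V[nxt]
    lowest = min(energies.values())  # subtract the minimum for numerical stability
    weights = {a: math.exp(-(e - lowest)) for a, e in energies.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


if __name__ == "__main__":
    step_cost = [1.0] * 5
    V = cost_to_go(step_cost, goal=4)
    # One illustrative feature: 1.0 when the action steps left, else 0.0.
    probs = action_distribution(
        2, V, step_cost, lambda s, a: [1.0 if a == -1 else 0.0], theta=[0.5]
    )
    print(V, probs)
```

In this toy setup the planner alone already prefers stepping toward the goal, and the feature term shifts the probabilities further. In a learning setting, the step costs and `theta` would be fit to demonstrations rather than fixed by hand.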

History

Date

2009-01-01

Keywords

Robotics

Licence

In Copyright
