Posted on 2008-01-01, 00:00. Authored by Brenna D. Argall, Brett Browning, Manuela M. Veloso.
As robots become more commonplace within society,
the need for tools to enable non-robotics-experts to develop
control algorithms, or policies, will increase. Learning from
Demonstration (LfD) offers one promising approach, where
the robot learns a policy from teacher task executions. Our
interests lie with robot motion control policies which map
world observations to continuous low-level actions. In this
work, we introduce Advice-Operator Policy Improvement (A-OPI)
as a novel approach for improving policies within LfD.
Two distinguishing characteristics of the A-OPI algorithm are
data source and continuous state-action space. Within LfD, more
example data can improve a policy. In A-OPI, new data is
synthesized from a student execution and teacher advice. By
contrast, typical demonstration approaches provide the learner
exclusively with teacher executions. A-OPI is effective within
continuous state-action spaces because high-level human advice
is translated into continuous-valued corrections on the student
execution. This work presents a first implementation of the A-OPI
algorithm, validated on a Segway RMP robot performing
a spatial positioning task. A-OPI is found to improve task
performance in both success and accuracy. Furthermore, its
performance is shown to be similar or superior to that of the
typical approach of exclusively teacher demonstrations.
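
To make the data-synthesis idea concrete, the Python sketch below illustrates how an advice-operator might turn a recorded student execution plus a piece of high-level advice into new continuous-valued training data. The operator name, the two-component action layout, and the correction factor are hypothetical illustrations, not the paper's actual operators or policy learner.

    from typing import Callable, List, Tuple

    import numpy as np

    # One recorded point from a student execution: observation -> continuous action.
    Datapoint = Tuple[np.ndarray, np.ndarray]

    # An advice-operator maps a recorded datapoint to a corrected datapoint.
    AdviceOperator = Callable[[Datapoint], Datapoint]


    def turn_more(point: Datapoint, factor: float = 1.2) -> Datapoint:
        """Hypothetical operator: amplify the rotational-speed action component."""
        obs, act = point
        corrected = act.copy()
        corrected[1] *= factor  # assumes act = [translational speed, rotational speed]
        return obs, corrected


    def synthesize(execution: List[Datapoint],
                   advice: List[Tuple[slice, AdviceOperator]]) -> List[Datapoint]:
        """Apply each advice-operator to the execution segment it covers,
        producing new (observation, corrected-action) data for re-training."""
        new_data = list(execution)
        for segment, op in advice:
            new_data[segment] = [op(p) for p in execution[segment]]
        return new_data


    if __name__ == "__main__":
        # Toy execution: observation = [distance to goal, heading error],
        # action = [translational speed, rotational speed].
        execution = [(np.array([d, 0.5]), np.array([0.8, 0.1]))
                     for d in np.linspace(2.0, 0.0, 5)]
        # Advice covering the final segment: "turn more sharply".
        corrected = synthesize(execution, [(slice(2, 5), turn_more)])
        print(corrected[-1])  # last datapoint now carries the amplified rotation

In this sketch, the corrected datapoints would be added to the demonstration set and the policy re-derived, which is the role teacher advice plays in place of additional teacher executions.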