Carnegie Mellon University

Representation Reuse for Learning Robust Robot Manipulations

posted on 2024-02-21, 21:43 authored by Mohit Sharma

Real-world robots need to continuously learn new manipulation tasks. These new manipulation tasks often share many sub-structures with previously learned tasks, e.g., sub-tasks, controllers, and preconditions. In this thesis, we aim to utilize these shared sub-structures to efficiently learn new manipulation tasks. To this end, we explore reusing skill representations. These skill representations are either provided manually as structured policy representations or learned in a data-driven manner.

The first part of this thesis focuses on policy representations. To learn compositional skill policies, we propose object-centric task-axes controllers. Our task-axes controllers learn the skill structure and are composed into specialized policy representations for individual tasks. These representations utilize the compositional, object-centric, and geometric structure underlying many manipulation tasks. As we show through extensive experiments, these representations are robust to environment variations and are learned from limited data. We also show how parameterized policy representations help learn new tasks efficiently in a lifelong learning manner. To achieve this, we propose skill effect models, which predict the effects of stereotypical skill executions. We utilize skill effect models together with search-based planning to effectively plan for new tasks and learn new skills over time.

The second part of this thesis focuses on visual representations. These visual representations, learned either from simulation or from offline web data, are used for efficient learning of skill preconditions and policies, respectively. Specifically, for skill preconditions we focus on compositional learning and show how complex manipulation tasks with multiple objects can be simplified by focusing on pairwise object relations. These relational representations are learned offline using large-scale simulation data. In the latter part, we focus on skill policies that utilize large pretrained visual representations for robot manipulation. First, we propose RoboAdapters, which use neural adapters as an alternative to frozen or fully fine-tuned visual representations for robot manipulation. RoboAdapters bridge the performance gap between frozen representations and full fine-tuning while preserving the original capabilities of the pretrained model. Finally, we explore using large pretrained vision-language representations for real-time control of precise and dynamic manipulation tasks. We use multiple sensing modalities at different hierarchies to enable real-time control while maintaining the generalization and robustness of pretrained representations.




Degree Type

  • Dissertation

Department

  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

  • Oliver Kroemer
