Carnegie Mellon University

Watch, Predict, Act: Robot Learning meets Web Videos

thesis
posted on 2025-07-09, 20:51, authored by Homanga Bharadhwaj
<p dir="ltr">To enable robots to assist in everyday tasks in diverse natural environments such as homes, offices, and kitchens, it is critical to develop policies that generalize to novel tasks in unseen scenarios. Practical utility demands that these policies require no task-specific adaptation at test time and can instead execute directly from a natural task specification, such as a language instruction. Moreover, such policies should handle a broad spectrum of tasks (such as manipulating articulated objects, pouring, reorienting objects, and wiping tables) without explicit robot data collection for every possible task, as required by the predominant paradigm of end-to-end imitation learning. The difficulty of collecting large-scale, diverse robot interaction datasets in natural scenarios makes this requirement impractical. </p><p dir="ltr">While typical approaches rely on a large amount of demonstration data for such generalization, in this thesis we present approaches for effectively leveraging web data to scalably augment robot interaction datasets. This thesis pioneers the paradigm of conditioning robotic policies explicitly on motion cues from predictive models trained on large-scale video datasets, enabling the policy to perform new tasks with novel objects and novel motions unseen in the robot-specific data. We formalize the notion of factorizing a robotic policy into two parts: an embodiment-agnostic interaction plan, which can now draw on general internet data, and embodiment-specific action execution conditioned on the plan, which is a substantially easier problem. Throughout the thesis we develop common goal- and language-conditioned policies that can perform multiple tasks without relying on task-specific or scene-specific heuristics.</p>

History

Date

2025-05-01

Degree Type

  • Dissertation

Thesis Department

  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Abhinav Gupta, Shubham Tulsiani
