Learning Universal Humanoid Control
Since infancy, humans acquire motor skills, behavioral priors, and objectives by learning from their caregivers. Similarly, as we create humanoids in our own image, we aspire for them to learn from us and develop universal physical and cognitive capabilities comparable to, or even surpassing, our own. In this thesis, we explore how to equip humanoids with the mobility, dexterity, and environmental awareness necessary to perform meaningful tasks. Unlike previous efforts that focus on a narrow set of tasks, such as traversing terrains, imitating a few human motion clips, or playing a single game, we emphasize scaling up humanoid control by leveraging large-scale human data (e.g., motion and videos). We show that scaling brings numerous benefits, gradually moving us closer to achieving truly “universal” capabilities.
Our key idea centers on scaling up humanoid motion tracking to form a foundational humanoid control prior that can be used to speed up task learning. Just as young animals are born with the instinct to walk, run, and grasp, we wish to equip humanoids with motor control priors that lead to human-like movement. We begin by scaling up a reinforcement-learning-based motion-tracking framework, enabling humanoids to imitate large-scale kinematic human motion datasets. This motion imitator forms the basis for acquiring motor skills: given a kinematic reference motion, it can robustly control the humanoid to execute everyday activities as well as more complex, dynamic movements. Such a motion imitator can be used for human pose estimation, teleoperation, and controlling simulated avatars using first-person and third-person cameras.
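To make the tracking objective concrete, the sketch below shows one common form of per-timestep imitation reward for RL-based motion tracking: the simulated humanoid is rewarded for staying close to the reference pose and velocity. The function name, weights, and error scales here are illustrative assumptions, not the exact formulation used in this thesis.

```python
import numpy as np

def tracking_reward(sim_pos, ref_pos, sim_vel, ref_vel,
                    w_pos=0.6, w_vel=0.4, k_pos=100.0, k_vel=0.1):
    """Hypothetical per-timestep imitation reward.

    Exponentiated tracking errors keep the reward in (0, 1]: the closer the
    simulated joints match the kinematic reference, the closer the reward is to 1.
    """
    pos_err = np.sum((sim_pos - ref_pos) ** 2)   # squared joint-position error
    vel_err = np.sum((sim_vel - ref_vel) ** 2)   # squared joint-velocity error
    return w_pos * np.exp(-k_pos * pos_err) + w_vel * np.exp(-k_vel * vel_err)
```

Maximizing a reward of this shape over a large kinematic dataset is what allows a single imitator policy to reproduce both everyday activities and dynamic movements.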
Equipped with such a motion tracker, we distill its behaviors into a compact, physics-based control latent space, forming a general-purpose humanoid control prior. This prior enables the reuse of motor skills learned from a large-scale dataset: randomly sampling from the latent space produces human-like humanoid behaviors, and leveraging this latent representation in hierarchical reinforcement learning significantly improves sample efficiency while yielding human-like motion. A critical aspect of this framework is ensuring that the latent space faithfully encapsulates the full range of motor skills present in the source dataset, a property we verify empirically.
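As a rough sketch of how such a prior is reused, the example below assumes a frozen decoder distilled from the motion tracker and a high-level task policy that acts by choosing latents; all module names, dimensions, and interfaces are hypothetical rather than the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class LatentSpaceTaskPolicy(nn.Module):
    """Hierarchical policy sketch: the task policy outputs a latent skill code z,
    and a frozen decoder (the distilled control prior) turns (proprioception, z)
    into low-level joint actions."""

    def __init__(self, task_obs_dim: int, latent_dim: int, decoder: nn.Module):
        super().__init__()
        self.high_level = nn.Sequential(
            nn.Linear(task_obs_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = decoder
        for p in self.decoder.parameters():      # reuse the prior, do not fine-tune it
            p.requires_grad = False

    def forward(self, task_obs: torch.Tensor, proprio: torch.Tensor) -> torch.Tensor:
        z = self.high_level(task_obs)                         # latent "action" for the task
        return self.decoder(torch.cat([proprio, z], dim=-1))  # joint-level action
```

Because the decoder stays frozen, every action the task policy can express corresponds to a motor skill already captured in the latent space, which is the mechanism behind the sample-efficiency and naturalness gains described above.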
Building upon such a humanoid control prior, we study simulated humanoids equipped with dexterous hands, touch sensors, and vision to interact with their environments and manipulate objects. We find that our control prior significantly simplifies the training process for manipulation tasks, and we can learn policies that generalize across diverse sets of objects and scenes. Along the way, we solve practical problems involved in humanoid dexterous manipulation and perception-in-the-loop control.
Finally, we take a step toward real-world deployment by transferring this framework to physical humanoids. As a first milestone, we train a universal humanoid motion tracker that runs in real time and can be used for humanoid teleoperation. This real-world deployment highlights the practicality of our approach and sets the stage for future work on learning universal controllers for real humanoids.
History
Date
- 2025-04-25
Degree Type
- Dissertation
Department
- Robotics Institute
Degree Name
- Doctor of Philosophy (PhD)