Carnegie Mellon University
tweng_phd_ri_2023.pdf (24.55 MB)

Redefining the Perception-Action Interface: Visual Action Representations for Contact-Centric Manipulation

posted on 2023-11-16, 19:02 authored by Thomas Weng

In robotics, understanding the link between perception and action is pivotal. Typically, perception systems process sensory data into state representations such as segmentations and bounding boxes, which a planner uses to plan actions. However, a state estimation approach can fail in environments with partial observability, and in cases with challenging object properties like transparency and deformability. Alternatively, visuomotor policies directly convert raw sensor input into actions, but they produce actions that are not grounded in contact, and perform poorly in unseen task configurations. 

To address these shortcomings, we delve into visual action representations, a class of approaches in which the perception system conveys information about potential actions in the environment. Visual action representations do not require full state estimation, and they ground interactions in an object- and contact-centric manner, reasoning about where to make contact with an object, how to approach contact locations, and how to manipulate the object once contact is made. Reformulating the role of perception to include action reasoning simplifies downstream planning. 

This thesis presents visual action representations for addressing visual and geometric challenges in manipulation. For grasping rigid objects, we devise a transfer learning method for transparent and specular objects (RA-L+ICRA ’20) and introduce Neural Grasp Distance Fields for 6-DOF grasping and motion planning (ICRA ’23). We introduce algorithms for cloth manipulation, starting with determining precise grasping points for edges and corners of cloth (IROS ’20). We incorporate tactile sensing into a contact-rich policy for precisely manipulating layers of cloth (IROS ’22). In subsequent work, we propose FabricFlowNet, a policy that predicts both where to grasp and how to fold for bimanual cloth folding (CoRL ’21). 




Degree Type

  • Dissertation


Department

  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • David Held