Carnegie Mellon University

Perception amidst interaction: spatial AI with vision and touch for robot manipulation

Thesis, posted on 2024-03-07, 17:06, authored by Sudharshan Suresh

Robots currently lack the cognition to replicate even a fraction of the tasks humans do, a gap captured by Moravec's paradox. Humans effortlessly combine their senses in everyday interactions: we can rummage through our pockets in search of our keys and deftly insert them to unlock the front door. Before robots can demonstrate such dexterity, they must first exhibit spatial awareness of the objects they manipulate. In particular, object pose and shape are important quantities for downstream planning and control. The status quo in in-hand perception is restricted to the narrow scope of tracking known objects, with vision as the dominant modality. As robots move out of instrumented labs and factories to cohabit our spaces, a clear missing piece is generalizable spatial AI.

Tactile sensing is often overlooked, yet it provides a direct window into robot-object interaction, free from occlusion and aliasing. With hardware advances such as vision-based touch sensors, we now have situated yet detailed information to complement cameras. However, interactive perception is intrusive: the act of sensing itself perturbs the object. Can we robustly estimate object shape and pose online from a stream of multimodal robot manipulation data?

In this thesis, I study the intersection of simultaneous localization and mapping (SLAM) and robot manipulation. Specifically, I examine: (1) spatial representations for object-centric SLAM, (2) tactile perception and simulation, and (3) combining learned models with online optimization. First, I show how factor graphs fuse touch with physics-based constraints for SLAM in planar manipulation (Chapter 2). Next, I present a schema for online shape learning from visuo-tactile sensing (Chapter 3). I then demonstrate a learned tactile representation for global localization via touch (Chapter 4). Drawing upon these efforts, I culminate by unifying vision, touch, and proprioception into a neural representation for SLAM during in-hand manipulation (Chapter 5).

Date: 2024-02-28
Degree Type: Dissertation
Department: Robotics Institute
Degree Name: Doctor of Philosophy (PhD)
Advisor(s): Michael Kaess
