Carnegie Mellon University

Amodal Visual Scene Representations With and Without Geometry

thesis
posted on 2022-06-06, 21:04 authored by Adam Harley

Most computer vision models in deployment today describe the pixels of images. This does not suffice, because images are only projections of the scene in front of the camera. In this thesis we build representations that attempt to describe the scene itself. We call these representations “amodal” (i.e., without modality), emphasizing the fact that they describe elements of the scene for which we have no sensory input. We present two methods for amodal visual scene representation. The first focuses on modelling space, and proposes geometry-based methods for lifting images into 3D maps, where the objects are complete, despite partial occlusions in the imagery. We show that this representation allows for self-supervised learning from multi-view data, and yields state-of-the-art results as a perception system for autonomous vehicles, where the goal is to estimate a “bird’s eye view” semantic map from multiple sensors. The second method focuses on modelling time, and proposes geometry-free methods for tracking image elements through partial and full occlusions across a video. Using learned temporal priors and within-inference optimization, we show that our model can track points through occlusions, and outperform flow-based and feature-matching methods on fine-grained multi-frame correspondence tasks.
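The geometry-based lifting described above can be illustrated with a minimal sketch: project each 3D voxel center into the image with known pinhole intrinsics, and copy the image feature found there into the voxel grid. This is a hypothetical, simplified illustration of the general technique (nearest-neighbor sampling, a single camera, no learned components), not the thesis's actual implementation; the function name, grid parameters, and intrinsics below are all assumptions.

```python
import numpy as np

def lift_image_to_voxels(feats, K, grid_min, grid_max, shape):
    """Fill a 3D voxel grid with image features.

    feats:    (H, W, C) image feature map, in the camera's view.
    K:        (3, 3) pinhole intrinsics matrix.
    grid_min: (x, y, z) lower corner of the grid, in camera coordinates.
    grid_max: (x, y, z) upper corner of the grid.
    shape:    (X, Y, Z) number of voxels along each axis.

    Each voxel center is projected into the image; voxels that land
    in-bounds receive the (nearest-pixel) feature, the rest stay zero.
    """
    H, W, C = feats.shape
    X, Y, Z = shape
    xs = np.linspace(grid_min[0], grid_max[0], X)
    ys = np.linspace(grid_min[1], grid_max[1], Y)
    zs = np.linspace(grid_min[2], grid_max[2], Z)
    gx, gy, gz = np.meshgrid(xs, ys, zs, indexing="ij")
    pts = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)  # voxel centers

    # Pinhole projection: u = fx*x/z + cx, v = fy*y/z + cy.
    valid = pts[:, 2] > 1e-3  # only voxels in front of the camera
    uvw = (K @ pts.T).T
    z = np.clip(uvw[:, 2], 1e-3, None)
    ui = np.round(uvw[:, 0] / z).astype(int)
    vi = np.round(uvw[:, 1] / z).astype(int)
    inb = valid & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    vox = np.zeros((X * Y * Z, C), dtype=feats.dtype)
    vox[inb] = feats[vi[inb], ui[inb]]
    return vox.reshape(X, Y, Z, C)
```

Because occluded voxels along a camera ray receive the same feature as the surface that occludes them, a single view is ambiguous; aggregating lifted grids from multiple views (as the thesis does with multi-view data) is what lets the representation complete objects behind partial occlusions.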

History

Date

2022-05-12

Degree Type

  • Dissertation

Department

  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Katerina Fragkiadaki
