Carnegie Mellon University

Building 4D Models of Objects and Scenes from Monocular Videos

thesis
posted on 2023-09-07, 21:02 authored by Gengshan Yang

This thesis studies how to infer the time-varying 3D structures of generic deformable objects and dynamic scenes from monocular videos. A solution to this problem is essential for virtual reality and robotics applications. To reconstruct and track dynamic structures in 3D, prior work takes advantage of specialized sensors or parametric body models learned from registered 3D data. However, neither approach scales robustly to the diverse sets of objects and events that one may see in the real world.

Inferring 4D structures from 2D observations is challenging because the problem is under-constrained: given images captured at different time instances, there are infinitely many interpretations of their underlying geometry, color, and motion. In a casual setup where there are neither sufficient sensor measurements nor rich 3D supervision, one needs to tackle three challenges: (1) Registration: how to find correspondences and track camera frames? (2) Depth ambiguity: how to lift 2D observations to 3D? (3) Limited views: how to infer the structures that are not observable?

We first study the 4D reconstruction problem in a single video setup and then extend it to multiple videos, different instances, and scenes. Inspired by analysis-by-synthesis, we set up an inverse graphics problem and solve it with generic data-driven priors. Inverse graphics models (e.g., differentiable rendering, differentiable physics simulation) approximate the true generation process of a video with differentiable operations, allowing one to inject prior knowledge about the physical world and compute gradients to update the model parameters. Generic data-driven priors (e.g., optical flow, pixel features, viewpoint) provide guidance to register pixels to a canonical 3D space, which allows us to fuse observations over time and across similar instances. Building upon these observations, we develop methods to capture 4D models of deformable objects and dynamic scenes from in-the-wild video footage. 
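The analysis-by-synthesis idea described above can be illustrated with a toy sketch. This is not the method developed in the thesis; it only assumes a hypothetical one-parameter "renderer" (a 1D Gaussian blob whose center is unknown) and shows how a differentiable forward model lets one recover the parameter from an observation by gradient descent on a photometric loss. All names (`render`, `loss_and_grad`, `fit`, `XS`) and all constants are illustrative assumptions.

```python
import numpy as np

# Hypothetical 1D "image plane": 101 pixel positions between -5 and 5.
XS = np.linspace(-5.0, 5.0, 101)


def render(center, width=1.0):
    """Toy differentiable renderer: a 1D image of a Gaussian blob at `center`."""
    return np.exp(-0.5 * ((XS - center) / width) ** 2)


def loss_and_grad(center, target, width=1.0):
    """Photometric loss 0.5 * sum((render - target)^2) and its gradient w.r.t. center."""
    image = render(center, width)
    residual = image - target
    loss = 0.5 * np.sum(residual ** 2)
    # Analytic derivative of the rendered image with respect to the center.
    d_image_d_center = image * (XS - center) / width ** 2
    grad = np.sum(residual * d_image_d_center)
    return loss, grad


def fit(target, init=0.0, lr=0.05, steps=500):
    """Plain gradient descent on the blob center (assumed step size and step count)."""
    center = init
    for _ in range(steps):
        _, grad = loss_and_grad(center, target)
        center -= lr * grad
    return center


# "Observation": an image synthesized with a center the optimizer does not know.
target = render(1.5)
estimate = fit(target)
print(f"estimated center: {estimate:.4f}")
```

In a real system the forward model would be a differentiable renderer over geometry, appearance, and motion, and the data-driven priors mentioned above would supply additional loss terms; the toy keeps only the render, compare, and update loop.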

History

Date

2023-07-31

Degree Type

  • Dissertation

Department

  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Deva Ramanan
