Multi-Human 3D Reconstruction from In-the-Wild Videos

Khirodkar, Rawal Sheshrao

doi:10.1184/R1/24438445.v1

thesis

posted on 2023-11-16, 18:33, authored by Rawal Sheshrao Khirodkar

We study the problem of multi-human 3D reconstruction from videos captured in the wild. Human movements are dynamic, and reconstructing them accurately across diverse settings is crucial for immersive social telepresence, assistive humanoid robots, and augmented reality systems. Building such a system, however, requires addressing fundamental limitations of prior work in both data and model architectures. In this thesis, we develop several large-scale 3D benchmarks designed to evaluate multi-human reconstruction under demanding conditions, as well as top-down algorithms that are robust to occlusion and crowded environments.

Data - Obtaining 3D supervision at scale is crucial for deep learning models to generalize to the real world. However, unlike large-scale 2D datasets, existing 3D datasets are significantly limited in diversity, primarily because manual annotation in 3D space is impractical. Consequently, most 3D benchmarks are restricted to indoor environments or, at most, two human subjects outdoors, with stationary or slow-moving cameras and minimal occlusion. To address this gap, we explore the use of synthetic 3D data and construct two real multi-human 3D datasets featuring dynamic human activities, rapid camera motion, and human-human contact. These conditions, largely neglected by previous benchmarks, expose critical limitations of existing methods.

Methodology - A general multi-human 3D reconstruction method should be robust to scale variation and occlusion and should reason about absolute depth. We introduce algorithms with these properties in both 2D and 3D settings, enabling reasoning about multiple humans in dynamic environments and crowded scenes. Our top-down approach exploits spatial context to reason about severely occluded humans in the 3D scene.

Building on these two components, we develop general 3D methods that reconstruct multiple humans in dynamic scenes from in-the-wild videos.

History

Date

2023-09-29

Degree Type

Dissertation

Department

Robotics Institute

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Kris Kitani

Keywords

Computer Vision, Machine Learning, Human Pose Estimation, Multi-Object Tracking, Multi-View Geometry, Crowding, Occlusion, Human Mesh Recovery, Artificial Intelligence and Image Processing

Licence

CC BY 4.0