Carnegie Mellon University

Joint Reasoning for Camera and 3D Human Pose Estimation

thesis
posted on 2023-03-27, 20:41 authored by Yan Xu

Estimating the 6-DoF camera pose and the 3D human pose lies at the core of many computer vision applications, such as virtual reality, augmented reality, and human-robot interaction. Existing approaches either rely on large amounts of 3D training data for each new scene or require strong prior knowledge, such as known camera poses, that is typically available only in laboratory environments. Despite improved results on a few public benchmarks, a gap remains between laboratory research and real-world applications. The objective of this thesis is to develop camera and human pose estimation methods that bridge this gap.

This thesis consists of two parts. The first part focuses on camera pose estimation using human information. We first introduce a single-view camera pose estimation method in which a lightweight network, trained only on synthetic 2D human trajectory data, directly regresses the camera pose from real human trajectories at test time. We then present a wide-baseline multi-view camera pose estimation method that treats humans as keypoints and uses a re-ID network pre-trained on public datasets to embed human features for cross-view matching. We show that neither method requires 3D data collection or annotation, and that both generalize to new scenarios without extra effort.
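To make the idea of training on synthetic 2D trajectories concrete, here is a minimal sketch, not the thesis's data pipeline, of how one labelled training sample could be generated. It assumes a pinhole camera at height `height` above a flat ground plane (Z = 0), looking along +Y and tilted down by `pitch` radians, with a pedestrian walking in a straight line at constant speed. The functions `project_point` and `synthetic_sample`, the sampling ranges, and the choice of (height, pitch) as the regression target are all hypothetical.

```python
import math
import random


def project_point(X, Y, Z, height, pitch, f, cx, cy):
    """Project a world point into a hypothetical pinhole camera.

    Assumed world frame: Z is up, the ground is Z = 0, the camera centre is
    at (0, 0, height) and looks along +Y, tilted down by `pitch` radians.
    Returns pixel coordinates (u, v), or None if the point is behind the camera.
    """
    s, c = math.sin(pitch), math.cos(pitch)
    dz = Z - height  # vertical offset of the point relative to the camera centre
    # Camera axes in world coordinates:
    # right = (1, 0, 0), down = (0, -s, -c), forward = (0, c, -s)
    x_c = X
    y_c = -s * Y - c * dz
    z_c = c * Y - s * dz
    if z_c <= 0:
        return None  # point lies behind the image plane
    return (cx + f * x_c / z_c, cy + f * y_c / z_c)


def synthetic_sample(rng, num_frames=30, fps=10.0, f=1000.0, cx=640.0, cy=360.0):
    """Generate one (2D trajectory, camera label) pair under assumed ranges."""
    # Hypothetical regression target: camera height (m) and downward pitch (rad).
    height = rng.uniform(1.5, 5.0)
    pitch = rng.uniform(0.0, 0.5)
    # Straight-line walk on the ground plane at constant speed.
    x0 = rng.uniform(-4.0, 4.0)
    y0 = rng.uniform(6.0, 15.0)  # start far enough ahead to stay in front of the camera
    heading = rng.uniform(0.0, 2.0 * math.pi)
    speed = rng.uniform(1.0, 1.6)  # metres per second
    trajectory = []
    for t in range(num_frames):
        d = speed * t / fps  # distance walked so far
        X = x0 + d * math.cos(heading)
        Y = y0 + d * math.sin(heading)
        trajectory.append(project_point(X, Y, 0.0, height, pitch, f, cx, cy))
    return trajectory, (height, pitch)
```

A regression network would then be trained to map each projected trajectory back to its (height, pitch) label, e.g. `rng = random.Random(0)` followed by `data = [synthetic_sample(rng) for _ in range(1000)]`. The sketch does not check whether projected points fall inside the image bounds.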

The second part of this thesis concentrates on multi-view multi-person 3D human pose estimation in the challenging setting where the camera poses are unknown. We present a method that follows a detection-matching-reconstruction pipeline and formulates cross-view matching as a clustering problem constrained by the number of humans and cameras. Compared with existing methods, ours is one of the first to require no camera poses, no 3D data collection, and no model training for each specific dataset. We then improve the method by introducing a multi-step clustering mechanism and by leveraging short-term single-view tracking to boost cross-view matching performance. Our method generalizes well across a variety of in-the-wild settings.
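The constrained-clustering formulation of cross-view matching can be illustrated with a small, hypothetical stand-in: greedy average-linkage agglomerative clustering of per-camera detections using cosine similarity of appearance features (for example, re-ID embeddings), with a cannot-link rule that two detections from the same camera never share a cluster, stopping once the number of clusters equals an assumed number of people. This is not the thesis's multi-step clustering method; `cluster_detections`, the linkage choice, and the `(camera_id, feature)` input format are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two non-zero feature vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cluster_detections(detections, num_people):
    """Group detections across cameras into at most `num_people` clusters.

    detections: list of (camera_id, feature_vector) pairs (hypothetical format).
    Returns a list of clusters, each a list of detection indices.
    """
    clusters = [[i] for i in range(len(detections))]

    def cams(cluster):
        return {detections[i][0] for i in cluster}

    def linkage(c1, c2):
        # Average pairwise cosine similarity between two clusters.
        sims = [cosine(detections[i][1], detections[j][1]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > num_people:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            if cams(clusters[a]) & cams(clusters[b]):
                continue  # cannot-link: a person appears at most once per camera
            score = linkage(clusters[a], clusters[b])
            if best is None or score > best[0]:
                best = (score, a, b)
        if best is None:
            break  # no merge satisfies the constraint
        _, a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Because of the cannot-link rule, no cluster can hold more detections than there are cameras, which is one simple way the camera count can act as a constraint; merging stops early if no constraint-respecting merge remains.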

History

Date

2022-12-05

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Kris Kitani
