Resource-Constrained Learning and Inference for Visual Perception

Li, Mengtian

doi:10.1184/R1/19942133.v1

mengtial_phd_robotics_2022.pdf (12.99 MB)

thesis

posted on 2022-06-06, 21:12, authored by Mengtian Li

We have witnessed rapid advancement across major computer vision benchmarks over the past years. However, the hidden computational cost of the top solutions prevents them from being practically deployable. For example, training large models until convergence may be prohibitively expensive in practice, and autonomous driving or augmented reality may require a reaction time that rivals that of humans, typically 200 milliseconds for visual stimuli. Clearly, vision algorithms need to be adjusted or redesigned when facing resource constraints. This thesis argues that we should build resource constraints into the first principles of algorithm design. We support this thesis with principled evaluation frameworks and novel constraint-aware solutions for both training and inference in computer vision tasks.

For evaluation frameworks, we first introduce a formal setting for studying training under the non-asymptotic, resource-constrained regime, i.e., budgeted training. Next, we propose streaming accuracy to evaluate latency and accuracy coherently with a single metric for real-time online perception. More broadly, building upon this metric, we introduce a meta-benchmark that systematically converts any single-frame task into a streaming perception task.

For constraint-aware solutions, we propose a budget-aware learning rate schedule for budgeted training, and dynamic scheduling and asynchronous forecasting for streaming perception. We also propose task-specific solutions, including foveated image magnification and progressive knowledge distillation for 2D object detection, multi-range pyramids for 3D object detection, and future object detection with backcasting for end-to-end detection, tracking and forecasting.

We conclude the thesis with discussions on future work. We plan to extend streaming perception to include long-term forecasting, generalize our foveated image magnification to arbitrary spatial image understanding tasks, and explore multi-sensor fusion for long-range 3D detection.

Funding

CMU Argo AI Center for Autonomous Vehicle Research

History

Date

2022-05-12

Degree Type

Dissertation

Department

Robotics Institute

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Deva Ramanan

Keywords

Budgeted training, streaming perception, object detection, motion prediction

Licence

CC BY 4.0

