Carnegie Mellon University
achald_phd_robotics.pdf (28.89 MB)

Open-world Object Detection and Tracking

posted on 2021-09-23, 19:56 authored by Achal Dave
Computer vision today excels at recognizing narrow slices of the real world: our models seem to accurately detect objects like cats, cars, or chairs in benchmark datasets. However, deploying models requires that they work in the open world, which includes arbitrary objects in diverse settings. Current methods fall short on both axes: they recognize only a few classes, and they struggle in settings that differ from the training distribution. A model that addresses these challenges can serve as a fundamental building block for downstream applications, including recognizing actions, manipulating objects, and navigating around obstacles.

This thesis presents our work in building robust models for detecting and tracking any object, especially ones with few or even no training examples. We start by exploring how traditional models, which recognize only a small set of object classes, generalize to the real world. We show that current methods are extremely sensitive: even subtle changes in the input image or test distribution can lead to drops in accuracy. Our systematic evaluations show that models, even ones trained for robustness to adversarial or synthetic corruptions, often correctly classify one frame of a video but fail on a perceptually similar nearby frame. A similar phenomenon applies even to small distribution shifts arising from natural variation between datasets. Finally, we present an approach for addressing an extreme form of generalization to object appearance: detecting fully occluded objects.

Next, we explore generalization to large or infinite vocabularies, which contain rare and never-before-seen classes. Since current datasets are largely limited to a small, closed-world set of objects, we first present a large vocabulary benchmark for measuring progress in detection and tracking. We show that current evaluations do not suffice for large vocabulary benchmarks, and present alternative metrics that appropriately evaluate progress in this setting. Finally, we present approaches that leverage advances in closed-world recognition to build accurate, generic detectors and trackers for any object.




Degree Type

  • Dissertation


Department

  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • Deva Ramanan
