Addressing Ambiguity In Object Instance Detection
In this thesis, we study the topic of ambiguity when detecting object instances in scenes with severe clutter and occlusions. Our work focuses on three key areas: (1) objects that have ambiguous features, (2) objects where discriminative point-based features cannot be reliably extracted, and (3) occlusions.
Current approaches for object instance detection rely heavily on matching discriminative point-based features such as SIFT. While one-to-one correspondences between an image and an object can often be established, they cannot be obtained when objects have ambiguous features due to similar or repeated patterns. We present the Discriminative Hierarchical Matching (DHM) method, which uses vector quantization to preserve feature ambiguity from the matching stage until hypothesis testing. We demonstrate that combining our quantization framework with Simulated Affine features can significantly improve the performance of 3D point-based recognition systems.
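To illustrate the general idea of keeping ambiguous matches through quantization, the following is a minimal sketch and not the DHM algorithm itself. It assumes a fixed, precomputed codebook and Euclidean distance; the functions `nearest_codeword`, `build_index`, and `ambiguous_matches` are hypothetical names introduced here. Each descriptor is assigned to its nearest codeword, and every image feature is paired with all model features that share its codeword. Repeated patterns therefore produce one-to-many candidates that are deferred to hypothesis testing, instead of being discarded the way a Lowe-style ratio test discards a feature whose two nearest neighbors are nearly equidistant.

```python
import math
from collections import defaultdict


def nearest_codeword(x, codebook):
    """Return the index of the codeword closest to descriptor x (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: math.dist(x, codebook[k]))


def build_index(model_descriptors, codebook):
    """Group model feature indices by the codeword they quantize to."""
    index = defaultdict(list)
    for j, d in enumerate(model_descriptors):
        index[nearest_codeword(d, codebook)].append(j)
    return index


def ambiguous_matches(image_descriptors, index, codebook):
    """Return every (image_feature, model_feature) pair sharing a codeword.

    No one-to-one constraint is imposed, so an image feature that resembles
    several model features (e.g. a repeated pattern) keeps all of them as
    candidates for a later hypothesis-testing stage (assumed, not shown).
    """
    matches = []
    for i, d in enumerate(image_descriptors):
        for j in index.get(nearest_codeword(d, codebook), []):
            matches.append((i, j))
    return matches


# Toy 2-D "descriptors": model features 0 and 1 form a repeated pattern.
codebook = [(0.0, 0.0), (10.0, 10.0)]
model = [(0.1, 0.0), (0.0, 0.2), (9.0, 10.0)]
image = [(0.05, 0.1), (10.0, 9.5)]
index = build_index(model, codebook)
print(ambiguous_matches(image, index, codebook))  # [(0, 0), (0, 1), (1, 2)]
```

In this toy example, image feature 0 is equidistant from model features 0 and 1, so a ratio test would reject it outright; the quantized index instead retains both pairings.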
While discriminative point-based features work well for many objects, they cannot be stably extracted on smooth objects that have large uniform regions. To represent these feature-poor objects, we first present Gradient Networks, a framework for robust shape matching without extracting edges. Our approach incorporates connectivity directly on low-level gradients and significantly outperforms approaches that use only local information or coarse gradient statistics. Next, we present the Boundary and Region Template (BaRT) framework, which combines an explicit boundary representation with the interior appearance of the object. We show that the lack of texture in the object interior is actually informative and that an explicit representation of the boundary performs better than a coarse one.
While many approaches work well when objects are entirely visible, their performance decreases rapidly under occlusion. We introduce two methods for increasing the robustness of object detection in these challenging scenarios. First, we present a framework for capturing the occlusion structure under arbitrary object viewpoints by modeling the Occlusion Conditional Likelihood: the likelihood that a point on the object is visible given the visibility of all other points. Second, we propose a method to predict the occluding region and score a probabilistic matching pattern by searching for a set of valid occluders. We demonstrate a significant increase in detection performance under severe occlusions.
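The Occlusion Conditional Likelihood described above can be written as follows, using notation introduced here for illustration only: an object with N points, and a binary variable v_j that is 1 when point j is visible and 0 when it is occluded.

```latex
\mathrm{OCL}_i = P\left(v_i = 1 \mid v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_N\right),
\qquad v_j \in \{0, 1\}, \quad i = 1, \ldots, N
```

Whereas an independence assumption would use only the marginal P(v_i = 1), conditioning on the visibility of the other points lets the model capture the fact that occluders tend to hide contiguous regions of an object rather than scattered individual points.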