Direct Pose Estimation and Refinement
We study a fundamental question in pose estimation from vision-only video data: should the pose of a camera be determined from fixed and known correspondences, or should correspondences be estimated simultaneously with the pose? Determining pose from fixed correspondences is known as the feature-based approach, where well-established tools from projective geometry are used to formulate and solve a wide range of pose estimation problems. However, in degraded imaging conditions such as low light and blur, reliably detecting and precisely localizing interest points becomes challenging. Conversely, estimating correspondences alongside motion is known as the direct approach, where image data are used directly to determine geometric quantities without relying on sparse interest points as an intermediate representation. This approach is generally more precise by virtue of redundancy, as many measurements are used to estimate only a few degrees of freedom. However, direct methods are more sensitive to changes in illumination. In this work, we combine the strengths of feature-based approaches with the precision of direct methods. Namely, we use densely and sparsely evaluated local feature descriptors in a direct image alignment framework to address pose estimation in challenging conditions. Applications include tracking planar targets under sudden and drastic changes in illumination, as well as visual odometry in poorly lit subterranean mines. Motivated by the success of the proposed approach, we introduce a novel formulation for the joint refinement of pose and structure across multiple views, akin to feature-based bundle adjustment (BA). In contrast to BA, which minimizes the reprojection error, initial estimates are refined such that the photometric consistency of their image projections is maximized, without the need for correspondences.
This fundamentally different technique is evaluated on a range of datasets and is shown to improve upon the accuracy of state-of-the-art vision-based simultaneous localization and mapping (VSLAM).