Posted on 2015-12-01, authored by Keshav Thirumalai Seshadri
The automatic localization of facial landmarks, also referred to as facial landmarking or facial alignment, is a key pre-processing step for tasks such as facial recognition, the generation of 3D facial models, expression analysis, gender and ethnicity classification, age estimation, segmentation of facial features, accurate head pose estimation, and a variety of other facial analytic tasks. Progress in all these areas of research has heightened the need for accurate facial alignment algorithms that generalize well to simultaneous variations in pose, illumination, and expression, and to high levels of facial occlusion in real-world images. This thesis proposes a facial alignment algorithm that is not only tolerant to the joint presence of facial occlusions, pose variation, and varying expressions, but also provides feedback (misalignment/occlusion labels for the detected landmarks) that could be of use to subsequent stages in a facial analysis pipeline. Our approach proceeds from sparse to dense landmarking steps using a set of pose- and expression-specific models trained to best account for the variations in facial shape and texture manifested in real-world images. We also propose a novel shape regularization approach that sets up this task as an ℓ1-regularized least squares problem. This avoids the generation of implausible facial shapes and results in higher landmark localization accuracies than those obtained using prior shape models. Our approach is thoroughly evaluated on many challenging real-world datasets and demonstrates higher landmark localization accuracies and more graceful degradation than several state-of-the-art methods. We then place the task of facial alignment in a broader context by examining its role in two applications that require alignment results as input: (1) a large-scale facial recognition scenario and (2) a project aimed at improving driver safety by assessing facial cues. Finally, we carry out a rigorous set of experiments to analyze the performance of our approach when dealing with low-resolution images and provide some insights gained from this study.
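
To make the ℓ1-regularized least squares formulation of shape regularization concrete, the sketch below fits a sparse set of shape-basis coefficients to noisy detected landmarks. This is a minimal illustration only: the PCA-style shape basis, the variable names, the regularization weight, and the use of scikit-learn's Lasso solver are assumptions made for the example, not the implementation described in the thesis.

```python
# Minimal sketch of l1-regularized least squares shape regularization.
# Assumed setup (not from the thesis): landmarks are modeled as
#   shape ~= mean_shape + basis @ b, with an l1 penalty on b.
import numpy as np
from sklearn.linear_model import Lasso

def regularize_shape(detected, mean_shape, basis, alpha=0.01):
    """Project noisy detected landmarks onto a sparse combination of
    shape basis vectors, discouraging implausible facial shapes.

    detected   : (2L,) stacked x/y coordinates from the landmark detector
    mean_shape : (2L,) mean facial shape
    basis      : (2L, K) shape basis (e.g. PCA modes of training shapes)
    alpha      : strength of the l1 penalty on the basis coefficients
    """
    # Solve  min_b ||detected - (mean_shape + basis @ b)||^2 + alpha * ||b||_1
    residual = detected - mean_shape
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(basis, residual)
    coeffs = lasso.coef_  # sparse coefficient vector b
    regularized = mean_shape + basis @ coeffs
    return regularized, coeffs
```

The intuition matches the abstract's claim: the ℓ1 penalty keeps only the shape modes that the detected landmarks actually support, so wildly implausible shapes (which would require large or many coefficients) are suppressed, while well-supported deformations pass through.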