Carnegie Mellon University

Towards Face Recognition with Imbalanced Training Data: From Loss Function Design to Deep Generative Models

thesis
Posted on 2023-03-22, 20:55, authored by Yutong Zheng

Face recognition has emerged as one of the most prominent tasks for deep learning algorithms over the past decade. With carefully designed pipelines, face recognition systems have improved dramatically in accuracy and robustness. Unfortunately, they still fail under challenging real-world conditions, such as poor lighting, large pose variations, extreme facial expressions, or low resolution. We argue, however, that these conditions do not eliminate identity information, since humans can recognize a person under a wide variety of them. The poor performance of face recognition systems in such cases can therefore be improved by designs that target them directly. Challenging cases often arise from imbalances in the training data: the system sees very few examples of rare conditions and consequently fails to recognize them. In this dissertation, we demonstrate multiple directions for addressing data imbalance.

In the first part of this dissertation, we present an overview of our face recognition system and describe our efforts to deal with data imbalance. We describe its two major components: a robust face detector for data collection and a facial feature extractor for face matching. Both are designed to capture as much identity information as possible from an imbalanced dataset. Specifically, we introduce a multi-scale deep feature extraction module for robust face detection, in which deep facial features are extracted from multiple layers of the backbone neural network. This yields stable predictions under a variety of challenging conditions, such as extreme resolutions, blurry images, and occlusion. With this face detection system, we prevent data imbalances from entering the pre-processing pipeline. Next, we introduce Ring loss and apply it to our facial feature extractor. Ring loss is a smooth feature normalization method that improves face recognition accuracy on rare cases. Our observations suggest that faces with poor matching accuracy are not always difficult examples, but rather examples that are rare in training. We therefore propose a smooth L2 normalization of the deep facial features toward a common magnitude. This improves the robustness of the deep features and makes them more distinguishable by their directions. Our experimental results validate the effectiveness of the proposed methods on imbalanced datasets without any data augmentation.
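Ring loss is described above only in words. As a rough illustration, we assume the commonly stated form of the penalty (the abstract itself does not give it): L_R = λ/(2m) Σ_i (||F(x_i)||_2 − R)^2, where F(x_i) is the deep feature of sample i, R is a target norm learned jointly with the network, m is the batch size, and λ weights the term against a primary classification loss such as softmax. The NumPy sketch below computes this penalty and its gradients by hand and runs a toy gradient descent on random features. The function name `ring_loss`, the choice λ = 1, the learning rate, and updating the features directly (as a stand-in for updating network weights) are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np


def ring_loss(features, radius, lam=0.01):
    """Ring loss penalty and its hand-derived gradients (assumed formulation).

    features: (m, d) array, one deep feature vector per sample.
    radius:   scalar target norm R (learned jointly in practice).
    lam:      weight of the penalty relative to the primary loss.
    Returns (loss, grad_features, grad_radius).
    """
    m = features.shape[0]
    norms = np.linalg.norm(features, axis=1)      # ||F(x_i)||_2 for each sample
    diff = norms - radius                         # signed distance from the "ring"
    loss = lam / (2 * m) * np.sum(diff ** 2)
    # dL/dF(x_i) = (lam / m) * (||F(x_i)|| - R) * F(x_i) / ||F(x_i)||
    safe_norms = np.maximum(norms, 1e-12)         # avoid division by zero
    grad_features = (lam / m) * (diff / safe_norms)[:, None] * features
    # dL/dR = -(lam / m) * sum_i (||F(x_i)|| - R)
    grad_radius = -(lam / m) * np.sum(diff)
    return loss, grad_features, grad_radius


# Toy run: four random features with very different magnitudes.
rng = np.random.default_rng(0)
scales = np.array([[0.5], [1.0], [2.0], [4.0]])
feats = rng.normal(size=(4, 8)) * scales
feats0 = feats.copy()                             # keep originals to compare directions
R = 1.0
lr = 0.5
for _ in range(2000):
    loss, g_feats, g_R = ring_loss(feats, R, lam=1.0)
    feats -= lr * g_feats                         # stand-in for a network weight update
    R -= lr * g_R                                 # update the learned target norm

print(np.linalg.norm(feats, axis=1), R)           # norms converge to a common value R
```

Because the gradient with respect to each feature is parallel to that feature, every update rescales a feature vector without rotating it: the magnitudes are pulled toward a common value while the directions, which carry the discriminative information, are preserved.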

In the second part of the dissertation, we introduce a series of face synthesis algorithms that serve as data augmentation tools for reducing existing data imbalances. A key focus of our research is developing unsupervised deep generative models that maximize data augmentation while minimizing human intervention. We first examine the capability of traditional 2D Generative Adversarial Networks (GANs) to synthesize and manipulate realistic faces for identity-preserving data augmentation. Specifically, we use linear manipulation of the StyleGAN latent space to perform guided synthesis of 2D face images. While developing this method, we discovered an innate defect of traditional GANs: their inability to maintain consistency under 3D transformations, such as changes in facial pose. We therefore propose an unsupervised, symmetry-aware 3D face generator that performs smooth and realistic 3D pose manipulation of human faces while keeping the 3D geometry and generic identity information unchanged. The 3D generator is built by combining an efficient 3D-aware GAN backbone with the prevailing Neural Radiance Fields (NeRF) module. Compared with other existing 3D GAN architectures, our method shows a remarkable improvement in pose manipulation accuracy and identity preservation. In addition, we incorporate a simple but highly efficient background disentanglement module to decouple the face from the background during synthesis. Overall, our face synthesis methods yield promising results. We hope to inspire future researchers in the face recognition community to continue tackling the challenging problem of data imbalance with generative models.
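The abstract does not say how the editing directions for linear latent manipulation are obtained. One common approach, assumed here purely for illustration, is to find a unit direction n in StyleGAN's 512-dimensional W space that correlates with an attribute and move a latent code w along it, w' = w + αn, feeding each w' to the generator. The sketch estimates n as the normalized difference between the class means of labeled latent codes (the normal vector of a linear classifier is a common alternative) and shows how to project out a second direction so that editing one attribute leaves another unchanged to first order. The latent codes and labels are synthetic, no generator is included, and the helper names `attribute_direction`, `orthogonalize`, and `edit` are hypothetical.

```python
import numpy as np


def attribute_direction(latents, labels):
    """Unit direction from label-0 codes toward label-1 codes (difference of class means)."""
    pos = latents[labels == 1].mean(axis=0)
    neg = latents[labels == 0].mean(axis=0)
    n = pos - neg
    return n / np.linalg.norm(n)


def orthogonalize(n, other):
    """Remove the component of n along `other`, so moving along n leaves `other` unchanged."""
    other = other / np.linalg.norm(other)
    n = n - np.dot(n, other) * other
    return n / np.linalg.norm(n)


def edit(w, n, alphas):
    """Return a batch of edited codes w + alpha * n, one row per step size alpha."""
    return w[None, :] + np.asarray(alphas)[:, None] * n[None, :]


# Synthetic stand-in for labeled latent codes (real W codes are not standard normal).
rng = np.random.default_rng(0)
dim = 512                                      # StyleGAN's W space is 512-dimensional
true_dir = np.zeros(dim)
true_dir[0] = 1.0                              # hypothetical ground-truth attribute axis
latents = rng.normal(size=(5000, dim))
labels = (latents @ true_dir > 0).astype(int)  # synthetic binary attribute labels

n = attribute_direction(latents, labels)
w = rng.normal(size=dim)                       # code of the face to be edited
alphas = np.linspace(-3.0, 3.0, 7)
edited = edit(w, n, alphas)                    # each row would be fed to the generator

other = rng.normal(size=dim)                   # hypothetical second attribute direction
n_orth = orthogonalize(n, other)               # edit direction that ignores `other`
print(np.dot(n, true_dir), edited.shape)
```

Because the edit is linear, the projection of the edited code onto n grows by exactly α, which makes the edit strength easy to control. Whether the synthesized face keeps its identity depends on how well the chosen direction is disentangled from identity, which this toy example does not test; the 3D consistency problem noted above is likewise outside what a single linear direction in W can guarantee.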

History

Date

2022-12-15

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Marios Savvides
