Automatic Analysis of Facial Actions: Learning from Transductive, Supervised and Unsupervised Frameworks
In order to distinguish essays and pre-prints from academic theses, we have a separate category. These are often much longer text based documents than a paper.
Automatic analysis of facial actions (AFA) can reveal a person’s emotion, intention, and physical state, and make possible a wide range of applications. To enable reliable, valid, and efficient AFA, this thesis investigates automatic analysis of facial actions through transductive, supervised and unsupervised learning. Supervised learning for AFA is challenging, in part, because of individual differences among persons in face shape and appearance and variation in video acquisition and context. To improve generalizability across persons, we propose a transductive framework, Selective Transfer Machine (STM), which personalizes generic classifiers through joint sample reweighting and classifier learning. By personalizing classifiers, STM offers improved generalization to unknown persons. As an extension, we develop a variant of STM for use when partially labeled data are available. Additional challenges for supervised learning include learning an optimal representation for classification, variation in base rates of action units (AUs), correlation between AUs and temporal consistency. While these challenges could be partly accommodated with an SVM or STM, a more powerful alternative is afforded by an end-to-end supervised framework (i.e., deep learning). We propose a convolutional network with long short-term memory (LSTM) and multi-label sampling strategies. We compared SVM, STM and deep learning approaches with respect to AU occurrence and intensity in and between BP4D+  and GFT  databases, which consist of around 0.6 million annotated frames. Annotated video is not always possible or desirable. We introduce an unsupervised Branch-and-Bound framework to discover correlated facial actions in un-annotated video. We term this approach Common Event Discovery (CED). We evaluate CED in video and motion capture data. CED achieved moderate convergence with supervised approaches and enabled discovery of novel patterns occult to supervised approaches.