Carnegie Mellon University
twoertwe_phd_lti_2023.pdf (1.13 MB)

Towards Improving Transparency in Multimodal Affect Perception

Posted on 2024-02-21, 15:11, authored by Torsten Woertwein

 Affective computing has emerged as a core research topic in artificial intelligence (AI) with broad applications in healthcare. For example, affective technologies can be used as a decision-support tool that quantifies behaviors related to emotions and affective states, which helps clinicians in their assessment of mood disorders such as depression. For these AI applications to be used and trusted, we need to focus on improving their transparency. Transparency is the degree to which users have information about a model’s internal mechanics and the reliability of its output. We expand this definition to include information about the data used to train a model, as data influences what a model learns. These three components of transparency (data, reliability, and internal mechanics) are studied in this thesis in two main research thrusts geared towards improving transparency for machine learning practitioners. 

In our first thrust, on general transparency, we explore three challenges: two focus on data transparency and one on reliability transparency. The first challenge, referred to as population-level data transparency, is analyzing data patterns across people to understand which patterns a model is likely to learn. For this, we used statistical approaches to analyze patterns between how people speak and the symptom severity of psychosis. The second challenge, referred to as reliability transparency, is estimating how accurate a model's output is to enable better risk management, as the output might not always be correct. We created approaches to efficiently estimate the reliability of a primary model using a secondary model that learns when the primary model makes mistakes. The third challenge, referred to as personalized data transparency, is separating person-specific patterns from patterns shared across people and analyzing both. We efficiently integrated neural networks with mixed-effects models, a statistical modeling approach that can separate these two types of patterns.
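The reliability-transparency idea above can be sketched with a toy example; the primary model, data, and nearest-neighbour secondary model here are illustrative assumptions, not the thesis's actual method:

```python
# Hedged sketch: a secondary model estimates the reliability of a primary
# model's output by learning, on held-out data, where the primary model errs.

def primary_model(x):
    # Toy "primary" regressor: the true relation is y = 2x, but this model
    # is systematically off by +3 for inputs of 5 or more.
    return 2.0 * x if x < 5 else 2.0 * x + 3.0

def fit_secondary(xs, ys):
    """Record the primary model's absolute error at each held-out point."""
    return [(x, abs(primary_model(x) - y)) for x, y in zip(xs, ys)]

def estimate_error(secondary, x):
    """1-nearest-neighbour estimate of the primary model's error at x."""
    nearest = min(secondary, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Held-out data drawn from the true relation y = 2x.
xs = [0.0, 2.0, 4.0, 6.0, 8.0]
ys = [2.0 * x for x in xs]
secondary = fit_secondary(xs, ys)

print(estimate_error(secondary, 1.0))  # well-modelled region -> 0.0
print(estimate_error(secondary, 7.0))  # mis-modelled region -> 3.0
```

A practical secondary model would be a learned regressor over richer input features; the nearest-neighbour lookup just makes the principle visible: the secondary model tells a user *where* the primary model can be trusted.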

In our second research thrust, we focus on the third component of transparency: internal mechanics. More specifically, we focus on the mechanics of multimodal models, as affect is expressed through multiple modalities, such as visually smiling and audibly laughing. The first challenge, referred to as modality importance transparency, quantifies how much a model focuses on each modality to derive its output, which serves as a proxy for how important each modality is. We created a model that not only quantifies modality importance but also reflects how informative humans perceive each modality to be. The second challenge, referred to as multimodal interaction transparency, quantifies interactions among three modalities, including both bimodal and trimodal interactions. Our approach separated unimodal, bimodal, and trimodal interactions by prioritizing simpler interactions over more complicated ones, e.g., unimodal over bimodal. The third challenge, referred to as modality contribution transparency, factorizes a modality's unique contributions (what only that modality can explain) from redundant contributions (what multiple modalities can provide). Our approach used correlational measures to define these contributions, and the learned factorization correlated with human judgments.
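The prioritization of simpler interactions can be illustrated with a toy residual-fitting scheme; the orthogonal two-modality data and staged least-squares fits below are illustrative assumptions, not the thesis's exact algorithm:

```python
# Hedged sketch: separate unimodal and bimodal interactions by fitting the
# simpler (unimodal) terms first, so each later stage only models the
# residual that earlier stages leave unexplained.

def fit_coef(feature, residual):
    """Least-squares coefficient of a single feature against the residual."""
    return sum(f * r for f, r in zip(feature, residual)) / sum(f * f for f in feature)

a = [-1.0, 1.0, -1.0, 1.0]            # modality A (toy feature)
b = [-1.0, -1.0, 1.0, 1.0]            # modality B (toy feature)
ab = [x * y for x, y in zip(a, b)]    # bimodal interaction A*B
target = [2 * x + 3 * y + 5 * z for x, y, z in zip(a, b, ab)]

residual = target[:]
for name, feature in [("unimodal A", a), ("unimodal B", b), ("bimodal A*B", ab)]:
    coef = fit_coef(feature, residual)
    residual = [r - coef * f for f, r in zip(feature, residual)]
    print(name, coef)
# -> unimodal A 2.0
#    unimodal B 3.0
#    bimodal A*B 5.0
```

Because the bimodal stage sees only what the unimodal stages could not explain, any pattern expressible by a single modality is attributed to that modality, and the interaction term captures strictly multimodal structure (here the residual drops to zero after the final stage).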

Degree Type

  • Dissertation


Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • Louis-Philippe Morency