Carnegie Mellon University
Communication Beyond Words: Grounding Visual Body Motion with Language

thesis
posted on 2023-01-06, 21:43, authored by Chaitanya Ahuja

Communication is essential for sharing knowledge and ideas. It encourages collaboration and teamwork, an important step towards inducing positive change in human societies. It is also a key building block for forging new relationships through self-expression as well as understanding others’ emotions and thoughts. Communication is often categorized into verbal and nonverbal messages, where nonverbal includes both vocal (e.g. prosody) and visual modalities (e.g. hand gestures, facial expressions). These three modalities have a rich and complex relationship with each other during communication. Evolving technologies for online communication, such as virtual reality, have created a need for generating high-fidelity nonverbal communication along with verbal and vocal cues (e.g. communication in a virtual space). One key communicative cue is visual body motion, which can express a wide range of messages through arms, hands, gait, physical skills (such as jumping and running) and interaction with the environment. Body motions also include gestures that accompany spoken language. These co-speech gestures allow speakers to articulate intent and express emphasis.

The central theme of this thesis is to understand the two-way relationship (a.k.a. grounding) between human body motion and its associated spoken language, which includes both verbal and vocal cues. Understanding this complex relationship will help us both better understand the meaning intended by body gestures and gain the knowledge necessary to generate more realistic nonverbal body animations with interactive technologies. With these motivations in mind, we propose three key challenges: (1) Nonverbal Grounding, the core component of this thesis, to study the close relationship between spoken language and motion; (2) Personalization, to better understand idiosyncrasies and commonalities in how people gesture; and (3) Low Resource Learning, when gestures occur infrequently or the amount of labeled data is limited and often unbalanced. These challenges investigate, respectively, the commonalities, uniqueness, and generalizability of visual body communication in the presence of verbal and vocal information.

This thesis makes significant contributions to all three technical challenges, starting with a contribution studying nonverbal grounding for a slightly easier problem: descriptive language, where the language is a direct description of the body motion. For the next contribution, we transition to the more challenging problem that is central to this thesis: co-speech gestures, where body motion naturally occurs during spoken language (a combination of both vocal and verbal cues). Spoken language often has a long-tailed distribution, which can hinder the modeling of uncommon words and gestures. Hence, we study the grounding of co-speech gestures in spoken language in the presence of a long-tailed distribution in the data. In the next contribution, we study the different styles of gesturing across many individuals, in tandem with an adaptation of our ideas on grounding co-speech gestures for this context. Our next contribution revolves around the idea that humans can quickly understand the gestures and body motion of a new person after only a few minutes of interaction, motivating the study of personalization of grounding and gesture style in a low-resource setting. For our final contribution, we extend the ideas of low-resource personalization to the more practical setting of continual learning. Here, the personalization still needs only a few minutes of training data, but in the process of learning new speakers, the model does not forget the old ones.

History

Date

2022-05-04

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Louis-Philippe Morency