Carnegie Mellon University

Effective Anchoring of Multimodal Narrative Generation

thesis
posted on 2024-10-23, 20:05 authored by Khyathi Chandu

Humans inherently learn from and interact with multiple views of information, be it various modalities or languages. The expectations placed on contemporary, 21st-century technology testify to the increasing need to model these multiview contexts better. Natural language generation plays a pivotal role in communicating these contexts in human-understandable languages. This thesis brings together both of these transformative technologies to make strides towards a long-standing dream of humanlike multiview narrative generation. The critical challenge is identifying the properties that make long-form texts sound natural and modeling them in tandem with visual contexts. This thesis presents anchors for grounding three such properties: content (relevance), structure (coherence), and surface form realization (expression), and anchors them with relevant visual contexts. These anchors also provide human-interpretable handles for controlling these properties.



History

Date

2021-12-01

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Alan Black
