Effective Anchoring of Multimodal Narrative Generation
Humans inherently learn from and interact with multiple views of information, whether across modalities or languages. Expectations of contemporary technology therefore reflect a growing need to model these multiview contexts better. Natural language generation plays a pivotal role in communicating these contexts in human-understandable language. This thesis brings together these two transformative technologies to make strides towards the long-standing dream of humanlike multiview narrative generation. The critical challenge is identifying the properties that make long-form text sound natural and modeling them in tandem with visual contexts. This thesis presents anchors for grounding three such properties: content (relevance), structure (coherence), and surface form realization (expression), and anchors them in relevant visual contexts. These anchors also provide human-interpretable handles for controlling these properties.
History
Date
2021-12-01
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)