Carnegie Mellon University

Exploiting Compositionality in Sequence Models

thesis
posted on 2023-01-06, 21:02 authored by Siddharth Dalmia

Compositionality is the principle behind building complex systems by composing together simpler sub-systems. First, a complex task is broken down into a pipeline of simpler, more straightforward tasks. Once systems are built for the simpler tasks, intricate details are abstracted away and components are pipelined together to reason about the overall task. The knowledge and resources spent building the systems for the sub-tasks are then reused towards building various complex systems. This divide-and-conquer approach to compositionality promotes the flexibility, efficiency, and overall practicality of these systems.

While traditional sequence systems such as cascade models leveraged compositionality, contemporary end-to-end models such as encoder-decoder models fail to satisfy even the basic compositionality requirement of having a clear understanding of the function of each component. This lack of clarity hinders the practical use of end-to-end systems, despite these approaches having advanced the state of the art in a wide range of sequence tasks.

In this thesis, with a focus on sequence tasks for speech and language, we identify four characteristics of compositionality that facilitate the practical deployment of end-to-end models, i.e., having component or sub-task level (1) Performance Monitoring, (2) Search and Retrieval, (3) Resource Pooling, and (4) Reusability. We present three models with the above characteristics, which exhibit different levels of reusability, from direct plug-and-play ability to the ability for further fine-tuning towards the end task. The first is the CTC Hybrid model, which creates a hybrid of models by decomposing a sequence task into alignment using a CTC model and language generation using a language model. For a fully differentiable alternative, we present LegoNN modular encoder-decoder models, which build reusable encoder and decoder modules across various sequence tasks, with the ability for further fine-tuning. Lastly, we present our Compositional E2E model with searchable hidden intermediates, which allows using the sub-task formulations to build an end-to-end task model. It also allows reusing pre-trained sub-task models for retrieving better intermediate representations in the fully differentiable model.

Finally, we discuss practical implications on the evaluation of end-to-end models, where we show how to make their evaluation more reliable and informative by testing their generalizability towards each sub-task in a complex sequence task. 

History

Date

2022-08-24

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Florian Metze, Alan W Black