Carnegie Mellon University

Towards Efficient Natural Language Generation

Thesis posted on 2023-12-21, 20:45, authored by Junxian He

Natural language generation (NLG) has seen remarkable success, benefiting from the development of deep learning techniques. As large-scale pretraining becomes the de facto standard in NLP, enormous training data and model parameters consistently lead to state-of-the-art performance on standard NLG tasks. Despite this success, current NLG approaches are inefficient in several respects, which prevents their use in broader, practical settings: (1) they are label-inefficient: conditional neural generation (e.g., machine translation) often requires a large number of annotated samples to train, which limits its application in low-resource regimes; (2) they are parameter-inefficient: it is common practice to fine-tune a pretrained model to adapt it to a downstream task, yet these models can scale to trillions of parameters (Fedus et al., 2021), causing a large memory footprint when serving many tasks; and (3) lastly, we focus on the compute-inefficiency of a trending model class, retrieval-augmented NLG models, which retrieve from an external datastore to assist generation; the added datastore and retrieval process incur non-trivial space and time costs due to the extra computation.
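
To illustrate where the retrieval overhead comes from, the sketch below shows a generic kNN-style retrieval-augmented decoding step: a datastore of context vectors is searched and the resulting neighbor distribution is interpolated with the base model's prediction. This is a toy example with hypothetical sizes and names, not the thesis's specific models.

```python
# Toy illustration of retrieval-augmented generation in the kNN-LM style.
# Every datastore entry is keyed by a context vector and valued by the token
# that followed it; at inference time the base model's distribution is
# interpolated with a distribution induced by the nearest neighbors.
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, STORE = 100, 16, 10_000        # hypothetical sizes
keys = rng.standard_normal((STORE, DIM))   # datastore keys: context vectors
values = rng.integers(0, VOCAB, STORE)     # datastore values: next-token ids

def knn_augmented_probs(query, model_probs, k=8, lam=0.25, temp=1.0):
    """Interpolate the base model's distribution with a kNN distribution."""
    dists = np.linalg.norm(keys - query, axis=1)       # search the datastore
    nn = np.argsort(dists)[:k]                         # k nearest neighbors
    weights = np.exp(-dists[nn] / temp)
    weights /= weights.sum()
    knn_probs = np.zeros(VOCAB)
    np.add.at(knn_probs, values[nn], weights)          # aggregate by token id
    return lam * knn_probs + (1.0 - lam) * model_probs

# The search cost and memory scale with the datastore size, which is the
# space/time overhead described above.
query = rng.standard_normal(DIM)
base = np.full(VOCAB, 1.0 / VOCAB)                     # placeholder LM output
probs = knn_augmented_probs(query, base)
assert np.isclose(probs.sum(), 1.0)
```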

In this thesis, we aim to provide a deeper understanding of research problems in efficient NLG and to utilize the resulting insights to design better approaches. Specifically, (1) for label-efficiency, we study unsupervised and semi-supervised conditional generation that takes advantage of abundant unlabeled text data and thus reduces the need for numerous annotated samples; the proposed methods are validated on a wide variety of NLG tasks. (2) For parameter-efficiency, we propose a unified framework that connects parameter-efficient transfer learning methods, where only a few parameters need to be updated to adapt a large pretrained model to downstream tasks; the framework provides a new understanding of this direction and instantiates state-of-the-art approaches for parameter-efficient NLG. (3) For compute-efficiency in retrieval-augmented NLG, we either design new models or post-adapt the retrieval component to compress the datastore, reduce the retrieval compute, and speed up inference.
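
As a rough illustration of parameter-efficient adaptation (one common instance of the family such a unified framework covers, not the framework itself), the sketch below freezes a pretrained linear layer and trains only a small low-rank update; all names and dimensions are illustrative assumptions.

```python
# Minimal sketch of parameter-efficient adaptation: a frozen pretrained linear
# layer augmented with a trainable low-rank update, so only a small fraction of
# parameters is updated for the downstream task.
import torch
import torch.nn as nn

class LowRankAdaptedLinear(nn.Module):
    def __init__(self, pretrained: nn.Linear, rank: int = 8):
        super().__init__()
        self.pretrained = pretrained
        self.pretrained.weight.requires_grad_(False)    # freeze pretrained weights
        if self.pretrained.bias is not None:
            self.pretrained.bias.requires_grad_(False)
        d_out, d_in = pretrained.weight.shape
        self.down = nn.Linear(d_in, rank, bias=False)   # trainable low-rank factors
        self.up = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.up.weight)                  # start as a zero update

    def forward(self, x):
        return self.pretrained(x) + self.up(self.down(x))

base = nn.Linear(1024, 1024)                 # stand-in for a pretrained layer
adapted = LowRankAdaptedLinear(base, rank=8)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(f"trainable params: {trainable} / {total}")  # only a small fraction is updated
```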

History

Date

2022-08-18

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Graham Neubig, Taylor Berg-Kirkpatrick
