Carnegie Mellon University

Enhancing Language Models with Structured Reasoning

thesis
posted on 2024-06-28, 15:40 authored by Aman Madaan

 The rapid growth in the areas of language generation and reasoning has been significantly facilitated by the availability of user-friendly libraries wrapped around large language models. These solutions often rely on the Seq2Seq paradigm, treating all problems as text-to-text transformations. While convenient, this approach faces limitations in practical deployments: brittleness when handling complex problems, the absence of feedback mechanisms, and an inherent black-box nature hindering model interpretability. 

This thesis presents techniques to address these limitations by integrating structured elements into the design and operation of language models. Structure, in this context, is defined as the organization and representation of data in systematic, hierarchical, or relational ways, along with incorporating structural constraints into the learning and reasoning processes. These elements are integrated at different model development and deployment stages: training, inference, and post-inference. During training, we present techniques for training a graph-assisted question-answering model, and discovering orders that help in effectively generating sets as sequences. In the inference stage, we present techniques for incorporating structure by leveraging code as an intermediate representation. For the post-inference stage, we introduce methods that integrate a memory to allow the model to leverage feedback without additional training.
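The inference-stage idea of using code as an intermediate representation can be illustrated with a toy sketch (this is an illustrative example, not the thesis implementation): rather than answering a word problem in free-form text, the model emits a short program, and executing that program yields the answer.

```python
# Illustrative sketch: a word problem expressed as executable code.
# In practice, a language model would generate a program like this one;
# the final answer comes from running it, not from free-text generation.

def solve_word_problem():
    # "Roger has 5 tennis balls. He buys 2 cans with 3 balls each.
    #  How many tennis balls does he have now?"
    initial_balls = 5
    bought_balls = 2 * 3  # 2 cans x 3 balls
    return initial_balls + bought_balls

answer = solve_word_problem()
print(answer)  # 11
```

Delegating the arithmetic to an interpreter makes the reasoning steps explicit and checkable, instead of being buried in opaque text generation.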

Together, these techniques demonstrate that conventional text-in-text-out solutions may fail to leverage beneficial structural properties apparent to model stakeholders. Incorporating structure into the model development process requires a careful look at the problem setup, but often a relatively straightforward implementation pays significant dividends: a little structure goes a long way.
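The post-inference memory mentioned above can likewise be sketched in a few lines (a hypothetical minimal version, not the thesis system): store feedback keyed by the prompt that triggered it, and prepend matching feedback to future prompts so the model can adapt without retraining.

```python
# Hedged sketch of a post-inference feedback memory. The class name and
# prompt-augmentation scheme here are illustrative assumptions.

class FeedbackMemory:
    def __init__(self):
        self._store = {}  # prompt -> feedback text

    def add(self, prompt, feedback):
        """Record user feedback for a prompt the model answered poorly."""
        self._store[prompt] = feedback

    def augment(self, prompt):
        """Prepend any stored feedback so the model can correct itself."""
        feedback = self._store.get(prompt)
        if feedback is None:
            return prompt
        return f"Feedback from a past attempt: {feedback}\n{prompt}"

memory = FeedbackMemory()
memory.add("What does 'prodigal' mean?",
           "I wanted a synonym, not a definition.")
print(memory.augment("What does 'prodigal' mean?"))
```

Because the memory sits outside the model, feedback takes effect immediately, with no gradient updates or retraining.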

We conclude by positing that the next generation of AI systems will treat LLMs as powerful kernels upon which flexible inference procedures can be built to enhance complex reasoning. This approach, driven by the concept of inference-time compute, has the potential to significantly improve the problem-solving capabilities of AI.  

History

Date

2024-06-01

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Yiming Yang
