Carnegie Mellon University

Enhancing Language Models with Structured Reasoning

thesis
posted on 2024-06-28, 15:40 authored by Aman Madaan

 The rapid growth in the areas of language generation and reasoning has been significantly facilitated by the availability of user-friendly libraries wrapped around large language models. These solutions often rely on the Seq2Seq paradigm, treating all problems as text-to-text transformations. While convenient, this approach faces limitations in practical deployments: brittleness when handling complex problems, the absence of feedback mechanisms, and an inherent black-box nature hindering model interpretability. 

This thesis presents techniques to address these limitations by integrating structured elements into the design and operation of language models. Structure, in this context, is defined as the organization and representation of data in systematic, hierarchical, or relational ways, along with incorporating structural constraints into the learning and reasoning processes. These elements are integrated at different model development and deployment stages: training, inference, and post-inference. During training, we present techniques for training a graph-assisted question-answering model, and discovering orders that help in effectively generating sets as sequences. In the inference stage, we present techniques for incorporating structure by leveraging code as an intermediate representation. For the post-inference stage, we introduce methods that integrate a memory to allow the model to leverage feedback without additional training.
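The inference-stage idea of using code as an intermediate representation can be illustrated with a toy sketch (this is an illustrative example, not the thesis implementation): rather than answering a word problem in free-form text, the model emits a short program, and executing that program yields the answer.

```python
# Illustrative sketch: a word problem expressed as executable code.
# In practice, a language model would generate a program like this one;
# the final answer comes from running it, not from free-text generation.

def solve_word_problem():
    # "Roger has 5 tennis balls. He buys 2 cans with 3 balls each.
    #  How many tennis balls does he have now?"
    initial_balls = 5
    bought_balls = 2 * 3  # 2 cans x 3 balls
    return initial_balls + bought_balls

answer = solve_word_problem()
print(answer)  # 11
```

Delegating the arithmetic to an interpreter makes the reasoning steps explicit and checkable, instead of being buried in opaque text generation.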

Together, these techniques demonstrate that conventional text-in-text-out solutions may fail to leverage beneficial structural properties apparent to model stakeholders. Incorporating structure into the model development process requires a careful look at the problem setup, but often a relatively straightforward implementation pays significant dividends: a little structure goes a long way.
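The post-inference memory mentioned above can likewise be sketched in a few lines (a hypothetical minimal version, not the thesis system): store feedback keyed by the prompt that triggered it, and prepend matching feedback to future prompts so the model can adapt without retraining.

```python
# Hedged sketch of a post-inference feedback memory. The class name and
# prompt-augmentation scheme here are illustrative assumptions.

class FeedbackMemory:
    def __init__(self):
        self._store = {}  # prompt -> feedback text

    def add(self, prompt, feedback):
        """Record user feedback for a prompt the model answered poorly."""
        self._store[prompt] = feedback

    def augment(self, prompt):
        """Prepend any stored feedback so the model can correct itself."""
        feedback = self._store.get(prompt)
        if feedback is None:
            return prompt
        return f"Feedback from a past attempt: {feedback}\n{prompt}"

memory = FeedbackMemory()
memory.add("What does 'prodigal' mean?",
           "I wanted a synonym, not a definition.")
print(memory.augment("What does 'prodigal' mean?"))
```

Because the memory sits outside the model, feedback takes effect immediately, with no gradient updates or retraining.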

We conclude by positing that the next generation of AI systems will treat LLMs as powerful kernels upon which flexible inference procedures can be built to enhance complex reasoning. This approach, driven by the concept of inference-time compute, has the potential to significantly improve the problem-solving capabilities of AI.  

History

Date

2024-06-01

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Yiming Yang
