Carnegie Mellon University

Towards More Factual Large Language Models: Parametric and Non-parametric Approaches

thesis
posted on 2024-07-23, 19:03, authored by Zhengbao Jiang

Large language models (LLMs) are increasingly important in helping people access information, ranging from simple factoid questions such as “where is the world’s largest ice sheet located” to complex questions that require real-time information and reasoning, such as “plan a vacation in Miami”. There are two paradigms for handling questions that require factual knowledge: parametric approaches, which store knowledge within the parameters of LLMs and elicit it through prompting, and non-parametric approaches, which offload knowledge retrieval to an external non-parametric datastore. In this dissertation, we aim to study, compare, and enhance the capacity of both paradigms.

Since LLMs accumulate a large amount of knowledge in their parameters through pre-training on diverse corpora, they can directly generate answers when prompted with questions. In the first part of the dissertation, we focus on parametric approaches that utilize the factual knowledge contained in the parameters of LLMs. We first study methods to extract more knowledge by ensembling multiple predictions derived from diverse prompts. Then, we calibrate LLMs to make them trustworthy when answering questions that fall beyond their scope of knowledge. We find that even after LLMs memorize documents perfectly, to the extent of reproducing them verbatim, they still often fail to answer questions about those documents. To enhance the capacity of LLMs to absorb knowledge from documents, we propose pre-instruction-tuning, which teaches them the task of question answering before pre-training them on documents.
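As a rough illustration of the prompt-ensembling idea mentioned above, the sketch below queries an LM once per paraphrased prompt and takes a majority vote over the answers. The `generate` callable and the templates are placeholders, not the dissertation's actual implementation (which additionally weights prompts, e.g., by held-out accuracy); this is a minimal sketch under those assumptions.

```python
from collections import Counter
from typing import Callable, Sequence

def ensemble_answers(
    question: str,
    prompt_templates: Sequence[str],
    generate: Callable[[str], str],
) -> str:
    """Query the LM once per prompt paraphrase and take a majority vote.

    `generate` stands in for whatever LM call is available; a weighted vote
    (e.g., by each prompt's held-out accuracy) would be closer to the
    dissertation's method, but an unweighted vote keeps the sketch simple.
    """
    answers = [
        generate(template.format(question=question)).strip().lower()
        for template in prompt_templates
    ]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical paraphrased templates for a factoid question.
templates = [
    "Answer the question: {question}",
    "Q: {question}\nA:",
    "{question} The answer is:",
]
# answer = ensemble_answers(
#     "Where is the world's largest ice sheet located?", templates, my_lm_generate
# )
```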

Parametric approaches offer a simple interface, but they suffer from hallucinations and lack access to real-time external information. In the second part of the dissertation, we focus on non-parametric approaches that augment LLMs with a non-parametric datastore, typically constructed from a document corpus and a retriever. The standard retrieval-augmented generation (RAG) pipeline consists of an embedding-based retriever and an LLM-based generator, which typically require separate training procedures and are often limited by the retriever’s performance. We introduce an end-to-end solution that fuses retrieval and generation within a single transformer and directly uses the attention mechanism for retrieval. To address complex questions demanding detailed responses, we introduce Active RAG, which dynamically and proactively retrieves information throughout the generation process. Finally, we conclude by comparing and reconciling both paradigms and providing insight into future directions.
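To make the active-retrieval idea concrete, the sketch below interleaves retrieval with generation: each sentence is first drafted without fresh evidence, and if the model's confidence in the draft is low, the draft is used as a search query to retrieve passages before regenerating the sentence. The `retrieve` and `draft_sentence` callables, the threshold, and the sentence cap are illustrative assumptions, not the dissertation's exact procedure.

```python
from typing import Callable, Optional, Tuple

def active_rag_generate(
    question: str,
    retrieve: Callable[[str], str],
    draft_sentence: Callable[[str, str, str], Tuple[Optional[str], float]],
    confidence_threshold: float = 0.6,
    max_sentences: int = 8,
) -> str:
    """Proactively retrieve during generation whenever the model is uncertain.

    `draft_sentence(question, evidence, answer_so_far)` is a placeholder LM call
    returning (next_sentence, confidence), or (None, 0.0) when the answer is
    complete; `retrieve(query)` is a placeholder retriever returning passages.
    """
    answer, evidence = "", ""
    for _ in range(max_sentences):
        sentence, confidence = draft_sentence(question, evidence, answer)
        if sentence is None:
            break
        if confidence < confidence_threshold:
            # Use the uncertain draft itself as the search query, then redraft.
            evidence = retrieve(sentence)
            sentence, _ = draft_sentence(question, evidence, answer)
            if sentence is None:
                break
        answer += sentence
    return answer
```

Drafting first and retrieving only on low confidence keeps retrieval targeted at the parts of a long answer the model is unsure about, rather than retrieving once up front from the question alone.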

History

Date

2024-06-24

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Graham Neubig
