Towards More Factual Large Language Models: Parametric and Non-parametric Approaches
Large language models (LLMs) play an increasingly important role in helping people access information, from simple factoid questions such as “where is the world’s largest ice sheet located” to complex requests that demand real-time information and reasoning, such as “plan a vacation in Miami”. There are two paradigms for handling questions that require factual knowledge: parametric approaches, which store knowledge within the parameters of LLMs and elicit it through prompting, and non-parametric approaches, which offload knowledge retrieval to an external non-parametric datastore. In this dissertation, we aim to study, compare, and enhance the capacity of both paradigms.
Since LLMs accumulate a large amount of knowledge in their parameters through pre-training on diverse corpora, they can generate answers directly when prompted with questions. In the first part of the dissertation, we focus on parametric approaches that exploit the factual knowledge stored in the parameters of LLMs. We first study methods to extract more knowledge by ensembling multiple predictions derived from diverse prompts. We then calibrate LLMs so that they respond trustworthily to questions beyond the scope of their knowledge. We find that even after LLMs memorize documents perfectly, to the point of reproducing them verbatim, they often still fail to answer questions about those documents. To enhance the capacity of LLMs to absorb knowledge from documents, we propose pre-instruction-tuning, which teaches them the task of question answering before they are pre-trained on documents.
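To make the prompt-ensembling idea concrete, the Python sketch below elicits the same fact with several prompt templates and aggregates the answers by majority vote. The `query_llm` stub and the templates are illustrative assumptions, not the dissertation's actual prompts or models.

```python
from collections import Counter

def query_llm(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g., an API request); returns a
    # canned answer here so the sketch runs end to end.
    return "Antarctica"

def ensemble_answer(question: str, templates: list[str]) -> str:
    """Elicit the same fact with diverse prompts and majority-vote the answers."""
    answers = [query_llm(t.format(q=question)) for t in templates]
    # The most frequent (normalized) answer is the ensemble prediction.
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][0]

templates = [
    "Answer the question: {q}",
    "Q: {q}\nA:",
    "{q}\nThe answer is:",
]
print(ensemble_answer("Where is the world's largest ice sheet located?", templates))
```

Majority voting is only one aggregation choice; predictions could also be weighted, for example, by each prompt's accuracy on held-out data.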
Parametric approaches offer a simple interface, but they suffer from hallucinations and lack access to real-time external information. In the second part of the dissertation, we focus on non-parametric approaches that augment LLMs with a non-parametric datastore, typically constructed from a document corpus and a retriever. The standard retrieval-augmented generation (RAG) pipeline consists of an embedding-based retriever and an LLM-based generator, which typically require separate training procedures and are often limited by the retriever’s performance. We introduce an end-to-end solution that fuses retrieval and generation within a single transformer, using the attention mechanism directly for retrieval. To address complex questions that demand detailed responses, we introduce Active RAG, which dynamically and proactively retrieves information throughout the generation process. Finally, we conclude by comparing and reconciling the two paradigms and offering insights into future directions.
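The sketch below illustrates, under stated assumptions, a retrieve-and-generate loop in the spirit of active retrieval: retrieval is re-triggered with a fresh query whenever the generator's confidence in a sentence drops below a threshold. The hash-based `embed` function, the `generate` stub, and the threshold `tau` are placeholders, not the dissertation's actual components; a real system would use a trained dense encoder and an LLM that exposes token probabilities.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy hash-seeded embedding; deterministic within a run for a given text.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Embedding-based retrieval: rank passages by cosine similarity."""
    q = embed(query)
    scores = [float(q @ embed(p)) for p in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def generate(prompt: str) -> tuple[str, float]:
    # Placeholder generator: returns the next sentence and its minimum
    # token probability as a crude confidence signal.
    return "Visit South Beach in the morning.", 0.42

def active_rag(question: str, corpus: list[str], steps: int = 3, tau: float = 0.6) -> str:
    """Interleave retrieval and generation, re-retrieving when confidence dips."""
    answer, query = "", question
    for _ in range(steps):
        context = "\n".join(retrieve(query, corpus))
        sentence, confidence = generate(f"{context}\n{question}\n{answer}")
        if confidence < tau:
            # Low confidence: use the tentative sentence as a fresh search
            # query, then regenerate with the newly retrieved evidence.
            context = "\n".join(retrieve(sentence, corpus))
            sentence, _ = generate(f"{context}\n{question}\n{answer}")
        answer += " " + sentence
        query = sentence
    return answer.strip()
```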
History
Date
- 2024-06-24
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)