Leveraging Textual Semantics for Knowledge Graph Acquisition and Application
Knowledge Graphs (KGs), which represent world knowledge through entities and relations, are ubiquitous in real-world applications. Besides their structural nature, KGs offer rich textual information, as entities usually correspond to real-world objects with specific names and descriptions. Despite the importance of such information, it has been largely overlooked or inadequately explored in existing studies.
This thesis aims to integrate textual information into the modeling of KGs by utilizing Pre-trained Language Models (PLMs), which have demonstrated effectiveness in capturing the semantics of natural language. We pursue this goal in two complementary parts: the acquisition of KGs, to improve their quality, and the application of KGs, to address user queries.
In Part I, we focus on KG acquisition through text. We begin with a pre-training framework that jointly learns vector representations of KGs and text. It features dual KG and text modules that mutually enhance each other, achieving strong results on relation extraction and entity classification (Chapter 2). To address scalability challenges in large KGs, we propose a retrieval-enhanced text-generation model for KG completion. It leverages semantically relevant triplets from KGs to guide the generation of missing entities, demonstrating state-of-the-art performance while maintaining low memory usage (Chapter 3).
In Part II, we turn our attention to applying KGs to the crucial task of Question Answering (QA). For the setting where answers are sourced from KGs, we propose a framework that jointly generates logical queries and text answers to produce more accurate and robust results (Chapter 4). We then extend to scenarios where the answers mainly stem from text corpora instead of KGs. Our proposed method uses KGs to construct links among the text passages. This structural information is then used to re-rank and prune related passages for each question, significantly reducing computational costs (Chapter 5). Finally, we tackle the setting of incomplete KGs. We introduce the first benchmark dataset to assess the impact of KG completion methods on question answering. Our experiments highlight the need to study the acquisition and application of KGs jointly (Chapter 6).
History
Date
2023-11-14
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)