Carnegie Mellon University

Learning Semantic Patterns for Question Generation

thesis
posted on 2023-12-21, 20:28 authored by Hugo Patinho Rodrigues

 Question Generation (QG) is the Natural Language Processing (NLP) task dedicated to the automatic generation of questions from raw text. It is useful in many different scenarios, from educational settings, where automatically generated questions can relieve professors and instructors of much of the burden of creating questions to assess their students, to populating the knowledge base of a conversational agent. In this thesis we present GEN, a pattern-based system that automatically performs the QG task. Given an information source and a set of seeds composed of question/answer/sentence triplets, GEN outputs a set of questions (and answers, when possible). Unlike other similar systems, GEN is not built upon hand-crafted templates and, instead of relying on patterns restricted to the lexical and syntactic levels, it exploits semantic information in a flexible pattern-matching process that operates at different linguistic levels. In addition, instead of using a limited number of rules or models fixed at design time, GEN learns from questions corrected by the user, so that it performs better in future iterations.
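
 To make the seed-based setup concrete, the following is a minimal, hypothetical sketch (in Python) of the kind of input/output interface described above: a question/answer/sentence seed is abstracted into a pattern and applied to an unseen sentence. The alignment step, the pattern representation, and all names here are illustrative assumptions for exposition only; GEN's actual patterns also span syntactic and semantic levels and are refined through user feedback.

    import re

    # Toy lexical pattern learned from a single seed triplet (illustrative only;
    # GEN's real patterns also cover syntactic and semantic information).
    seed_question = "Who wrote Hamlet?"
    seed_answer = "William Shakespeare"
    seed_sentence = "Hamlet was written by William Shakespeare."

    # Abstract the seed: the answer becomes an ANSWER slot, and the entity shared
    # by the sentence and the question becomes a TOPIC slot. In a real system this
    # alignment would be computed automatically.
    topic = "Hamlet"
    sentence_pattern = re.compile(
        re.escape(seed_sentence)
        .replace(re.escape(topic), r"(?P<topic>.+)")
        .replace(re.escape(seed_answer), r"(?P<answer>.+)")
    )
    question_template = seed_question.replace(topic, "{topic}")

    # Apply the learned pattern to an unseen sentence from the information source.
    new_sentence = "War and Peace was written by Leo Tolstoy."
    match = sentence_pattern.match(new_sentence)
    if match:
        print(question_template.format(topic=match.group("topic")))  # Who wrote War and Peace?
        print(match.group("answer"))                                 # Leo Tolstoy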

 This work also contributes to the automatic evaluation of QG. Many authors rely on automatic metrics instead of manual evaluation, as their computation is practically free. However, the corpora typically used as references are very incomplete, containing only a few reference questions per source sentence. With that in mind, we contribute the Monserrate corpus, which contains, on average, 26 times more questions per reference sentence than any other available dataset. We also study the implications of such a large reference set and conclude that Monserrate is “exhaustive” enough for QG evaluation. We benchmark GEN against current state-of-the-art QG systems and show that, given only 8 seeds as input, our approach generates quality questions and surpasses a neural-network approach. We evaluate the systems both with automatic metrics and with human annotators. Finally, we employ GEN in two different scenarios. First, we show that it can be used as an authoring tool to help professors create questions for their courses, presenting the best questions at the top of a ranked list. In this experiment, GEN learns new patterns and ranks the generated questions based on simulated teacher feedback. To the best of our knowledge, GEN is the only QG system that can be easily adapted to this scenario, and it benefits from not requiring a linguistics expert as its user. Second, we apply GEN in a Question Answering (QA) setting, using it to create questions aimed at improving the performance of an external system.
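
 As a side note on why reference density matters for automatic evaluation, the sketch below (an assumption for illustration, not the metric used in the thesis) scores a generated question against its best-matching reference using a simple token-overlap F1, a stand-in for metrics such as BLEU or ROUGE. With a single reference, a perfectly valid paraphrase scores poorly; with a richer reference set, of the kind Monserrate provides, the same question is credited.

    from collections import Counter

    def token_f1(hypothesis: str, reference: str) -> float:
        """Unigram-overlap F1 between two questions (a simple stand-in for BLEU/ROUGE)."""
        hyp, ref = hypothesis.lower().split(), reference.lower().split()
        overlap = sum((Counter(hyp) & Counter(ref)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(hyp), overlap / len(ref)
        return 2 * precision * recall / (precision + recall)

    def best_reference_score(hypothesis: str, references: list) -> float:
        # Score against the closest reference, as multi-reference metrics usually do.
        return max(token_f1(hypothesis, r) for r in references)

    sparse_refs = ["Who wrote Hamlet?"]
    dense_refs = sparse_refs + ["Which playwright is the author of Hamlet?",
                                "By whom was Hamlet written?"]

    generated = "Who is the author of Hamlet?"
    print(best_reference_score(generated, sparse_refs))  # ~0.44: valid question, poor match
    print(best_reference_score(generated, dense_refs))   # ~0.77: credited by a richer reference set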

History

Date

2021-03-03

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Luísa Coheur, Eric Nyberg
