Carnegie Mellon University

Learning Semantic Patterns for Question Generation

thesis
posted on 2023-12-21, 20:28 authored by Hugo Patinho Rodrigues

 Question Generation (QG) is the Natural Language Processing (NLP) task dedicated to the automatic generation of questions from raw text. It is useful in many different scenarios, from educational settings, where automatically generated questions can relieve professors and instructors of much of the burden of creating questions to assess their students, to populating the knowledge base of a conversational agent. In this thesis we present GEN, a pattern-based system that automatically performs the QG task. Given an information source and a set of seeds composed of question/answer/sentence triplets, GEN outputs a set of questions (and answers, when possible). Unlike other similar systems, GEN is not built upon hand-crafted templates and, instead of relying on patterns restricted to the lexical and syntactic levels, it exploits semantic information in a flexible pattern-matching process that operates at different linguistic levels. In addition, instead of using a limited number of rules or models fixed at design time, GEN learns from questions corrected by the user, so that it performs better in future iterations.
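
 To make the seed-based setup concrete, the following is a minimal, hypothetical sketch (in Python) of the kind of input/output interface described above: a question/answer/sentence seed is abstracted into a pattern and applied to an unseen sentence. The alignment step, the pattern representation, and all names here are illustrative assumptions for exposition only; GEN's actual patterns also span syntactic and semantic levels and are refined through user feedback.

    import re

    # Toy lexical pattern learned from a single seed triplet (illustrative only;
    # GEN's real patterns also cover syntactic and semantic information).
    seed_question = "Who wrote Hamlet?"
    seed_answer = "William Shakespeare"
    seed_sentence = "Hamlet was written by William Shakespeare."

    # Abstract the seed: the answer becomes an ANSWER slot, and the entity shared
    # by the sentence and the question becomes a TOPIC slot. In a real system this
    # alignment would be computed automatically.
    topic = "Hamlet"
    sentence_pattern = re.compile(
        re.escape(seed_sentence)
        .replace(re.escape(topic), r"(?P<topic>.+)")
        .replace(re.escape(seed_answer), r"(?P<answer>.+)")
    )
    question_template = seed_question.replace(topic, "{topic}")

    # Apply the learned pattern to an unseen sentence from the information source.
    new_sentence = "War and Peace was written by Leo Tolstoy."
    match = sentence_pattern.match(new_sentence)
    if match:
        print(question_template.format(topic=match.group("topic")))  # Who wrote War and Peace?
        print(match.group("answer"))                                 # Leo Tolstoy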

 This work also contributes to the automatic evaluation of QG. Many authors rely on automatic metrics instead of manual evaluation, as their computation is practically free. However, the corpora typically used as references are very incomplete, containing only a few reference questions per source sentence. With that in mind, we contribute the Monserrate corpus, which contains, on average, 26 times more questions per reference sentence than any other available dataset. We also study the implications of such a large reference set and conclude that Monserrate is “exhaustive” enough for QG evaluation. We benchmark GEN against current state-of-the-art QG systems and show that, given only 8 seeds as input, our approach generates quality questions and surpasses a neural-network approach. We evaluate the systems both with automatic metrics and with human annotators. Finally, we employ GEN in two different scenarios. First, we show that it can be used as an authoring tool to help professors create questions for their courses, presenting the best questions at the top of a ranked list. In this experiment, GEN learns new patterns and ranks the generated questions based on simulated teacher feedback. To the best of our knowledge, GEN is the only QG system that can be easily adapted to this scenario, and it benefits from not requiring a linguistics expert as its user. Second, we apply GEN in a Question Answering (QA) setting, using it to create questions aimed at improving the performance of an external system.
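
 As a side note on why reference density matters for automatic evaluation, the sketch below (an assumption for illustration, not the metric used in the thesis) scores a generated question against its best-matching reference using a simple token-overlap F1, a stand-in for metrics such as BLEU or ROUGE. With a single reference, a perfectly valid paraphrase scores poorly; with a richer reference set, of the kind Monserrate provides, the same question is credited.

    from collections import Counter

    def token_f1(hypothesis: str, reference: str) -> float:
        """Unigram-overlap F1 between two questions (a simple stand-in for BLEU/ROUGE)."""
        hyp, ref = hypothesis.lower().split(), reference.lower().split()
        overlap = sum((Counter(hyp) & Counter(ref)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(hyp), overlap / len(ref)
        return 2 * precision * recall / (precision + recall)

    def best_reference_score(hypothesis: str, references: list) -> float:
        # Score against the closest reference, as multi-reference metrics usually do.
        return max(token_f1(hypothesis, r) for r in references)

    sparse_refs = ["Who wrote Hamlet?"]
    dense_refs = sparse_refs + ["Which playwright is the author of Hamlet?",
                                "By whom was Hamlet written?"]

    generated = "Who is the author of Hamlet?"
    print(best_reference_score(generated, sparse_refs))  # ~0.44: valid question, poor match
    print(best_reference_score(generated, dense_refs))   # ~0.77: credited by a richer reference set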

History

Date

2021-03-03

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Luísa Coheur, Eric Nyberg
