Carnegie Mellon University
Browse

Automatic Factual Question Generation from Text

Download (1.73 MB)
thesis
posted on 2025-04-18, 20:12 authored by Michael Heilman

Texts with potential educational value are becoming available through the Internet (e.g., Wikipedia, news services). However, using these new texts in classrooms introduces many challenges, one of which is that they usually lack practice exercises and assessments. Here, we address part of this challenge by automating the creation of a specific type of assessment item.

Specifically, we focus on automatically generating factual WH questions. Our goal is to create an automated system that can take as input a text and produce as output questions for assessing a reader’s knowledge of the information in the text. The questions could then be presented to a teacher, who could select and revise the ones that he or she judges to be useful.

After introducing the problem, we describe some of the computational and linguistic challenges presented by factual question generation. We then present an implemented system that leverages existing natural language processing techniques to address some of these challenges. The system uses a combination of manually encoded transformation rules and a statistical question ranker trained on a tailored dataset of labeled system output.

We present experiments that evaluate individual components of the system as well as the system as a whole. We found, among other things, that the question ranker roughly doubled the acceptability rate of top-ranked questions.

In a user study, we tested whether K-12 teachers could efficiently create factual questions by selecting and revising suggestions from the system. Offering automatic suggestions reduced the time and effort spent by participants, though it also affected the types of questions that were created.

This research supports the idea that natural language processing can help teachers efficiently create instructional content. It provides solutions to some of the major challenges in question generation and an analysis and better understanding of those that remain.

Funding

RI-Small: Probabilistic Models for Structure Discovery in Text

Directorate for Computer & Information Science & Engineering

Find out more...

History

Date

2011-01-01

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Noah A. Smith

Usage metrics

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC