posted on 1997-03-01, 00:00 authored by Lucian Vlad Lita, Jaime G. Carbonell
Anticipating the availability of large question-answer
datasets, we propose a principled, data-driven
Instance-Based approach to Question Answering.
Most question answering systems incorporate
three major steps: classify questions according
to answer types, formulate queries for document
retrieval, and extract actual answers. Under our approach,
strategies for answering new questions are
directly learned from training data. We learn models
of answer type, query content, and answer extraction
from clusters of similar questions. We view
the answer type as a distribution, rather than a class
in an ontology. In addition to query expansion, we
learn general content features from training data and
use them to enhance the queries. Finally, we treat
answer extraction as a binary classification problem
in which text snippets are labeled as correct or incorrect
answers. We present a basic implementation
of these concepts that achieves good performance
on TREC test data.
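
As an illustration of viewing the answer type as a distribution rather than a single class, the following is a minimal sketch and not the implementation described in the paper. It assumes a toy set of labeled training questions, Jaccard token overlap as the similarity measure, and a neighborhood of the k most similar questions standing in for a cluster. The names `answer_type_distribution`, `TRAINING`, and the type labels are hypothetical.

```python
from collections import defaultdict


def tokens(text):
    """Lowercase a question and split it into a set of word tokens."""
    return set(text.lower().replace("?", "").split())


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer_type_distribution(question, training, k=3):
    """Estimate a distribution over answer types for a new question.

    Assumption: the k most similar training questions act as the
    "cluster", and each one votes for its label with a weight equal
    to its similarity to the new question.
    """
    q = tokens(question)
    scored = sorted(
        ((jaccard(q, tokens(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    weights = defaultdict(float)
    for similarity, label in scored:
        if similarity > 0:
            weights[label] += similarity

    total = sum(weights.values())
    if total == 0:
        return {}  # no overlapping training question: no evidence
    return {label: weight / total for label, weight in weights.items()}


# Hypothetical toy training data: (question, answer type label).
TRAINING = [
    ("When was Mozart born?", "date"),
    ("When did the war end?", "date"),
    ("Who wrote Hamlet?", "person"),
    ("Where was Mozart born?", "location"),
]

dist = answer_type_distribution("When was Einstein born?", TRAINING)
for label, prob in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{label}: {prob:.2f}")
```

In this toy setting the new question receives most of its probability mass from "date" but retains some for "location", which a hard assignment to one ontology class would discard.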