Posted on 1998-03-01, 00:00. Authored by Rong Jin, Alexander Hauptmann.
In this paper, we show how we can learn to
select good words for a document title. We
view the problem of selecting good title words
for a document as a variant of an Information
Retrieval problem. Each title word is treated as
a “document” and selection of appropriate title
words as finding relevant “documents”. Based
on our training collection consisting of 40,000
document and title pairs, we learn the
“document” representations for all the title
words and apply these learned representations
to select appropriate title words for 10,000
test documents. Compared with other learning
approaches, namely a K-nearest-neighbor
approach, a Naïve Bayesian approach, and a
variant of a machine translation model, our
approach performs significantly better as
measured by the F1 metric.
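
The retrieval view described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the title-word "documents" are weighted or scored, so this example assumes raw term-frequency profiles and cosine similarity. All function names and the toy training pairs are hypothetical. Each title word gets a profile pooled from the training documents whose titles contain it, a test document acts as the query, and the selected words are scored against a reference title with set-based F1.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase whitespace tokenization.
    return text.lower().split()


def learn_title_word_profiles(pairs):
    """Build one term-frequency 'document' per title word by pooling the
    words of every training document whose title contains that word."""
    profiles = defaultdict(Counter)
    for doc, title in pairs:
        doc_counts = Counter(tokenize(doc))
        for word in set(tokenize(title)):
            profiles[word].update(doc_counts)
    return profiles


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_title_words(doc, profiles, k=3):
    """Treat the document as a query and return the k title words whose
    profiles are most similar to it (ties broken alphabetically)."""
    query = Counter(tokenize(doc))
    ranked = sorted(profiles, key=lambda w: (-cosine(query, profiles[w]), w))
    return ranked[:k]


def f1(predicted, reference):
    """Set-based F1 between predicted and reference title words."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy (document, title) pairs, invented purely for illustration.
train = [
    ("stocks fell as markets reacted to rate fears", "markets fall"),
    ("the team won the final match in overtime", "team wins final"),
    ("markets rallied after the rate cut", "markets rally"),
]
profiles = learn_title_word_profiles(train)
picked = select_title_words("markets and rate news", profiles, k=2)
score = f1(picked, tokenize("markets fall"))
```

In this toy run the query shares "markets" and "rate" with the pooled profile for the title word "markets", so that word ranks first. A realistic system would likely need term weighting and stop-word handling, which the sketch leaves out.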