Headline Generation using a Training Corpus

Jin, Rong; Hauptmann, Alexander

doi:10.1184/R1/6606059.v1

journal contribution

posted on 2015-07-01, 00:00 authored by Rong Jin, Alexander Hauptmann

This paper discusses fundamental issues in word selection for title generation. We review several methods for title generation, namely extractive summarization and two versions of a Naïve Bayesian approach, and compare their performance using an F1 metric. In addition, we introduce a novel approach to title generation based on the k-nearest neighbor (KNN) algorithm. Both the KNN method and a limited-vocabulary Naïve Bayesian method outperform the other evaluated methods, with F1 scores of around 20%. Since KNN produces complete and legible titles, we conclude that KNN is a very promising method for title generation, provided good content overlap exists between the training corpus and the test documents.
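To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the F1 metric is computed over word overlap between a generated title and a reference title, and that the KNN approach can be approximated by returning the title of the single most similar training document (k = 1) under bag-of-words cosine similarity. The function names `title_f1` and `knn_title`, the whitespace tokenizer, and the toy training pairs are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper may differ.
    return text.lower().split()


def title_f1(generated, reference):
    """Word-overlap F1 between a generated title and a reference title."""
    gen = Counter(tokenize(generated))
    ref = Counter(tokenize(reference))
    overlap = sum((gen & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_title(document, training_pairs):
    """Return the title of the most similar training document (k = 1).

    training_pairs is a list of (document_text, title) tuples.
    """
    doc_vec = Counter(tokenize(document))
    best_doc, best_title = max(
        training_pairs,
        key=lambda pair: cosine(doc_vec, Counter(tokenize(pair[0]))),
    )
    return best_title


# Toy training corpus (hypothetical data, for illustration only).
training = [
    ("stocks fell sharply on wall street today", "Stocks Fall on Wall Street"),
    ("the team won the championship game last night", "Team Wins Championship"),
]

predicted = knn_title("wall street stocks dropped today", training)
score = title_f1(predicted, "Stocks Drop on Wall Street")
print(predicted, round(score, 2))
```

Because the retrieved title was written by a person for a similar document, it is complete and legible by construction. This also illustrates why the approach depends on good content overlap between the training corpus and the test documents: with no similar training document, the nearest neighbor's title is unlikely to fit.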