Summarizing Text Documents: Sentence Selection and Evaluation Metrics

Goldstein, Jade; Kantrowitz, Mark; Mittal, Vibhu; Carbonell, Jaime G.

doi:10.1184/R1/6610025.v1

journal contribution

Posted on 1976-01-01. Authored by Jade Goldstein, Mark Kantrowitz, Vibhu Mittal, Jaime G. Carbonell.

Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often suggest which sentences to include in a summary. This paper presents our analysis of news-article summaries generated by sentence selection. Sentences are ranked for potential inclusion in the summary using a weighted combination of statistical and linguistic features. The statistical features were adapted from standard IR methods. The candidate linguistic features were derived from an analysis of newswire summaries. To evaluate these features we use a normalized version of precision-recall curves, with a baseline of random sentence selection, and we analyze the properties of such a baseline. We illustrate our discussion with empirical results showing the importance of corpus-dependent baseline summarization standards, compression ratios and carefully crafted long queries.
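The approach outlined in the abstract (rank sentences by a weighted feature combination, then compare precision and recall against a random-selection baseline) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two features (average term frequency and sentence position), the weights `w_tf` and `w_pos`, the function names, and the particular normalization formula in `normalized_precision`.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_sentences(sentences, w_tf=1.0, w_pos=0.5):
    """Score each sentence with a weighted sum of two hypothetical features.

    tf_feat:  mean document-level frequency of the sentence's tokens
              (a stand-in for a statistical, IR-style feature).
    pos_feat: higher for earlier sentences (a stand-in for a
              linguistic or structural cue in news articles).
    """
    doc_tf = Counter(t for s in sentences for t in tokenize(s))
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        toks = tokenize(s)
        tf_feat = sum(doc_tf[t] for t in toks) / len(toks) if toks else 0.0
        pos_feat = 1.0 - i / n
        scores.append(w_tf * tf_feat + w_pos * pos_feat)
    return scores


def select_top(scores, k):
    """Return the indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def precision_recall(selected, relevant):
    """Precision and recall of selected sentence indices against a reference set."""
    hits = len(set(selected) & set(relevant))
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def random_baseline_precision(n_sentences, n_relevant):
    """Expected precision when sentences are chosen uniformly at random."""
    return n_relevant / n_sentences


def normalized_precision(precision, baseline):
    """Assumed normalization: gain over the random baseline, scaled by headroom."""
    if baseline >= 1.0:
        return 0.0
    return (precision - baseline) / (1.0 - baseline)
```

A typical use would score a document's sentences, call `select_top` with `k` set by the desired compression ratio, and then report `normalized_precision` so that documents of different lengths, which have different random baselines, can be compared on a common scale.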

Date

1976-01-01

Keywords

computer sciences

Licence

In Copyright
