Evaluation Metrics For Language Models

Chen, Stanley F; Beeferman, Douglas; Rosenfeld, Roni

doi:10.1184/R1/6605324.v1

journal contribution

posted on 2008-01-01, 00:00 authored by Stanley F Chen, Douglas Beeferman, Roni Rosenfeld

The most widely used evaluation metric for language models for speech recognition is the perplexity of test data. While perplexities can be calculated efficiently and without access to a speech recognizer, they often do not correlate well with speech recognition word-error rates. In this research, we attempt to find a measure that, like perplexity, is easily calculated but better predicts speech recognition performance. We investigate two approaches. First, we attempt to extend perplexity by using similar measures that utilize information about language models that perplexity ignores. Second, we attempt to imitate the word-error calculation without using a speech recognizer by artificially generating speech recognition lattices. To test our new metrics, we have built over thirty varied language models. We find that perplexity correlates with word-error rate remarkably well when only considering n-gram models trained on in-domain data. When considering other types of models, our novel metrics are superior to perplexity for predicting speech recognition performance. However, we conclude that none of these measures predicts word-error rate accurately enough to be an effective tool for language model evaluation in speech recognition.
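For readers unfamiliar with the two quantities the abstract compares, the following is a minimal sketch of their standard textbook definitions, not the authors' code or any of the novel metrics proposed in the paper. The function names `perplexity` and `word_error_rate` and all input values are hypothetical and chosen only for illustration. Perplexity is taken as the exponential of the average negative log-probability a model assigns to each test word, and word-error rate as the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the reference length.

```python
import math


def perplexity(word_probs):
    """Perplexity of test data, given the model probability of each test word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Average negative log-probability per word (cross-entropy in nats).
    avg_neg_log_prob = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log_prob)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word was dropped
                curr[j - 1] + 1,                  # extra hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative inputs only; these are not data from the paper.
print(perplexity([0.25] * 8))  # a uniform choice among 4 words gives about 4
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Perplexity needs only the model and the test text, which is why it is cheap to compute, whereas word-error rate needs actual recognizer output. That difference is the motivation the abstract gives for looking for an easily calculated measure that tracks word-error rate more closely.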

History

Date

2008-01-01

Keywords

computer sciences

Licence

In Copyright
