Adaptive Statistical Language Modeling: A Maximum Entropy Approach

Rosenfeld, Roni

doi:10.1184/R1/6603032.v1

journal contribution

posted on 2004-01-01, 00:00 authored by Roni Rosenfeld

Language modeling is the attempt to characterize, capture and exploit regularities in natural language. In statistical language modeling, large amounts of text are used to automatically determine the model’s parameters. Language modeling is useful in automatic speech recognition, machine translation, and any other application that processes natural language with incomplete knowledge.

In this thesis, I view language as an information source which emits a stream of symbols from a finite alphabet (the vocabulary). The goal of language modeling is then to identify and exploit sources of information in the language stream, so as to minimize its perceived entropy.
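As a rough illustration of "perceived entropy", the sketch below measures how many bits per symbol a model needs on a token stream, and the corresponding perplexity (2 raised to that cross-entropy). The add-one smoothed unigram model and the function names are assumptions made for this example only; they are not the models developed in the thesis.

```python
import math
from collections import Counter

def unigram_model(train_tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed, finite vocabulary.

    Illustrative stand-in for a language model; hypothetical helper.
    """
    counts = Counter(train_tokens)
    total = len(train_tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def cross_entropy_bits(model, tokens):
    """Average number of bits per token the model assigns to the stream.

    Assumes every token is in the model's vocabulary.
    """
    return -sum(math.log2(model[w]) for w in tokens) / len(tokens)

def perplexity(model, tokens):
    """Perplexity is 2 raised to the cross-entropy in bits."""
    return 2.0 ** cross_entropy_bits(model, tokens)
```

A model that spreads its probability uniformly over four words has a perplexity of 4 on any stream of those words; a model that better anticipates the stream scores lower.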

Most existing statistical language models exploit the immediate past only. To extract information from further back in the document’s history, I use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse.
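One way to make the idea of a trigger pair concrete is to score how much the earlier occurrence of a word a in a document tells us about whether the current word is b. The sketch below uses the mutual information between those two binary events as the score. The abstract does not say how pairs are scored or selected, so this measure, the function name, and the toy data layout (each document is a list of tokens) are assumptions.

```python
import math

def trigger_mutual_information(documents, a, b):
    """Mutual information (bits) between two binary events at each position:
    X = 'a occurred earlier in this document', Y = 'the current word is b'.

    Hypothetical scoring function for a candidate trigger pair (a -> b).
    """
    counts = {(x, y): 0 for x in (False, True) for y in (False, True)}
    for doc in documents:
        seen_a = False
        for w in doc:
            counts[(seen_a, w == b)] += 1
            if w == a:
                seen_a = True  # a is now part of the history for later words
    n = sum(counts.values())
    if n == 0:
        return 0.0
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # zero-probability cells contribute nothing
        p_xy = c / n
        p_x = (counts[(x, True)] + counts[(x, False)]) / n
        p_y = (counts[(True, y)] + counts[(False, y)]) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

docs = [["stock", "x", "shares"], ["y", "x", "z"]]
score = trigger_mutual_information(docs, "stock", "shares")
```

A pair whose trigger word never appears scores zero, since the history then carries no information about the next word.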

Next, statistical evidence from many sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, I apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. The ME framework is extremely general: any phenomenon that can be described in terms of statistics of the text can be readily incorporated.
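The constraint-and-iterate idea can be sketched on a toy problem. The abstract does not name the iterative algorithm, so the example below assumes Generalized Iterative Scaling (GIS), one standard algorithm of this kind. It fits an unconditional distribution over four outcomes, whereas a language model would condition on the history; the function name, the features, and the target values are all hypothetical.

```python
import math

def fit_maxent_gis(outcomes, features, targets, iterations=500):
    """Fit p(x) proportional to exp(sum_i lambda_i * f_i(x)) so that the
    expected value of each feature f_i under p matches targets[i].

    Toy GIS sketch; assumes non-negative features and consistent targets.
    """
    raw = [[float(f(x)) for f in features] for x in outcomes]
    # GIS requires the features to sum to the same constant C on every
    # outcome, so a slack feature pads each row up to C.
    C = max(sum(row) for row in raw)
    rows = [row + [C - sum(row)] for row in raw]
    goals = list(targets) + [C - sum(targets)]
    lambdas = [0.0] * len(goals)

    def distribution():
        weights = [math.exp(sum(l * v for l, v in zip(lambdas, row)))
                   for row in rows]
        z = sum(weights)
        return [w / z for w in weights]

    for _ in range(iterations):
        probs = distribution()
        for i, goal in enumerate(goals):
            expected = sum(p * row[i] for p, row in zip(probs, rows))
            if goal > 0 and expected > 0:
                # Move each weight toward agreement with its constraint.
                lambdas[i] += math.log(goal / expected) / C
    return dict(zip(outcomes, distribution()))

# Two overlapping "information sources", each constraining one marginal.
outcomes = ["a", "b", "c", "d"]
features = [lambda x: x in ("a", "b"), lambda x: x in ("a", "c")]
solution = fit_maxent_gis(outcomes, features, targets=[0.6, 0.5])
```

With only these two marginal constraints, the highest-entropy distribution treats the two features as independent, so the fitted values approach 0.3, 0.3, 0.2 and 0.2 for a, b, c and d.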

An adaptive language model based on the ME approach was trained on the Wall Street Journal corpus, and showed 32%–39% perplexity reduction over the baseline. When interfaced to SPHINX-II, Carnegie Mellon’s speech recognizer, it reduced the recognizer’s error rate by 10%–14%.

The significance of this thesis lies in improving language modeling, reducing speech recognition error rate, and in being the first large-scale test of the approach. It illustrates the feasibility of incorporating many diverse knowledge sources in a single, unified statistical framework.

History

Date

2004-01-01

Keywords

language modeling, adaptive language modeling, statistical language modeling, maximum entropy, speech recognition

Licence

In Copyright
