On-Line Algorithms for Combining Language Models

Kalai, Adam; Chen, Stanley F; Blum, Avrim; Rosenfeld, Roni

doi:10.1184/R1/6608042.v1

journal contribution

posted on 1997-08-01, 00:00 authored by Adam Kalai, Stanley F Chen, Avrim Blum, Roni Rosenfeld

Multiple language models are combined for many tasks in language modeling, such as domain and topic adaptation. In this work, we compare on-line algorithms from machine learning to existing algorithms for combining language models. On-line algorithms developed for this problem have parameters that are updated dynamically to adapt to a data set during evaluation. On-line analysis provides guarantees that these algorithms will perform nearly as well as the best model chosen in hindsight from a large class of models, e.g., the set of all static mixtures. We describe several on-line algorithms and present results comparing these techniques with existing language modeling combination methods on the task of domain adaptation. We demonstrate that in some situations, on-line techniques can significantly outperform static mixtures (by over 10% in terms of perplexity), and are especially effective when the nature of the test data is unknown or changes over time.
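
The abstract does not say which on-line algorithms the paper evaluates, so the following is only an illustrative sketch of the general idea of mixture weights updated dynamically during evaluation. It assumes a simple multiplicative (Bayesian posterior) weight update, and the function name `online_mixture_perplexity` and its input layout are hypothetical, not taken from the paper. Each component model is assumed to assign a strictly positive probability to every observed token.

```python
import math


def online_mixture_perplexity(model_probs, prior=None):
    """Evaluate an on-line mixture of language models (illustrative sketch).

    model_probs[t][i] is the probability that model i assigned to the token
    actually observed at step t (assumed > 0, e.g. from smoothed models).
    Returns (perplexity of the on-line mixture, final mixture weights).
    """
    n_models = len(model_probs[0])
    # Start from a uniform prior over models unless one is supplied.
    weights = list(prior) if prior is not None else [1.0 / n_models] * n_models

    total_log_prob = 0.0
    for probs in model_probs:
        # Predict with the current weights, before the update for this token.
        mix = sum(w * p for w, p in zip(weights, probs))
        total_log_prob += math.log(mix)
        # Multiplicative update: models that predicted the token well gain
        # weight. Dividing by `mix` renormalizes the weights to sum to 1.
        weights = [w * p / mix for w, p in zip(weights, probs)]

    perplexity = math.exp(-total_log_prob / len(model_probs))
    return perplexity, weights
```

With this particular update, the probability the mixture assigns to the whole sequence equals the prior-weighted average of the individual models' sequence probabilities, which is why its log-loss stays within log(number of models) of the best single model in hindsight. Competing with the best static mixture, as the abstract describes, requires a different update than the one assumed here.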

History

Publisher Statement

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Date

1997-08-01

Keywords

computer sciences

Licence

In Copyright
