Improving Trigram Language Modeling with the World Wide Web
We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical
language modeling. We submit an N-gram as a phrase query to web search engines. The search engines return
the number of web pages containing the phrase, from which the N-gram count is estimated. The N-gram
counts are then used to form web-based trigram probability estimates. We discuss the properties of such
estimates and methods for interpolating them with traditional corpus-based trigram estimates. We show that
the interpolated models significantly reduce speech recognition word error rate on a small test set.
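
As a rough illustration of the pipeline described above, the following Python sketch turns phrase counts into a web-based trigram estimate and mixes it with a corpus-based estimate. It is a minimal sketch under stated assumptions, not the method of the paper: the names `web_trigram_estimate`, `interpolate`, and `toy_count` are hypothetical, the counts are made-up toy values standing in for search engine page counts, and fixed-weight linear interpolation is only one possible way to combine the two estimates.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a search engine: maps a phrase to a page count.
PhraseCounter = Callable[[str], int]


def web_trigram_estimate(count: PhraseCounter, w1: str, w2: str, w3: str) -> Optional[float]:
    """Relative-frequency estimate of P(w3 | w1 w2) from phrase counts.

    Returns None when the history "w1 w2" has no hits, so the caller can
    fall back to the corpus-based estimate.
    """
    history = count(f"{w1} {w2}")
    if history <= 0:
        return None
    # Assumption: clamp to 1.0, since page counts need not be mutually consistent.
    return min(1.0, count(f"{w1} {w2} {w3}") / history)


def interpolate(p_corpus: float, p_web: Optional[float], lam: float = 0.5) -> float:
    """Linear interpolation with weight lam on the web estimate (an assumed scheme)."""
    if p_web is None:
        return p_corpus
    return lam * p_web + (1.0 - lam) * p_corpus


# Toy counts for illustration only; not real search results.
toy_counts = {"new york": 1000, "new york city": 250}


def toy_count(phrase: str) -> int:
    return toy_counts.get(phrase, 0)


p_web = web_trigram_estimate(toy_count, "new", "york", "city")
p_mixed = interpolate(p_corpus=0.15, p_web=p_web, lam=0.5)
print(p_web, p_mixed)
```

In practice the interpolation weight would be tuned on held-out data, and the counter would wrap a real search engine query rather than a dictionary.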
Date: 2000-11-01