Carnegie Mellon University

Improving Trigram Language Modeling with the World Wide Web

report
posted on 2023-03-01, 20:28 authored by Ronald Rosenfeld, Xiaohin Zhu

 

We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical language modeling. We submit an N-gram as a phrase query to web search engines. The search engines return the number of web pages containing the phrase, from which the N-gram count is estimated. The N-gram counts are then used to form web-based trigram probability estimates. We discuss the properties of such estimates, and methods to interpolate them with traditional corpus-based trigram estimates. We show that the interpolated models significantly reduce speech recognition word error rate on a small test set.
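The pipeline summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the `PAGE_COUNTS` table stands in for page counts a search engine might return, page counts are used directly as N-gram counts, the estimate is a plain ratio of trigram count to bigram count, and the combination with the corpus estimate is simple linear interpolation with a hypothetical weight `lam`. The report itself may use different estimators and interpolation schemes.

```python
from typing import Dict, Optional, Tuple

# Hypothetical page counts standing in for search engine results.
# These numbers are invented for illustration only.
PAGE_COUNTS: Dict[Tuple[str, ...], int] = {
    ("the", "stock", "market"): 800,
    ("the", "stock"): 1000,
}


def web_trigram_prob(
    w1: str, w2: str, w3: str, counts: Dict[Tuple[str, ...], int]
) -> Optional[float]:
    """Ratio estimate of P(w3 | w1, w2) from page counts.

    Returns None when the history (w1, w2) has no hits, so the caller
    can fall back to the corpus-based estimate.
    """
    history = counts.get((w1, w2), 0)
    if history == 0:
        return None
    # Page counts are noisy, so the ratio is clamped to at most 1.0
    # (an assumption of this sketch).
    return min(1.0, counts.get((w1, w2, w3), 0) / history)


def interpolate(p_corpus: float, p_web: Optional[float], lam: float = 0.5) -> float:
    """Linear interpolation of web and corpus estimates.

    lam is a hypothetical mixing weight; with no web estimate available,
    the corpus estimate is returned unchanged.
    """
    if p_web is None:
        return p_corpus
    return lam * p_web + (1.0 - lam) * p_corpus


if __name__ == "__main__":
    p_web = web_trigram_prob("the", "stock", "market", PAGE_COUNTS)
    print(interpolate(p_corpus=0.6, p_web=p_web, lam=0.5))
```

In practice the mixing weight would be tuned on held-out data, and the fallback to the corpus estimate covers histories for which the web returns no hits.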

History

Date

2000-11-01
