Carnegie Mellon University

Language Modeling with Limited Domain Data

journal contribution
posted on 2008-12-01, 00:00, authored by Alexander Rudnicky
Generic recognition systems contain language models that are representative of a broad corpus. In actual practice, however, recognition is usually performed on coherent text covering a single topic, which suggests that knowledge of the topic at hand can be used to advantage. A base model can be augmented with information from a small sample of domain-specific language data to significantly improve recognition performance. Good performance may be obtained by merging in only those n-grams that contain words that are out of vocabulary with respect to the base model.
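The selective merging step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes the base model is represented as raw bigram counts in a Python `Counter`, that the domain sample is already tokenized into sentences, and the function names `ngrams` and `merge_oov_ngrams` are hypothetical. A real recognizer would also need to re-estimate smoothed (for example, backoff) probabilities after the counts are merged.

```python
from collections import Counter
from typing import Iterable


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of length n from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def merge_oov_ngrams(
    base_counts: Counter,
    base_vocab: set[str],
    domain_sentences: Iterable[list[str]],
    n: int = 2,
) -> tuple[Counter, set[str]]:
    """Hypothetical sketch: add to a copy of base_counts only those domain
    n-grams that contain at least one word outside base_vocab.

    Returns the merged counts and the vocabulary extended with the new words.
    The inputs are not modified.
    """
    merged = Counter(base_counts)  # copy so the base model stays untouched
    vocab = set(base_vocab)
    for sentence in domain_sentences:
        for gram in ngrams(sentence, n):
            # Keep the n-gram only if some word is out of vocabulary
            # with respect to the *base* model.
            if any(word not in base_vocab for word in gram):
                merged[gram] += 1
                vocab.update(gram)
    return merged, vocab


if __name__ == "__main__":
    base = Counter({("the", "patient"): 5, ("patient", "was"): 3})
    domain = [["the", "patient", "was", "intubated"]]
    merged, vocab = merge_oov_ngrams(base, {"the", "patient", "was"}, domain)
    print(merged[("was", "intubated")], "intubated" in vocab)
```

In the usage example, only the bigram `("was", "intubated")` is merged, because it is the only domain bigram containing a word unknown to the base model; the in-vocabulary bigrams keep their original base counts.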

History

Date

2008-12-01
