Language Modeling with Limited Domain Data

Rudnicky, Alexander

doi:10.1184/R1/6606746.v1


journal contribution

posted on 2008-12-01, 00:00 authored by Alexander Rudnicky

Generic recognition systems contain language models which are
representative of a broad corpus. In practice, however, recognition
is usually performed on coherent text covering a single topic, suggesting
that knowledge of the topic at hand can be used to advantage. A base
model can be augmented with information from a small sample of
domain-specific language data to significantly improve recognition
performance. Good performance may be obtained by merging in
only those n-grams that include words that are out of vocabulary
with respect to the base model.
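
The selection step described above can be illustrated with a minimal sketch. It only covers picking out the domain n-grams that contain at least one word missing from the base vocabulary; the abstract does not say how the selected n-grams are weighted or merged into the base model, so that part is left out. The function names, the whitespace tokenization, and the toy vocabulary are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_oov_ngrams(domain_sentences, base_vocab, max_order=3):
    """Count domain n-grams that contain at least one out-of-vocabulary word.

    base_vocab is the set of words known to the base model (hypothetical input).
    Tokenization is plain lowercased whitespace splitting, an assumption.
    """
    selected = Counter()
    for sentence in domain_sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_order + 1):
            for gram in extract_ngrams(tokens, n):
                # Keep the n-gram only if some word is unknown to the base model.
                if any(word not in base_vocab for word in gram):
                    selected[gram] += 1
    return selected


# Toy data, invented for this example.
base_vocab = {"the", "patient", "was", "given", "a", "dose", "of"}
domain_sentences = ["the patient was given a dose of warfarin"]

oov_ngrams = select_oov_ngrams(domain_sentences, base_vocab, max_order=2)
for gram, count in sorted(oov_ngrams.items()):
    print(" ".join(gram), count)
```

In this toy case only the n-grams involving "warfarin" are kept, while n-grams made up entirely of in-vocabulary words are left to the base model.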

History

Date

2008-12-01


Keywords

Computer sciences; Information and Computing Sciences not elsewhere classified

Licence

In Copyright
