Carnegie Mellon University

Optimizing Lexical and Ngram Coverage via Judicious Use of Linguistic Data

journal contribution
posted on 1989-01-01, authored by Roni Rosenfeld

I study the effect of various types and amounts of North American Business (NAB) language data on the quality of the derived vocabulary, and use my findings to derive an improved ranking of the words, using only 19% of the NAB corpus. I then study the conflicting effects of increased vocabulary size on a speech recognizer's accuracy, and use the result to pick an optimal vocabulary size. A similar analysis of ngram coverage yields a very different outcome, with the best system being the one based on the most data.
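To make the two quantities in the abstract concrete, here is a minimal sketch of how lexical coverage and ngram coverage might be measured on held-out text. It is an illustration only, not the procedure used in the paper: the frequency-based ranking, the function names (`rank_vocabulary`, `lexical_coverage`, `ngram_coverage`), and the toy sentences are all assumptions introduced here.

```python
from collections import Counter

def rank_vocabulary(training_tokens):
    """Rank word types by training frequency, most frequent first.

    Assumption: plain frequency ranking; the paper derives an improved
    ranking that is not reproduced here.
    """
    counts = Counter(training_tokens)
    # Break ties alphabetically so the ranking is deterministic.
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def lexical_coverage(ranked_words, heldout_tokens, vocab_size):
    """Fraction of held-out tokens found in the top `vocab_size` words."""
    if not heldout_tokens:
        raise ValueError("held-out text is empty")
    vocab = set(ranked_words[:vocab_size])
    covered = sum(1 for tok in heldout_tokens if tok in vocab)
    return covered / len(heldout_tokens)

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return list(zip(*(tokens[i:] for i in range(n))))

def ngram_coverage(training_tokens, heldout_tokens, n):
    """Fraction of held-out n-grams that also occur in the training text."""
    heldout = ngrams(heldout_tokens, n)
    if not heldout:
        raise ValueError("held-out text is shorter than n")
    seen = set(ngrams(training_tokens, n))
    return sum(1 for g in heldout if g in seen) / len(heldout)

# Hypothetical toy data, standing in for training and held-out corpora.
train = "the market rose as the bank cut rates and the market rallied".split()
heldout = "the bank cut rates as the dollar rose".split()

ranked = rank_vocabulary(train)
for size in (2, len(ranked)):
    print(size, lexical_coverage(ranked, heldout, size))
print("bigram coverage:", ngram_coverage(train, heldout, 2))
```

Sweeping `vocab_size` over a range of values gives the coverage curve against which recognizer accuracy can be compared when choosing a vocabulary size.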

History

Publisher Statement

All Rights Reserved

Date

1989-01-01
