Carnegie Mellon University

The CMU Statistical Language Modeling Toolkit and its use in the 1994 ARPA CSR Evaluation

journal contribution
posted on 2003-01-01, authored by Roni Rosenfeld

The Carnegie Mellon Statistical Language Modeling (CMU SLM) Toolkit is a set of Unix software tools designed to facilitate language modeling work in the research community. The package, including source code, is freely available for research purposes. As of December 1994, the toolkit is in active use by 23 research groups in 8 countries. It was recently used to process the 2.5 GB NAB corpus for the ARPA CSR community. In this paper, I first discuss the design principles and features of the toolkit. I then describe the composition of the NAB corpus and report on the n-gram statistics, standard vocabulary, and language models created using the SLM tools.


Publisher Statement

All Rights Reserved

Date

2003-01-01
