The CMU Statistical Language Modeling Toolkit and its use in the 1994 ARPA CSR Evaluation

journal contribution
posted on 01.01.2003 by Roni Rosenfeld

The Carnegie Mellon Statistical Language Modeling (CMU SLM) Toolkit is a set of Unix software tools designed to facilitate language modeling work in the research community. The package, including source code, is freely available for research purposes. As of December 1994, the toolkit is in active use by 23 research groups in 8 countries. It was recently used to process the 2.5 GB NAB corpus for the ARPA CSR community. In this paper, I first discuss the design principles and features of the toolkit. I then describe the composition of the NAB corpus and report on the n-gram statistics, standard vocabulary, and language models created using the SLM tools.

History

Publisher Statement

All Rights Reserved

Date

01/01/2003
