Carnegie Mellon University

The Carnegie Mellon Communicator Corpus

journal contribution
posted on 1992-10-01, 00:00 authored by Christina Bennett, Alexander Rudnicky

As part of the DARPA Communicator program, Carnegie Mellon has, over the past three years, collected a large corpus of speech produced by callers to its Travel Planning system. To date, a total of 180,605 utterances (90.9 hours) have been collected. The data were used for a number of purposes, including acoustic and language modeling and the development of a spoken dialog system. The collection, transcription and annotation of these data prompted us to develop a number of procedures for managing the transcription process and for ensuring accuracy. We describe these, as well as some results based on these data. A portion of this corpus, covering the years 1999-2001, is being published for research purposes. 

History

Date

1992-10-01
