Data Selection for Speech Recognition

Wu, Yi; Rudnicky, Alexander; Zhang, Rong

doi:10.1184/R1/6604616.v1

journal contribution

posted on 2008-07-01, 00:00 authored by Yi Wu, Alexander Rudnicky, Rong Zhang

This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc.). In our experiment, in contrast to the common belief that “there is no data like more data”, we found it possible to select a highly informative subset of data that produces recognition performance comparable to a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.
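
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way the idea could be realised, assuming a greedy search that pushes the pooled distribution of a target unit toward uniform by maximising its entropy. The function names `entropy` and `select_utterances`, the `budget` parameter, and the greedy strategy itself are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the distribution implied by unit counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def select_utterances(utterances, budget):
    """Greedily pick up to `budget` utterances (hypothetical helper).

    `utterances` is a list of unit sequences, e.g. lists of phonemes.
    At each step the utterance that gives the pooled unit counts the
    highest entropy (closest to uniform) is added. Ties go to the
    lowest index. Returns the selected indices in selection order.
    """
    selected = []
    pooled = Counter()
    remaining = set(range(len(utterances)))
    while remaining and len(selected) < budget:
        best = max(
            sorted(remaining),
            key=lambda i: entropy(pooled + Counter(utterances[i])),
        )
        selected.append(best)
        pooled.update(utterances[best])
        remaining.remove(best)
    return selected


# Toy corpus: each utterance is a sequence of target units.
corpus = [["a", "a", "a", "a"], ["a", "b"], ["c", "d"], ["a", "a"]]
chosen = select_utterances(corpus, budget=2)
```

In this toy corpus the two utterances that together cover four distinct units evenly are preferred over the ones that repeat a single unit. The sketch rescans all remaining utterances at every step, so it is quadratic in corpus size; a large corpus would need a cheaper scoring scheme.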

History

Date

2008-07-01

Keywords

data selection; maximum entropy; speech recognition; acoustic modeling; Information and Computing Sciences not elsewhere classified

Licence

In Copyright
