A Frame Level Boosting Training Scheme for Acoustic Modeling
Conventional Boosting algorithms for acoustic modeling have two notable weaknesses. (1) The objective function minimizes utterance error rate, whereas the goal of most speech recognition systems is to reduce word error rate. (2) During Boosting training, the utterance is the unit of resampling, so every frame within an utterance is assigned equal weight. Intuitively, the frames associated with a misclassified word should be given more emphasis than the others. We propose a frame-level Boosting training scheme that addresses these shortcomings by allowing each frame to have a different weight. We describe the technique and provide experimental results for this approach.
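To make the idea of per-frame weights concrete, the following is a minimal sketch of one reweighting round. It uses the standard AdaBoost update applied to frames rather than utterances, which is an assumption made for illustration and not necessarily the update rule of the proposed scheme. The function name `update_frame_weights` and the flag list `frame_is_error` (true for frames aligned to a misrecognized word) are hypothetical.

```python
import math


def update_frame_weights(weights, frame_is_error, eps=1e-12):
    """One AdaBoost-style reweighting round at the frame level.

    weights        : current per-frame weights (positive floats)
    frame_is_error : per-frame flags, True if the frame belongs to a
                     misrecognized word (hypothetical input, assumed to
                     come from aligning the recognizer output)
    Returns (new_weights, alpha), with new_weights summing to 1.
    """
    total = sum(weights)
    # Weighted frame-level error of the current model.
    err = sum(w for w, e in zip(weights, frame_is_error) if e) / total
    # Clamp to avoid log(0) or division by zero in degenerate cases.
    err = min(max(err, eps), 1.0 - eps)
    # Standard AdaBoost model weight (assumed here, not from the source).
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Raise the weight of error frames, lower the weight of correct ones.
    new = [w * math.exp(alpha if e else -alpha)
           for w, e in zip(weights, frame_is_error)]
    z = sum(new)
    return [w / z for w in new], alpha


if __name__ == "__main__":
    w, a = update_frame_weights([0.25] * 4, [False, True, False, False])
    print(w, a)
```

In this sketch, frames in the same utterance can end up with different weights after a single round, which is the property that utterance-level resampling cannot provide.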