Normalization of Gender, Dialect and Speaking Style Using Probabilistic Front-Ends
This paper analyzes the capability of probabilistic Multilayer Perceptron (MLP) front-ends to perform various normalizations for robust Automatic Speech Recognition (ASR). We find decision trees to be a useful tool for investigating the normalization of the feature space achieved by various front-ends. We add questions about different environmental conditions to the training of the phonetic context decision tree, and count the number of splits dedicated to lexical discrimination using context and the number dedicated to these environmental conditions. We compare (1) Bottleneck (BN) features and (2) standard stacked Mel Frequency Cepstral Coefficients (MFCC) with LDA. In previous work, we found that the BN front-end leads to fewer gender questions than the MFCC front-end, which may be part of the reason why BN front-ends can achieve significant improvements. In this work, we extend this approach to the analysis of dialect on a large database of Pan-Arabic speech.
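The split-counting analysis described above can be illustrated with a minimal sketch. It assumes a toy binary tree and a hypothetical question naming scheme (a `GENDER=` or `DIALECT=` prefix marks an environmental question, anything else counts as phonetic context); the class and function names are illustrative and not taken from the paper's toolkit.

```python
from collections import Counter


class Node:
    """Toy decision-tree node; question=None marks a leaf (hypothetical structure)."""

    def __init__(self, question=None, yes=None, no=None):
        self.question = question
        self.yes = yes
        self.no = no


def question_category(question):
    # Assumed naming convention: the text before "=" identifies the question type.
    prefix = question.split("=", 1)[0]
    return {"GENDER": "gender", "DIALECT": "dialect"}.get(prefix, "context")


def count_splits(node, counts=None):
    """Tally internal nodes (splits) by the category of the question asked."""
    if counts is None:
        counts = Counter()
    if node is None or node.question is None:
        return counts  # leaves contribute no splits
    counts[question_category(node.question)] += 1
    count_splits(node.yes, counts)
    count_splits(node.no, counts)
    return counts


# Illustrative tree: two context splits, one gender split, one dialect split.
tree = Node(
    "L-CONTEXT=vowel",
    yes=Node("GENDER=female", yes=Node(), no=Node()),
    no=Node(
        "DIALECT=Levantine",
        yes=Node(),
        no=Node("R-CONTEXT=nasal", yes=Node(), no=Node()),
    ),
)
counts = count_splits(tree)
print(dict(counts))
```

Under this reading, a front-end that normalizes gender or dialect well would yield a tree in which fewer splits fall into the `gender` or `dialect` categories relative to `context`.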