Carnegie Mellon University

Normalization of Gender, Dialect and Speaking Style using Probabilistic Front-Ends

journal contribution
posted on 2011-01-01, authored by Udhyakumar Nallasamy, Florian Metze, Thomas Schaaf

This paper analyzes the capability of probabilistic Multilayer Perceptron (MLP) front-ends to perform various normalizations for robust Automatic Speech Recognition (ASR). We find decision trees to be a useful tool for investigating the normalization of the feature space achieved by different front-ends. We add questions about different environmental conditions to the question set used to train the phonetic context decision tree. We then count how many splits are dedicated to lexical discrimination using phonetic context, and how many are dedicated to these environmental conditions. We compare (1) BottleNeck (BN) features and (2) standard stacked Mel Frequency Cepstral Coefficients (MFCC) with LDA. In previous work, we found the BN front-end to be more effective than MFCC at reducing the number of gender questions, which may be part of the reason why BN front-ends can achieve significant improvements. In this work, we extend this approach to the analysis of dialect on a large database of Pan-Arabic speech.
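The split-counting analysis described above can be illustrated with a minimal sketch. It assumes a trained decision tree in which every question is tagged with the category of question set it came from (phonetic context, gender, dialect). The `Node` class, the `count_splits` and `split_fractions` helpers, the category labels, and the toy tree are all hypothetical. They are not taken from the authors' toolkit.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical node of a phonetic-context decision tree.

    Internal nodes carry a question and its category; leaves have question=None.
    """
    question: Optional[str] = None
    category: Optional[str] = None  # assumed labels: "context", "gender", "dialect"
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def count_splits(root: Node) -> Counter:
    """Count internal splits per question category with an iterative DFS."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.question is None:
            continue  # leaf: no split happens here
        counts[node.category] += 1
        for child in (node.yes, node.no):
            if child is not None:
                stack.append(child)
    return counts


def split_fractions(counts: Counter) -> Dict[str, float]:
    """Share of all splits spent on each category (empty dict for a bare leaf)."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


# Toy tree for illustration only; questions and structure are invented.
tree = Node("Is the left context a vowel?", "context",
            yes=Node("Is the speaker female?", "gender", yes=Node(), no=Node()),
            no=Node("Is the dialect Levantine?", "dialect",
                    yes=Node("Is the right context nasal?", "context",
                             yes=Node(), no=Node()),
                    no=Node()))

counts = count_splits(tree)
fractions = split_fractions(counts)
print(dict(counts), fractions)
```

Under the paper's interpretation, a front-end that normalizes a condition well should leave fewer splits in that condition's category, so comparing `split_fractions` across trees trained on BN and MFCC features is one way to express the comparison.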

History

Date

2011-01-01