Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition

Nallasamy, Udhyakumar; Metze, Florian; Schultz, Tanja

doi:10.1184/R1/6473357.v1

journal contribution

posted on 2013-09-01, authored by Udhyakumar Nallasamy, Florian Metze, Tanja Schultz

State-of-the-art Automatic Speech Recognition (ASR) models struggle with accented speech, particularly when the target accent is under-represented in the training data. The acoustic variations introduced by an unfamiliar accent leave the ASR polyphone decision tree (PDT) and its associated Gaussian mixture models (GMMs) poorly matched to the test data. In this paper, we improve on previous work on adapting the polyphone decision tree by using a semi-continuous model-based approach to address data sparsity. We extend the existing PDT with additional states that share parameters and correspond to new contextual variations identified in the adaptation data, while still robustly estimating the state-based parameters on a small adaptation set. We conduct ASR experiments on Arabic and English accents and show that our technique outperforms both Maximum A Posteriori (MAP) adaptation and a previous implementation of polyphone decision tree specialization (PDTS). Compared to MAP adaptation, we obtain a 7% relative improvement for Dialectal Arabic and a 13.8% relative improvement for Accented English.
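The abstract does not give the splitting criterion, the smoothing scheme, or the feature setup, so the following is only a hypothetical sketch of one way to read "additional states with shared parameters". It assumes a semi-continuous model in which every state shares one Gaussian codebook and owns only its mixture weights; a PDT leaf is split by the context question with the largest log-likelihood gain on the adaptation frames, and each new child state estimates just its mixture weights, interpolated MAP-style toward the parent's weights with an assumed prior strength `tau`. The 1-D features, the toy codebook, and the names `best_split`, `estimate_weights`, `tau`, and `min_frames` are illustrative assumptions, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """One Gaussian from the codebook shared by all states (1-D for brevity; assumed)."""
    mean: float
    var: float

    def pdf(self, x: float) -> float:
        return math.exp(-0.5 * (x - self.mean) ** 2 / self.var) / math.sqrt(2 * math.pi * self.var)


def frame_likelihood(x, codebook, weights):
    # Semi-continuous state: p(x | s) = sum_k w_s[k] * N(x; mu_k, var_k)
    return sum(w * g.pdf(x) for w, g in zip(weights, codebook))


def log_likelihood(frames, codebook, weights):
    return sum(math.log(frame_likelihood(x, codebook, weights)) for x in frames)


def estimate_weights(frames, codebook, prior, tau):
    """One EM step for a state's mixture weights, interpolated toward the
    parent state's weights (MAP-style, prior strength tau; hypothetical).
    The shared Gaussians stay fixed, so only K weights are estimated."""
    counts = [0.0] * len(codebook)
    for x in frames:
        total = frame_likelihood(x, codebook, prior)
        for k, (w, g) in enumerate(zip(prior, codebook)):
            counts[k] += w * g.pdf(x) / total  # posterior of shared Gaussian k
    n = len(frames)
    return [(c + tau * w0) / (n + tau) for c, w0 in zip(counts, prior)]


def best_split(samples, questions, codebook, parent_weights, tau=10.0, min_frames=3):
    """Try each context question on one PDT leaf's adaptation frames.

    samples: list of (context, frame); questions: name -> predicate on context.
    Returns (gain, name, yes_weights, no_weights) for the best question, or None.
    """
    base = log_likelihood([x for _, x in samples], codebook, parent_weights)
    best = None
    for name, ask in questions.items():
        yes = [x for c, x in samples if ask(c)]
        no = [x for c, x in samples if not ask(c)]
        if len(yes) < min_frames or len(no) < min_frames:
            continue  # too little adaptation data to support a new state
        w_yes = estimate_weights(yes, codebook, parent_weights, tau)
        w_no = estimate_weights(no, codebook, parent_weights, tau)
        gain = (log_likelihood(yes, codebook, w_yes)
                + log_likelihood(no, codebook, w_no) - base)
        if best is None or gain > best[0]:
            best = (gain, name, w_yes, w_no)
    return best


# Toy usage: in the accented data, this leaf's frames depend on the left context.
codebook = [Gaussian(-2.0, 1.0), Gaussian(2.0, 1.0)]
samples = [("a", -2.1), ("e", -1.9), ("i", -2.0), ("o", -2.2),
           ("t", 2.0), ("k", 1.8), ("s", 2.1), ("p", 2.3)]
questions = {
    "left_is_vowel": lambda c: c in "aeiou",
    "left_in_a_e_t_k": lambda c: c in ("a", "e", "t", "k"),
}
gain, name, w_yes, w_no = best_split(samples, questions, codebook, [0.5, 0.5])
print(name, round(gain, 2), [round(w, 3) for w in w_yes], [round(w, 3) for w in w_no])
```

Because the codebook is shared, each new state costs only K mixture weights, and the interpolation toward the parent keeps those weights stable when a child sees just a handful of frames. This is the kind of data-sparsity trade-off the abstract describes, though the paper's actual estimation procedure may differ.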

Date

2013-09-01

Keywords

automatic speech recognition, accent adaptation

Licence

In Copyright