DNN acoustic modeling with modular multi-lingual feature extraction networks

Gehring, Jonas; Nguyen, Quoc Bao; Metze, Florian; Waibel, Alexander

doi:10.1184/R1/6473330.v1

journal contribution

posted on 2013-12-01, 00:00 authored by Jonas Gehring, Quoc Bao Nguyen, Florian Metze, Alexander Waibel

In this work, we propose several deep neural network architectures that are able to leverage data from multiple languages. Modularity is achieved by training networks for extracting high-level features and for estimating phoneme state posteriors separately, and then combining them for decoding in a hybrid DNN/HMM setup. This approach has been shown to achieve superior performance for single-language systems, and here we demonstrate that feature extractors benefit significantly from being trained as multi-lingual networks with shared hidden representations. We also show that existing mono-lingual networks can be re-used in a modular fashion to achieve a similar level of performance without having to train new networks on multi-lingual data. Furthermore, we investigate extending these architectures to make use of language-specific acoustic features. Evaluations are performed on a low-resource conversational telephone speech transcription task in Vietnamese, while additional data for acoustic model training is provided in Pashto, Tagalog, Turkish, and Cantonese. Improvements of up to 17.4% and 13.8% over mono-lingual GMMs and DNNs, respectively, are obtained.
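
The modular arrangement described in the abstract can be illustrated with a minimal forward-pass sketch: a feature extractor whose hidden layers are shared across languages and topped with one output layer per language, and a separately built network that maps the extracted features to phoneme state posteriors. Everything specific here is an assumption for illustration only: the class names `MultiLingualExtractor` and `StateClassifier`, the layer sizes, the sigmoid activations, and the state counts are hypothetical and not taken from the paper, and training and HMM decoding are omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


class MultiLingualExtractor:
    """Hypothetical feature extractor: shared hidden layers, one output head per language."""

    def __init__(self, n_in, n_hidden, n_features, states_per_language, seed=0):
        rng = random.Random(seed)
        # These layers are shared by every language.
        self.shared = [make_layer(n_in, n_hidden, rng),
                       make_layer(n_hidden, n_features, rng)]
        # Each language gets its own output layer over its own state set.
        self.heads = {lang: make_layer(n_features, n_states, rng)
                      for lang, n_states in states_per_language.items()}

    def features(self, x):
        """High-level features; identical computation for all languages."""
        for layer in self.shared:
            x = sigmoid(affine(layer, x))
        return x

    def posteriors(self, x, lang):
        """Language-specific state posteriors, as would be used during training."""
        return softmax(affine(self.heads[lang], self.features(x)))


class StateClassifier:
    """Hypothetical second network: extracted features -> phoneme state posteriors."""

    def __init__(self, n_features, n_hidden, n_states, seed=1):
        rng = random.Random(seed)
        self.hidden = make_layer(n_features, n_hidden, rng)
        self.out = make_layer(n_hidden, n_states, rng)

    def posteriors(self, feats):
        return softmax(affine(self.out, sigmoid(affine(self.hidden, feats))))


# Assumed sizes: 40-dimensional input frame, 16 extracted features.
extractor = MultiLingualExtractor(
    n_in=40, n_hidden=64, n_features=16,
    states_per_language={"vietnamese": 30, "turkish": 25})
classifier = StateClassifier(n_features=16, n_hidden=32, n_states=30)

frame = [0.1] * 40                       # one dummy acoustic frame
feats = extractor.features(frame)        # shared multi-lingual features
target_post = classifier.posteriors(feats)
```

Because the two networks only meet at the feature vector, either one can be swapped independently, which is the sense in which an existing extractor could be re-used without retraining. In a hybrid DNN/HMM system the posteriors in `target_post` would then be passed on to the HMM decoder.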

History

Publisher Statement

© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Date

2013-12-01

Keywords

Deep Neural Networks; Multi-Lingual Acoustic Modeling; Large-Vocabulary Speech Recognition; Low-Resource Acoustic Modeling

Licence

In Copyright
