Improvements to speaker adaptive training of deep neural networks

Miao, Yajie; Jiang, Lu; Chiu, Justin; Zhang, Hao; Metze, Florian

doi:10.1184/R1/6473414.v1

journal contribution

Posted on 2014-10-01. Authored by Yajie Miao, Lu Jiang, Justin Chiu, Hao Zhang, Florian Metze

Speaker adaptive training (SAT) is a well-studied technique for Gaussian mixture acoustic models (GMMs). Recently, we proposed to perform SAT for deep neural networks (DNNs), with speaker i-vectors applied in feature learning. The resulting SAT-DNN models significantly outperform DNNs in terms of word error rate (WER). In this paper, we present different methods to further improve and extend SAT-DNN. First, we conduct a detailed analysis of i-vector extractor training and flexible feature fusion. Second, the SAT-DNN approach is extended to further tasks, including bottleneck feature (BNF) generation, convolutional neural network (CNN) acoustic modeling, and multilingual DNN-based feature extraction. Third, for transcribing multimedia data, we enrich the i-vector representation with global speaker attributes (age, gender, etc.) obtained automatically from video signals. On a collection of instructional videos, incorporating these additional visual features is observed to boost the recognition accuracy of SAT-DNN.

Publisher Statement

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Date

2014-10-01

Keywords

Deep neural networks; speaker adaptive training; speech recognition; Information and Computing Sciences not elsewhere classified

Licence

In Copyright
