Carnegie Mellon University

Deep maxout networks for low-resource speech recognition

journal contribution
posted on 2013-12-01, 00:00, authored by Yajie Miao, Florian Metze, Shourabh Rawat

As a feed-forward architecture, the recently proposed maxout networks integrate dropout naturally and show state-of-the-art results on various computer vision datasets. This paper investigates the application of deep maxout networks (DMNs) to large vocabulary continuous speech recognition (LVCSR) tasks. Our focus is on the particular advantage of DMNs under low-resource conditions with limited transcribed speech. We extend DMNs to hybrid and bottleneck feature systems, and explore optimal network structures (number of maxout layers, pooling strategy, etc.) for both setups. On the newly released Babel corpus, the behavior of DMNs is extensively studied under different levels of data availability. Experiments show that DMNs improve low-resource speech recognition significantly. Moreover, DMNs introduce sparsity into their hidden activations and thus can act as sparse feature extractors.
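To make the core idea concrete: a maxout hidden unit computes several affine "pieces" of its input and outputs the maximum over them, so the activation function is learned rather than fixed. The sketch below is an illustrative, minimal NumPy implementation of a single maxout layer, not the authors' code; the function name, tensor shapes, and pool size are assumptions for exposition.

```python
import numpy as np

def maxout_layer(x, W, b):
    """Minimal maxout layer (illustrative sketch, not the paper's implementation).

    x: input vector, shape (in_dim,)
    W: weights for each linear piece, shape (pool_size, out_dim, in_dim)
    b: biases for each linear piece, shape (pool_size, out_dim)

    Each output unit computes pool_size affine pieces and keeps the max,
    which yields the piecewise-linear, learned activation that pairs
    naturally with dropout.
    """
    z = np.einsum('koi,i->ko', W, x) + b  # all linear pieces: (pool_size, out_dim)
    return z.max(axis=0)                  # max-pool across pieces: (out_dim,)

# Tiny usage example: 2 inputs, 1 output unit, pool size 2.
x = np.array([1.0, -1.0])
W = np.array([[[1.0, 0.0]],   # piece 0: picks x[0] -> 1.0
              [[0.0, 1.0]]])  # piece 1: picks x[1] -> -1.0
b = np.zeros((2, 1))
h = maxout_layer(x, W, b)     # max(1.0, -1.0) per unit
```

The max over pieces is what makes only one piece "active" per unit, which relates to the sparsity-like behavior of the hidden representations noted in the abstract.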

History

Publisher Statement

© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Date

2013-12-01
