Carnegie Mellon University

EM-based Phoneme Confusion Matrix Generation for Low-resource Spoken Term Detection

journal contribution
posted on 2014-12-01, 00:00, authored by Di Xu, Yun Wang, Florian Metze

The idea of using a data-driven phoneme confusion matrix (PCM) to enhance speech recognition and retrieval performance is not new to the speech community. Although empirical results show varying degrees of improvement from introducing a PCM, the underlying data-driven processes described in most papers are rather ad hoc and lack rigorous statistical justification. In this paper we focus on the statistical aspects of PCM generation, and we propose and justify a novel expectation-maximization (EM) based algorithm for data-driven PCM generation. We evaluate the generated PCMs in the context of low-resource spoken term detection, with a primary focus on out-of-vocabulary keywords.

History

Publisher Statement

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Date

2014-12-01
