file.pdf (250.95 kB)
EM-based Phoneme Confusion Matrix Generation for Low-resource Spoken Term Detection
journal contribution
posted on 2014-12-01, 00:00 authored by Di Xu, Yun Wang, Florian MetzeFlorian MetzeThe idea of using a data-driven phoneme confusion matrix (PCM) to enhance speech recognition and retrieval performance is not new to the speech community. Although empirical results show various degrees of improvements brought by introducing a PCM, the underlying data-driven processes introduced in most papers are rather ad-hoc and lack rigorous statistical justifications. In this paper we will focus on the statistical aspects of PCM generation, propose and justify a novel expectation-maximization based algorithm for data-driven PCM generation. We will evaluate the performance of the generated PCMs under the context of low-resource spoken term detection, with primary focus on out-of-vocabulary keywords.