%0 Journal Article
%A Xu, Di
%A Wang, Yun
%A Metze, Florian
%D 2014
%T EM-based Phoneme Confusion Matrix Generation for Low-resource Spoken Term Detection
%U https://kilthub.cmu.edu/articles/journal_contribution/EM-based_Phoneme_Confusion_Matrix_Generation_for_Low-resource_Spoken_Term_Detection/6473342
%R 10.1184/R1/6473342.v1
%2 https://kilthub.cmu.edu/ndownloader/files/11902925
%K Expectation-maximization algorithm
%K machine learning
%K information retrieval
%K spoken term detection
%K out-of-vocabulary words
%X

The idea of using a data-driven phoneme confusion matrix (PCM) to enhance speech recognition and retrieval performance is not new to the speech community. Although empirical results show varying degrees of improvement from introducing a PCM, the underlying data-driven processes described in most papers are rather ad hoc and lack rigorous statistical justification. In this paper, we focus on the statistical aspects of PCM generation, and we propose and justify a novel expectation-maximization-based algorithm for data-driven PCM generation. We evaluate the generated PCMs in the context of low-resource spoken term detection, with a primary focus on out-of-vocabulary keywords.

%I Carnegie Mellon University