Carnegie Mellon University

Word-based Probabilistic Phonetic Retrieval for Low-resource Spoken Term Detection

journal contribution
posted on 2014-09-01, authored by Di Xu, Florian Metze

Two problems make Spoken Term Detection (STD) particularly challenging under low-resource conditions: the low quality of speech recognition hypotheses and a high number of out-of-vocabulary (OOV) words. In this paper, we propose an intuitive way to handle OOV terms for STD on word-based Confusion Networks using phonetic similarities, and generalize it into a probabilistic, vocabulary-independent retrieval framework. We then discuss how several heuristic and machine-learning-based methods can be incorporated into this framework to improve retrieval performance. We present experimental results on several low-resource languages from IARPA's Babel program, such as Assamese, Bengali, Haitian, and Lao.
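The abstract does not give the paper's scoring function. The following is only a minimal sketch, under our own assumptions, of the general idea of using phonetic similarity to let a word-based Confusion Network answer an OOV query. Each slot's score is the sum, over candidate words, of the word posterior times a phonetic similarity between the query's phones and the word's pronunciation. The toy lexicon, the edit-distance-based similarity, and the function names (`phone_edit_distance`, `phonetic_similarity`, `score_slots`) are all hypothetical illustrations, not the authors' method. For simplicity, the sketch also matches a query against one slot at a time, whereas real OOV terms may span several slots.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation lexicon: word -> phone sequence (an assumption).
Lexicon = Dict[str, List[str]]
# Confusion network: a list of slots; each slot maps candidate words to
# their posterior probabilities within that slot.
ConfusionNetwork = List[Dict[str, float]]


def phone_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over phone symbols with unit costs."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1]


def phonetic_similarity(a: List[str], b: List[str]) -> float:
    """Assumed similarity in [0, 1]: 1 - distance / longer length."""
    longest = max(len(a), len(b), 1)
    return 1.0 - phone_edit_distance(a, b) / longest


def score_slots(query_phones: List[str], cn: ConfusionNetwork,
                lexicon: Lexicon) -> List[Tuple[int, float]]:
    """Score each slot as sum_w P(w | slot) * sim(query, pron(w)),
    returning (slot_index, score) pairs sorted by descending score."""
    scores = []
    for idx, slot in enumerate(cn):
        score = sum(
            post * phonetic_similarity(query_phones, lexicon[word])
            for word, post in slot.items()
            if word in lexicon  # skip words with no known pronunciation
        )
        scores.append((idx, score))
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    lexicon = {"the": ["dh", "ah"], "cut": ["k", "ah", "t"],
               "kat": ["k", "ae", "t"], "dog": ["d", "ao", "g"]}
    cn = [{"the": 1.0}, {"cut": 0.6, "kat": 0.4}, {"dog": 1.0}]
    # Query for an OOV word, given only as a phone sequence.
    print(score_slots(["k", "ae", "t"], cn, lexicon))
```

In this toy run, the middle slot ranks first (score about 0.8). Neither candidate word is the query itself, but both are phonetically close to it, so a term absent from the recognizer's vocabulary can still be located. Because the score is a posterior-weighted sum, it stays a soft, probabilistic match rather than a hard lexical lookup.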

History

Date

2014-09-01