Building a Vocabulary Self-Learning Speech Recognition System
This paper presents initial studies on building a vocabulary self-learning speech recognition system that can automatically learn unknown words and expand its recognition vocabulary. Our recognizer can detect and recover out-of-vocabulary (OOV) words in speech, then incorporate OOV words into its lexicon and language model (LM). As a result, these unknown words can be correctly recognized when encountered by the recognizer in future. Specifically, we apply the word-fragment hybrid system framework to detect the presence of OOV words. We propose a better phoneme-to-grapheme (P2G) model so as to correctly recover the written form for more OOV words. Furthermore, we estimate LM scores for OOV words using their syntactic and semantic properties. The experimental results show that more than 40% OOV words are successfully learned from the development data, and about 60% learned OOV words are recognized in the testing data.