Learning Out-of-Vocabulary Words in Automatic Speech Recognition
Out-of-vocabulary (OOV) words are unknown words that appear in the testing speech but not in the recognition vocabulary. They are usually important content words, such as names and locations, which carry information crucial to the success of many speech recognition tasks. However, most speech recognition systems are closed-vocabulary recognizers that only recognize words in a fixed, finite vocabulary. When there are OOV words in the testing speech, such systems cannot identify them and instead misrecognize them as in-vocabulary (IV) words. Furthermore, the errors made on OOV words also degrade the recognition accuracy of the surrounding IV words. Therefore, speech recognition systems in which OOV words can be detected and recovered are of great interest.
Because simply using a larger vocabulary in a recognizer cannot solve the OOV word problem, several alternative approaches have been proposed. One is to use a hybrid lexicon and hybrid language model that incorporate both word and sublexical units during decoding. Another popular OOV word detection method is to locate the regions where the word decoding and the phone decoding results disagree. Other methods use a classification process to find possible OOV words from confidence scores and other evidence. For OOV word recovery, phoneme-to-grapheme (P2G) conversion is usually applied to predict the written form of an OOV word.
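The word/phone disagreement idea mentioned above can be illustrated with a minimal sketch. It is not the system built in this thesis: the `Segment` type, the `flag_oov_candidates` function, the midpoint-based alignment, and the 0.5 threshold are all assumptions chosen for illustration. For each hypothesized word, the sketch compares the lexicon pronunciation with the phones that a separate phone decoder produced in the same time span, and flags the word when the normalized edit distance is large.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    label: str    # word or phone symbol (hypothetical representation)
    start: float  # start time in seconds
    end: float    # end time in seconds


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def flag_oov_candidates(
    words: List[Segment],
    phones: List[Segment],
    lexicon: Dict[str, List[str]],
    threshold: float = 0.5,  # assumed value; would need tuning on held-out data
) -> List[Tuple[str, float]]:
    """Return (word, disagreement score) pairs whose score exceeds the threshold."""
    flagged = []
    for w in words:
        expected = lexicon[w.label]
        # Phones whose midpoint falls inside the word's time span.
        observed = [p.label for p in phones
                    if w.start <= (p.start + p.end) / 2 < w.end]
        denom = max(len(expected), len(observed), 1)
        score = edit_distance(expected, observed) / denom
        if score > threshold:
            flagged.append((w.label, score))
    return flagged
```

A region flagged this way is only a candidate: the word decoder has forced an IV word onto audio whose phone-level transcript does not match that word's pronunciation, which is consistent with, but does not prove, the presence of an OOV word.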
Current OOV research focuses on detecting the presence of OOV words in the testing speech. There is only limited work on how to convert OOV words into IV words of a recognizer. In this thesis, we therefore investigated learning OOV words in speech recognition. We showed that it is feasible for a recognizer to automatically learn new words and operate on an open vocabulary. Specifically, we built an OOV word learning framework that consists of three major components. The first component is OOV word detection, where we built hybrid systems using different sublexical units to detect OOV words during decoding. We also studied how to improve hybrid system performance using system combination and OOV word classification techniques. Since OOV words can appear more than once in a conversation or over a period of time, the second component, OOV word clustering, finds multiple instances of the same OOV word. In the third component, OOV word recovery, we explored how to integrate the identified OOV words into the recognizer’s lexicon and language model. The proposed work was tested on tasks with different speaking styles and recording conditions, including the Wall Street Journal (WSJ), Broadcast News (BN), and Switchboard (SWB) datasets. Our experimental results show that we are able to detect and recover up to 40% of OOV words using the proposed OOV word learning framework. Such a self-learning speech recognition system would be more robust and have broader applications in real life.
- Language Technologies Institute
- Doctor of Philosophy (PhD)