Carnegie Mellon University

Learning better lexical properties for recurrent OOV words

journal contribution
posted on 2013-12-01, 00:00 authored by Long Qin, Alexander Rudnicky

Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Multiple instances of the same OOV word provide valuable information for learning the lexical properties of that word. We therefore investigated how to better estimate the pronunciation, spelling, and part-of-speech (POS) label of recurrent OOV words. We first identified recurrent OOV words in the output of a hybrid decoder by applying a bottom-up clustering approach. Multiple instances of the same OOV word were then used simultaneously to learn the properties of that word. The experimental results showed that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. Furthermore, by using evidence from multiple instances of the same word, the pronunciation accuracy, recovery rate, and POS label accuracy of recurrent OOV words can be substantially improved.
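
The bottom-up clustering step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which distance measure or stopping rule was used, so the normalized phone-level edit distance, the average linkage, the `threshold` value, and the example phone sequences below are all assumptions chosen for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of phone symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_distance(c1, c2, items):
    """Average-linkage distance between two clusters of item indices."""
    total = sum(normalized_distance(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def bottom_up_cluster(items, threshold=0.3):
    """Merge the closest pair of clusters until none is within `threshold`.

    `threshold` is a hypothetical stopping criterion, not a value from the paper.
    """
    clusters = [[i] for i in range(len(items))]  # start with one cluster per candidate
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], items),
        )
        if cluster_distance(clusters[a], clusters[b], items) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical decoded phone sequences for four detected OOV candidates.
candidates = [
    ["OW", "B", "AA", "M", "AH"],
    ["OW", "B", "AA", "M", "AA"],
    ["K", "R", "EH", "M", "L", "IH", "N"],
    ["OW", "B", "AA", "M", "AH"],
]
clusters = bottom_up_cluster(candidates)
```

In this sketch, candidates that end up in the same cluster are treated as instances of one recurrent OOV word, so their evidence could then be pooled when estimating pronunciation, spelling, or POS label.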

History

Publisher Statement

© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Date

2013-12-01
