Carnegie Mellon University

Internationalizing Speech Technology through Language Independent Lexical Acquisition

journal contribution
posted on 2005-08-01, 00:00, authored by Bertrand A Damiba, Alexander Rudnicky

Software internationalization, the process of making software easier to localize for specific languages, has deep implications when applied to speech technology, where the goal of the task lies in the very essence of the particular language.


A great deal of work and fine-tuning normally goes into the development of speech software for a single language, say English. This tuning complicates a port to other languages. The inherent identity of a language manifests itself in its lexicon, where its character set, phoneme set, and pronunciation rules are revealed. We propose a decomposition of the lexicon-building process into four discrete and sequential steps:


(a) Transliteration of code points from Unicode.
(b) Orthographic standardization rules.
(c) Application of grapheme to phoneme rules.
(d) Application of phonological rules.
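
The four steps above can be sketched as a chain of small functions, each consuming the output of the one before it. This is only an illustrative sketch, not the authors' implementation: the function names, the transliteration table, the "ph" spelling rule, the grapheme-to-phoneme table, and the duplicate-collapsing phonological rule are all hypothetical placeholders chosen to keep the example short.

```python
import unicodedata

# Step (a): transliterate Unicode code points into a working alphabet.
# Hypothetical table; a real one would cover the full target script.
TRANSLIT = {"é": "e", "ñ": "ny"}

def transliterate(word):
    # Normalize first so composed and decomposed forms look the same.
    word = unicodedata.normalize("NFC", word)
    return "".join(TRANSLIT.get(ch, ch) for ch in word)

# Step (b): orthographic standardization (hypothetical rules).
def standardize(word):
    return word.lower().replace("ph", "f")

# Step (c): grapheme-to-phoneme rules, longest grapheme first.
# Hypothetical table with made-up phoneme labels.
G2P = {
    "ny": "NY", "e": "EH", "f": "F", "o": "OW",
    "r": "R", "s": "S", "t": "T",
}

def graphemes_to_phonemes(word):
    phones = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in G2P:
                phones.append(G2P[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule for this grapheme: skip it
    return phones

# Step (d): phonological rules applied to the phoneme sequence.
# Hypothetical rule: collapse adjacent identical phonemes.
def apply_phonology(phones):
    out = []
    for p in phones:
        if not out or out[-1] != p:
            out.append(p)
    return out

def build_pronunciation(word):
    # Each step sees only the previous step's output, in order (a)-(d).
    return apply_phonology(
        graphemes_to_phonemes(standardize(transliterate(word)))
    )
```

Because each stage operates on a single representation (code points, standardized spelling, graphemes, phonemes), a rule written for one stage cannot interfere with rules in another, which is one way to read the claim about reduced rule conflicts.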


By following these steps, one should gain access to most of the existing speech/language processing tools, thereby internationalizing one's speech technology. In addition, adhering to this decomposition should reduce the rule conflicts that often plague the phoneticizing process.


Our work makes two main contributions. First, it proposes a systematic procedure for the internationalization of automatic speech recognition (ASR) systems. Second, it proposes a particular decomposition of the phoneticization process that facilitates internationalization by non-expert informants.

History

Date

2005-08-01
