Internationalizing Speech Technology through Language Independent Lexical Acquisition

Damiba, Bertrand A; Rudnicky, Alexander

doi:10.1184/R1/6606593.v1

journal contribution

posted on 2005-08-01, 00:00, authored by Bertrand A Damiba, Alexander Rudnicky

Software internationalization, the process of making software easier to localize for specific languages, has deep implications when applied to speech technology, where the goal of the task lies in the very essence of the particular language.

A great deal of work and fine-tuning normally goes into developing speech software for a single language, such as English. This tuning complicates porting the software to other languages. The inherent identity of a language manifests itself in its lexicon, where its character set, phoneme set, and pronunciation rules are revealed. We propose a decomposition of the lexicon-building process into four discrete, sequential steps:

(a) Transliteration of code points from Unicode.
(b) Orthographic standardization rules.
(c) Application of grapheme to phoneme rules.
(d) Application of phonological rules.

Following these steps should give access to most existing speech and language processing tools, thereby internationalizing one's speech technology. In addition, adhering to this decomposition should reduce the rule conflicts that often plague the phoneticization process.
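As an illustration only, the four steps (a) to (d) can be sketched as a chain of small functions, each consuming the output of the previous one. Everything in this sketch is an assumption made for the example: the function names, the rule tables, the phoneme symbols, and the toy French-like rules are hypothetical and are not taken from the paper.

```python
import unicodedata

# Hypothetical toy rule tables, invented for this sketch (not from the paper).
TRANSLIT = {"é": "e", "è": "e", "ê": "e"}   # (a) code point -> working alphabet
ORTHO_RULES = [("ph", "f")]                 # (b) spelling standardization
G2P_RULES = {"ou": "UW", "e": "EH", "f": "F", "l": "L",
             "n": "N", "o": "OW", "t": "T"}  # (c) grapheme -> phoneme


def transliterate(word):
    """Step (a): normalize Unicode input and map it to a working alphabet."""
    composed = unicodedata.normalize("NFC", word).lower()
    return "".join(TRANSLIT.get(ch, ch) for ch in composed)


def standardize(text):
    """Step (b): apply orthographic standardization rules in order."""
    for old, new in ORTHO_RULES:
        text = text.replace(old, new)
    return text


def graphemes_to_phonemes(text):
    """Step (c): greedy longest-match lookup of graphemes in G2P_RULES."""
    phonemes = []
    max_len = max(len(g) for g in G2P_RULES)
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in G2P_RULES:
                phonemes.append(G2P_RULES[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no grapheme rule for {text[i]!r}")
    return phonemes


def apply_phonology(phonemes):
    """Step (d): one toy phonological rule that drops a word-final EH."""
    if len(phonemes) > 1 and phonemes[-1] == "EH":
        return phonemes[:-1]
    return phonemes


def build_entry(word):
    """Run the four steps strictly in sequence to get one lexicon entry."""
    return apply_phonology(
        graphemes_to_phonemes(standardize(transliterate(word))))


print(build_entry("Téléphone"))
```

Because each stage sees only the output of the one before it, a rule in one table cannot interfere with rules in another, which is one way to read the claim about reduced rule conflicts. Adapting the sketch to another language would, under these assumptions, mean replacing the tables rather than the functions.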

Our work makes two main contributions. First, it proposes a systematic procedure for the internationalization of automatic speech recognition (ASR) systems. Second, it proposes a particular decomposition of the phoneticization process that facilitates internationalization by non-expert informants.

History

Date

2005-08-01

Keywords

computer sciences; Information and Computing Sciences not elsewhere classified

Licence

In Copyright
