Automated Corpus Analysis and the Acquisition of Large, Multi-Lingual Knowledge Bases for MT

Mitamura, Teruko; Nyberg, Eric; Carbonell, Jaime G.

doi:10.1184/R1/6603704.v1

journal contribution

Posted on 2008-04-01; authored by Teruko Mitamura, Eric Nyberg, Jaime G. Carbonell

Although knowledge-based MT systems have the potential to achieve high translation accuracy, each successful application system requires a large amount of hand-coded knowledge (lexicons, grammars, mapping rules, etc.). Systems like KBMT-89 and its descendants have demonstrated how knowledge-based translation can produce good results in technical domains with tractable domain semantics. Nevertheless, the cost of developing large-scale applications with tens of thousands of domain concepts precludes a purely hand-crafted approach. The current challenge for the "next generation" of knowledge-based MT systems is to utilize on-line textual resources and corpus analysis software in order to automate the most laborious aspects of the knowledge acquisition process. This partial automation can in turn maximize the productivity of human knowledge engineers and help to make large-scale applications of knowledge-based MT an economic reality. In this paper we discuss the corpus-based knowledge acquisition methodology used in KANT, a knowledge-based translation system for multi-lingual document production. This methodology can be generalized beyond the KANT interlingua approach for use with any system that requires similar kinds of knowledge.
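The abstract says that corpus analysis software can automate the most laborious parts of knowledge acquisition, but this record does not describe a procedure. The Python sketch below is a purely hypothetical illustration of the general idea. It scans a corpus for words and two-word phrases that an existing lexicon does not yet cover, then ranks them by frequency so a knowledge engineer could review the most common gaps first. The function name `candidate_terms`, the stopword list, the naive tokenizer, and the frequency-ranking heuristic are all assumptions made for this sketch. None of them are taken from KANT.

```python
import re
from collections import Counter

# Hypothetical function words excluded from candidates (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "on", "for", "from", "with"}


def candidate_terms(corpus, lexicon, min_count=2):
    """Rank words and two-word phrases missing from `lexicon` by corpus frequency.

    Illustrative only: frequency ranking is an assumed heuristic for
    prioritizing human review, not the KANT acquisition procedure.
    """
    counts = Counter()
    for sentence in corpus:
        # Naive tokenizer: lowercase runs of letters, digits and hyphens.
        tokens = re.findall(r"[a-z0-9-]+", sentence.lower())
        # Unknown content words.
        counts.update(t for t in tokens if t not in lexicon and t not in STOPWORDS)
        # Adjacent content-word pairs as candidate multi-word terms.
        for a, b in zip(tokens, tokens[1:]):
            phrase = f"{a} {b}"
            if a in STOPWORDS or b in STOPWORDS or phrase in lexicon:
                continue
            counts[phrase] += 1
    ranked = [(term, n) for term, n in counts.items() if n >= min_count]
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    corpus = [
        "Remove the hydraulic pump from the engine.",
        "Inspect the hydraulic pump for leaks.",
        "Install the hydraulic pump.",
    ]
    lexicon = {"remove", "inspect", "install", "engine"}
    for term, n in candidate_terms(corpus, lexicon):
        print(term, n)
```

The example corpus has three sentences about a hydraulic pump, and the lexicon already contains the verbs and `engine`. On this input the function returns `hydraulic`, `hydraulic pump`, and `pump`, each with a count of 3. The candidate `leaks` occurs only once, so it falls below the default `min_count` threshold. A real acquisition workflow would need morphology, part-of-speech information, and domain knowledge that this toy ranking omits.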

History

Publisher Statement

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Date

2008-04-01

Keywords

computer sciences

Licence

In Copyright
