Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages

Ambati, Vamshi; Lavie, Alon; Carbonell, Jaime G.

doi:10.1184/R1/6622217.v1

journal contribution

posted on 2009-08-01, 00:00 authored by Vamshi Ambati, Alon Lavie, Jaime G. Carbonell

We propose a generic rule induction framework that is informed by syntax from both sides of a parsed parallel corpus, expressed as sets of structural, boundary, and labeling constraints. Factoring syntax in this manner allows the framework to work with independent annotations from multiple resources rather than requiring a single syntactic structure. We then explore the lexical coverage of translation models learned in different scenarios, using syntax from one side versus both sides. We specifically examine how the non-isomorphic nature of the parse trees for the two languages affects coverage. We propose a novel technique for restructuring target-side parse trees that generates alternate isomorphic target trees that preserve the syntactic boundaries of constituents that were aligned in the original parse trees. We also show that combining rules extracted by restructuring syntactic trees on both sides produces significantly better translation models. The improved precision and coverage of our syntax tables particularly compensate for the lack of lexical coverage in syntax-based machine translation approaches.

Date

2009-08-01

Keywords

Software Research

Licence

In Copyright
