Carnegie Mellon University

Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages

journal contribution
posted on 2009-08-01, 00:00 authored by Vamshi Ambati, Alon Lavie, Jaime G. Carbonell

We propose a generic rule induction framework that is informed by syntax from both sides of a parsed parallel corpus, expressed as sets of structural, boundary, and labeling-related constraints. Factoring syntax in this manner lets our framework work with independent annotations coming from multiple resources, not necessarily a single syntactic structure. We then explore the lexical coverage of translation models learned in different scenarios, using syntax from one side vs. both sides. We specifically look at how the non-isomorphic nature of the parse trees for the two languages affects coverage. We propose a novel technique for restructuring target-side parse trees that generates alternate isomorphic target trees while preserving the syntactic boundaries of constituents that were aligned in the original parse trees. We also show that combining rules extracted by restructuring syntactic trees on both sides produces significantly better translation models. The improved precision and coverage of our syntax tables particularly compensate for the lack of lexical coverage in syntax-based machine translation approaches.

History

Publisher Statement

Copyright 2009 AMTA

Date

2009-08-01
