Carnegie Mellon University

Machine Translation for Human Translators

Thesis posted on 2022-12-13, authored by Michael Denkowski

While machine translation is sometimes sufficient for conveying information across language barriers, many scenarios still require precise, human-quality translation that MT is currently unable to deliver. Governments and international organizations such as the United Nations require accurate translations of content dealing with complex geopolitical issues. Community-driven projects such as Wikipedia rely on volunteer translators to bring accurate information to diverse language communities. As the amount of data requiring translation has continued to increase, the idea of using machine translation to improve the speed of human translation has gained significant traction. In the frequently employed practice of post-editing, an MT system outputs an initial translation and a human translator edits it for correctness, ideally saving time over translating from scratch. While general improvements in MT quality have led to productivity gains with this technique, the idea of designing translation systems specifically for post-editing has only recently caught on in the research and commercial communities.

In this work, we present extensions to key components of statistical machine translation systems aimed directly at reducing the amount of work required from human translators. We cast MT for post-editing as an online learning task in which new training instances are created as humans edit system output, and we introduce an adaptive MT system that immediately learns from this human feedback. New translation rules are learned from the post-edited data, and both feature scores and feature weights are updated after each sentence is post-edited. An extended feature set allows the system to make fine-grained distinctions between background and post-editing data on a per-translation basis. We describe a simulated post-editing paradigm wherein existing reference translations stand in for human edits during system tuning, allowing our adaptive systems to be built and deployed without any seed post-editing data.
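
The loop below is a minimal, self-contained sketch of this online learning setup under simulated post-editing, where the reference translation stands in for the human edit. The word-for-word decoder, naive positional rule extraction, toy feature set, and perceptron-style weight update are illustrative stand-ins under stated assumptions, not the actual system's components.

    from collections import defaultdict

    def features(src, tgt, grammar):
        # Two toy features: fraction of source words covered by the grammar
        # and the length ratio between output and source.
        covered = sum(1 for w in src if w in grammar)
        return {"coverage": covered / max(len(src), 1),
                "len_ratio": len(tgt) / max(len(src), 1)}

    def translate(src, grammar, weights):
        # Toy word-for-word decoder: pick the most frequent translation for
        # each known source word and pass unknown words through. (It ignores
        # the weights; a real decoder would score hypotheses with them.)
        return [max(grammar[w], key=grammar[w].get) if w in grammar else w
                for w in src]

    def adapt_online(stream, grammar, weights, lr=0.1):
        # stream yields (source, simulated post-edit) sentence pairs.
        for src, post_edit in stream:
            hyp = translate(src, grammar, weights)
            # Learn new rules from the post-edit (naive positional alignment).
            for s, t in zip(src, post_edit):
                grammar[s][t] += 1
            # Perceptron-style update toward the post-edit's features.
            f_hyp = features(src, hyp, grammar)
            f_pe = features(src, post_edit, grammar)
            for k in f_pe:
                weights[k] = weights.get(k, 0.0) + lr * (f_pe[k] - f_hyp[k])
        return grammar, weights

    grammar = defaultdict(lambda: defaultdict(int))
    stream = [("la casa verde".split(), "the big green house".split())]
    grammar, weights = adapt_online(stream, grammar, {})

The key property this sketch preserves is that the model changes after every sentence: each post-edit immediately contributes new rules and moves the weight vector before the next sentence is decoded.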

We present a highly tunable automatic evaluation metric that scores hypothesis-reference pairs according to several statistics that are directly interpretable as measures of post-editing effort. Once an adaptive system is deployed and sufficient post-editing data is collected, our metric can be tuned to fit editing effort for a specific translation task. This version of the metric can then be plugged back into the translation system for further optimization. 
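
As an illustration of this kind of metric, the sketch below scores a hypothesis-reference pair as a tunable linear combination of word-level edit statistics. The particular statistics and example weights here are assumptions chosen for illustration, not the thesis metric's actual parameterization.

    from difflib import SequenceMatcher

    def edit_stats(hyp, ref):
        # Word-level matches / substitutions / insertions / deletions between
        # a hypothesis and a reference, from a standard diff.
        stats = {"match": 0, "sub": 0, "ins": 0, "del": 0}
        for op, h1, h2, r1, r2 in SequenceMatcher(None, hyp, ref).get_opcodes():
            if op == "equal":
                stats["match"] += h2 - h1
            elif op == "replace":
                stats["sub"] += max(h2 - h1, r2 - r1)
            elif op == "delete":    # words the post-editor must delete
                stats["del"] += h2 - h1
            elif op == "insert":    # words the post-editor must type in
                stats["ins"] += r2 - r1
        return stats

    def score(hyp, ref, weights):
        # Tunable linear combination; fitting the weights to observed editing
        # effort yields a task-specific version of the metric.
        s = edit_stats(hyp, ref)
        return sum(weights[k] * s[k] for k in weights) / (sum(s.values()) or 1)

    weights = {"match": 1.0, "sub": -0.5, "ins": -0.5, "del": -0.3}
    print(score("the house green".split(), "the green house".split(), weights))

Because each statistic counts a concrete editing operation, refitting the weights to collected post-editing data changes what the metric rewards, and the refit metric can then serve as the tuning objective for the translation system.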

To both evaluate the impact of our techniques and collect post-editing data for refining our systems, we present a web-based post-editing interface that connects human translators to our adaptive systems and automatically collects several types of highly accurate data while they work. In a series of simulated and live post-editing experiments, we show that while many of the presented techniques yield significant improvements on their own, the true potential of adaptive MT is realized when all of them are combined. Translation systems that update both the translation grammar and the weight vector after each sentence is post-edited yield super-additive gains over baseline systems across languages and domains, including low-resource scenarios. Optimizing systems toward custom, task-specific metrics further boosts performance. Compared to static baselines, our adaptive MT systems produce translations that require less mechanical effort to correct and are preferred by human translators. Every software component developed as part of this work is made publicly available under an open source license.
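
The snippet below sketches the kind of per-sentence record such an interface might log while a translator works; the field names and the JSON Lines format are assumptions, not the actual interface's schema. Records like these supply both the post-edits consumed by the online adaptation loop and the effort measurements used for metric tuning.

    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class PostEditRecord:
        segment_id: int
        source: str
        mt_output: str
        post_edit: str
        keystrokes: int   # mechanical effort
        seconds: float    # editing time

    def log_record(record, path="postedits.jsonl"):
        # One record per line, so the adaptive system and metric tuning can
        # consume the stream incrementally.
        with open(path, "a") as f:
            f.write(json.dumps(asdict(record)) + "\n")

    log_record(PostEditRecord(1, "la casa verde", "the house green",
                              "the green house", keystrokes=12, seconds=8.4))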

History

Date

2015-05-08

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Alon Lavie
