Carnegie Mellon University

Automatic Arabic Translation of English Educational Content Online using Neural Machine Translation: the Case of Khan Academy

thesis
posted on 2021-10-01, 20:57 authored by Imane Bendou
Massive Open Online Courses (MOOCs) offer valuable, high-quality learning
opportunities and educational content in several disciplines to many students, largely
regardless of their background, location, and personal circumstances. However, language remains
a major barrier: because most available online content is in English, non-native English
speakers are kept from benefiting from these educational resources. Qatar has over 300
schools that teach all subjects in Arabic. To make online educational resources more
accessible to their students, we designed and implemented an automatic machine translation
solution based on deep learning techniques. It aims to produce high-quality Arabic translations of
English subtitles. We focused on the case of Khan Academy, which provides a
personalized learning experience centered mainly on videos. These videos have subtitles in
different languages that are generally produced by volunteers. Our system covers several subjects,
ranging from Physics and Mathematics to Programming and Arts and Humanities, with a focus on
high-school-level students. It was trained on a high-quality parallel corpus from the
education domain developed by the Qatar Computing Research Institute (QCRI). Furthermore, the
system underwent intrinsic evaluation, in which its output was compared to a high-quality reference
translation, as well as extrinsic evaluation in a pilot study, in which we aimed to test the quality of
the system’s output in schools to evaluate its contribution to student understanding.

History

Date

2021-05-04

Advisor(s)

Houda Bouamor

Department

  • Information Systems
