Posted on 2010-02-01, 00:00. Authored by Violetta Cavalli-Sforza, Jaime G. Carbonell, Peter J Jansen.
We describe ongoing efforts towards developing language resources for a transnational digital government project aimed at applying
information technology (IT) to a problem of international concern: detecting and monitoring activities related to the transnational
movement of illicit drugs. The project seeks to support information sharing, coordination and collaboration among government
agencies within a country and across national boundaries by combining a variety of technologies, including a distributed query
processor with form-based and conversational user interfaces, a language translation system, an event server for event filtering and
notification, and an event-trigger-rule server. The prototype system is being developed by U.S. universities in collaboration with an
international agency and with universities and government agencies in Belize and the Dominican Republic. This paper focuses on the
linguistic resources and their use in Example-Based Machine Translation (EBMT). We are in the process of developing an English-
Spanish parallel corpus, focused on the domain of information elicited and used at border crossings, to fuel the EBMT system. While
significant parallel corpora are available for these two languages in the newswire domain, they were found to be of very limited use for
the border crossings application, which prompted us to develop our own resources.