Posted on 2007-05-01, 00:00. Authored by Jaime G. Carbonell, Yiming Yang, Robert Frederking, Ralf D Brown, Yibing Geng, Danny Lee
Translingual information retrieval (TIR) consists of providing a query in one language and
searching document collections in one or more
different languages. This paper introduces new
TIR methods and reports on comparative TIR
experiments with these new methods and with
previously reported ones in a realistic setting.
The methods fall into two categories: query-translation-based approaches and statistical-IR approaches that establish translingual associations. The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms other methods. Translingual versions of the Generalized Vector Space
Model (GVSM) and Latent Semantic Indexing
(LSI) perform relatively well, as does translingual pseudo-relevance feedback (PRF). All three
showed a relatively small performance loss between their monolingual and translingual versions.
Query translation based on a general machine-readable bilingual dictionary, heretofore the
most popular method, did not match the performance of other, more sophisticated methods.
Also, the very high LSI results previously reported in the
literature were disconfirmed by more realistic
relevance-based evaluations.