Integrating Morphology with Multi-word Expression Processing in Turkish

Oflazer, Kemal; Cetinoglu, Ozlem; Say, Bilge

doi:10.1184/R1/6377405.v1

journal contribution

posted on 2004-07-01, 00:00 authored by Kemal Oflazer, Ozlem Cetinoglu, Bilge Say

This paper describes a multi-word expression processor for preprocessing Turkish text for various language engineering applications. In addition to the fairly standard set of lexicalized collocations and multi-word expressions such as named entities, Turkish uses quite a wide range of semi-lexicalized and non-lexicalized collocations. After an overview of relevant aspects of Turkish, we describe the multi-word expressions we handle. We then summarize the computational setting in which we employ a series of components for tokenization, morphological analysis, and multi-word expression extraction. Finally, we present results from runs over a large corpus and a small gold-standard corpus.