Statistical Morphological Disambiguation for Agglutinative Languages

Hakkani-Tur, Dilek Zeynep; Oflazer, Kemal; Tur, Gokhan

doi:10.1184/R1/6368963.v1

journal contribution

posted on 2000-08-01, 00:00 authored by Dilek Zeynep Hakkani-Tur, Kemal Oflazer, Gokhan Tur

In this paper, we present statistical models for morphological disambiguation in Turkish. Turkish presents an interesting problem for statistical models because its productive derivational morphology makes the potential tag set very large. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflectional groups in a trigram model. Among the three models that we have developed and tested, the simplest model, which ignores the local morphotactics within words, performs the best. Our best trigram model achieves 93.95% accuracy on our test data when all the morphosyntactic and semantic features must be correct. If we are interested only in syntactically relevant features and ignore a very small set of semantic features, the accuracy increases to 95.07%.
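The idea of breaking a morphosyntactic tag into inflectional groups can be illustrated with a minimal sketch. The abstract does not give the tag notation, so everything concrete here is an assumption made for illustration: the `^DB` derivation-boundary marker, the `+`-separated features, the example parse string, and the helper name `split_inflectional_groups`. This is not the authors' implementation, and it does not attempt the trigram scoring itself.

```python
from typing import List, Tuple


def split_inflectional_groups(parse: str, boundary: str = "^DB") -> Tuple[str, List[str]]:
    """Split one morphological parse into its root and inflectional groups.

    Assumed (hypothetical) format: root+Feat+Feat^DB+Feat+Feat...
    where `boundary` marks each derivation boundary.
    """
    first, *rest = parse.split(boundary)
    # The first segment carries the root followed by the first group's features.
    root, _, first_features = first.partition("+")
    # Later segments start with a leading '+', which is dropped.
    groups = [first_features] + [segment.lstrip("+") for segment in rest]
    return root, groups


# Illustrative parse string (an assumption, not taken from the paper's data).
root, groups = split_inflectional_groups("ev+Noun+A3sg+Pnon+Loc^DB+Adj+Rel")
print(root)    # ev
print(groups)  # ['Noun+A3sg+Pnon+Loc', 'Adj+Rel']
```

Treating each group as its own unit means statistics are collected over a much smaller inventory of symbols than the set of full tags, which is the motivation stated in the abstract.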

Publisher Statement

Published in Proceedings of the 18th International Conference on Computational Linguistics, August 2000, Saarbrücken, Germany

Date

2000-08-01

Keywords

Morphological Disambiguation, Turkish

Licence

CC BY-NC-SA 4.0
