This paper describes and evaluates a modification
to the segmentation model used in the unsupervised
morphology induction system ParaMor.
Our improved segmentation model
permits multiple morpheme boundaries in a
single word. To prepare ParaMor to apply the new
agglutinative segmentation model effectively, we add
two heuristics that improve ParaMor’s precision.
These precision-enhancing heuristics
are adaptations of those used in other unsupervised
morphology induction systems, including
work by Hafer and Weiss (1974) and Goldsmith
(2006). By reformulating the segmentation
model used in ParaMor, we significantly
improve ParaMor’s performance in all language
tracks and in both the linguistic evaluation
and the task-based information retrieval
(IR) evaluation of the peer-operated
competition Morpho Challenge 2007. ParaMor’s
improved morpheme recall in the linguistic
evaluations of German, Finnish, and
Turkish is higher than that of any system that
competed in the Challenge. In the three languages
of the IR evaluation, our enhanced ParaMor
significantly outperforms a morphologically
naïve baseline in average precision over
newswire queries, scoring just behind the
leading system from Morpho Challenge 2007
in English and ahead of the first-place system
in German.