Retrieval Experiments at Morpho Challenge 2008

Paul McNamee
2008 Conference and Labs of the Evaluation Forum  
Morpho Challenge 2008 hosted an extrinsic evaluation of morphological analysis that explored whether unsupervised morphology induction could benefit information retrieval. This paper presents results for alternative methods of word normalization using test sets from the Cross-Language Evaluation Forum (CLEF) ad-hoc collections. Preliminary results for the Morpho Challenge 2008 evaluation are consistent with these data. We found that: (1) rule-based stemming is effective in less morphologically complicated languages; (2) alternative methods for stemming, such as unsupervised learning of morphemes and least common n-gram stemming, are helpful; and (3) full character n-gram indexing is the most effective form of tokenization in more morphologically complex languages.
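
The two n-gram techniques named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes n-grams of length 4 taken within single words, frequencies counted over a small word list rather than a document collection, and ties broken in favor of the leftmost n-gram. The function names (word_ngrams, ngram_counts, least_common_ngram) are hypothetical.

```python
from collections import Counter


def word_ngrams(word, n=4):
    """Return the overlapping character n-grams of a single word."""
    # Assumption: words no longer than n are kept whole so they still yield one token.
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_counts(words, n=4):
    """Count how often each n-gram occurs across a list of words."""
    counts = Counter()
    for w in words:
        counts.update(word_ngrams(w, n))
    return counts


def least_common_ngram(word, counts, n=4):
    """Pick the word's rarest n-gram to stand in for the word as a pseudo-stem."""
    # min() returns the first minimum, so ties go to the leftmost n-gram.
    return min(word_ngrams(word, n), key=lambda g: counts[g])
```

Full n-gram indexing would index every element returned by word_ngrams, whereas least common n-gram stemming keeps only the single n-gram chosen by least_common_ngram for each word.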
dblp:conf/clef/McNamee08b fatcat:u7rzgs2z7fd7pfdmh4x62mk55u