Single n-gram stemming

James Mayfield, Paul McNamee
2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval - SIGIR '03  
Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language independent way, but its use incurs a performance penalty. We demonstrate that selection of a single n-gram as a pseudo-stem for a word can be an effective and efficient language-neutral approach for some languages.
doi:10.1145/860500.860528 fatcat:nyqybzoftrevpmadj3s23hd53a