Single n-gram stemming

James Mayfield, Paul McNamee
2003. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '03)
Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language-independent way, but it incurs an efficiency penalty. We demonstrate that selecting a single n-gram as a pseudo-stem for a word can be an effective and efficient language-neutral approach for some languages.
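The abstract names two steps: splitting a word into character n-grams, and keeping just one of those n-grams as the word's pseudo-stem. The Python sketch below illustrates both under assumptions that are not taken from the paper: n = 4, words lowercased and padded with `_`, and a hypothetical selection rule that keeps the n-gram with the lowest document frequency, breaking ties by position in the word. The names `char_ngrams`, `ngram_document_frequency` and `pseudo_stem` are illustrative only.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

N = 4  # hypothetical n-gram length, chosen for illustration


def char_ngrams(word: str, n: int = N) -> list[str]:
    """Return the overlapping character n-grams of one word.

    Assumption: the word is lowercased and padded with '_' on both sides,
    so word-initial and word-final n-grams are marked. A padded word no
    longer than n is returned whole as a single token.
    """
    padded = f"_{word.lower()}_"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_document_frequency(docs: Iterable[str], n: int = N) -> Counter:
    """Count how many documents contain each n-gram at least once."""
    df: Counter = Counter()
    for doc in docs:
        grams: set[str] = set()
        for word in doc.split():
            grams.update(char_ngrams(word, n))
        df.update(grams)  # each n-gram counted once per document
    return df


def pseudo_stem(word: str, df: Counter, n: int = N) -> str:
    """Replace a word by a single one of its n-grams (hypothetical rule).

    Picks the n-gram with the lowest document frequency; min() keeps the
    first of several equal candidates, so ties go to the earliest n-gram.
    N-grams absent from df count as frequency 0.
    """
    return min(char_ngrams(word, n), key=lambda g: df[g])


# Toy corpus, invented for this example.
docs = [
    "the juggler juggles",
    "jugglers enjoy juggling",
    "a giggling ruler with rulers and bangles",
]
df = ngram_document_frequency(docs)

for w in ["juggler", "juggles", "jugglers", "juggling"]:
    print(w, "->", pseudo_stem(w, df))
```

In this toy corpus every n-gram of the four variants occurs in exactly two documents, so the tie-break selects `_jug` and all four words collapse to one index term. On a real collection a lowest-frequency rule only behaves like a stemmer if affix n-grams such as `ing_` are much more common than stem-internal ones; on a corpus this small a rare suffix n-gram could just as easily win.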
doi:10.1145/860435.860528 dblp:conf/sigir/MayfieldM03 fatcat:kimj6rjgwbajhl5gk4aovy54aq