Comparing representations in Chinese information retrieval
1997
Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '97)
Three representation methods are empirically investigated for Chinese information retrieval: 1-gram (single-character) indexing, bigram (two contiguous, overlapping characters) indexing, and short-word indexing based on a simple segmentation of the text. The retrieval collection is the TREC-5 Chinese corpus of news articles (approximately 170 MB), with 28 queries that are long and richly worded. Evaluation shows that 1-gram indexing is good but not sufficiently competitive, while bigram indexing works …
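The three representations named in the abstract can be illustrated with a minimal Python sketch. `unigrams` and `bigrams` follow the abstract's definitions: one term per character, and overlapping pairs of contiguous characters. The abstract does not spell out the paper's segmentation procedure, so `max_match` substitutes greedy forward maximum matching against a small lexicon. The function names, the 4-character cap on "short words", and the example lexicon are hypothetical choices made for illustration, not the author's method.

```python
from typing import List, Set


def unigrams(text: str) -> List[str]:
    """1-gram indexing: every non-space character becomes an index term."""
    return [ch for ch in text if not ch.isspace()]


def bigrams(text: str) -> List[str]:
    """Bigram indexing: overlapping pairs of contiguous characters.

    Assumes the input is a single run of Chinese text (no punctuation handling).
    """
    chars = unigrams(text)
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Hypothetical short-word segmentation via greedy forward maximum matching.

    At each position, take the longest lexicon entry (up to max_len characters)
    that starts there; if none matches, fall back to a single character.
    This is a stand-in technique, not the segmentation used in the paper.
    """
    terms: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                terms.append(candidate)
                i += length
                break
    return terms


if __name__ == "__main__":
    text = "中国人民银行"  # "People's Bank of China"
    lexicon = {"中国", "人民", "银行"}  # hypothetical toy lexicon
    print(unigrams(text))            # ['中', '国', '人', '民', '银', '行']
    print(bigrams(text))             # ['中国', '国人', '人民', '民银', '银行']
    print(max_match(text, lexicon))  # ['中国', '人民', '银行']
```

The bigram list includes terms such as 民银, which straddles the boundary between 人民 and 银行; bigram indexing accepts such noise in exchange for needing no dictionary, whereas segmentation-based indexing avoids it but depends on lexicon coverage.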
doi:10.1145/258525.258531
dblp:conf/sigir/Kwok97
fatcat:g44yl7qpz5hphjpwalrnthquxm