Application of the Tightness Continuum Measure to Chinese Information Retrieval

Ying Xu, Randy Goebel, Christoph Ringlstetter, Grzegorz Kondrak
2010 Workshop on Multiword Expressions  
Most word segmentation methods employed in Chinese Information Retrieval (IR) systems are based on a static dictionary or on a model trained on a manually segmented corpus. These general segmentation approaches may not be optimal because they disregard information within semantic units. We propose a novel method for improving word-based Chinese IR, which performs segmentation according to the tightness of phrases. To evaluate the effectiveness of our method, we employ a new test ... of 203 queries, which include a broad distribution of phrases with different tightness values. The results of our experiments indicate that our method improves IR performance compared with a general word segmentation approach. The experiments also demonstrate the need for better evaluation corpora.
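The abstract states that segmentation follows the tightness of phrases but does not define the tightness measure. The sketch below is a minimal illustration under assumptions of our own. It uses pointwise mutual information (PMI) of a two-word phrase as a hypothetical stand-in for tightness. It is not the paper's tightness continuum measure. The function names `pmi_tightness` and `segment_for_index`, the threshold of 3.0, and the corpus counts are all hypothetical toy values. The sketch shows only the decision the abstract describes: index a tight phrase as a single unit and split a loose phrase into its component words.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pmi_tightness(pair: Pair, unigram: Dict[str, int],
                  bigram: Dict[Pair, int], total: int) -> float:
    """Hypothetical stand-in for tightness: PMI of a two-word phrase.

    This is NOT the paper's tightness measure; it only illustrates
    scoring how strongly two components cohere in a corpus.
    """
    joint = bigram.get(pair, 0)
    left = unigram.get(pair[0], 0)
    right = unigram.get(pair[1], 0)
    if joint == 0 or left == 0 or right == 0:
        # Unseen pair or unknown word: treat the phrase as maximally loose.
        return float("-inf")
    p_joint = joint / total
    p_left = left / total
    p_right = right / total
    return math.log2(p_joint / (p_left * p_right))


def segment_for_index(pair: Pair, unigram: Dict[str, int],
                      bigram: Dict[Pair, int], total: int,
                      threshold: float = 3.0) -> List[str]:
    """Keep a tight phrase as one index term; split a loose one."""
    if pmi_tightness(pair, unigram, bigram, total) >= threshold:
        return ["".join(pair)]
    return list(pair)


# Invented toy counts, for illustration only (not data from the paper).
unigram = {"信息": 50, "检索": 20, "中文": 40}  # information, retrieval, Chinese
bigram = {("信息", "检索"): 18, ("中文", "信息"): 2}
total = 1000

print(segment_for_index(("信息", "检索"), unigram, bigram, total))  # ['信息检索']
print(segment_for_index(("中文", "信息"), unigram, bigram, total))  # ['中文', '信息']
```

With these toy counts, "信息检索" (information retrieval) scores a PMI of about 4.2 and is kept whole. "中文信息" (Chinese information) scores about 0 and is split. A graded score with a threshold reflects the idea of tightness as a continuum rather than a fixed dictionary decision. The actual measure and cut-off used in the paper may differ.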