NTCIR-3 Chinese, Cross Language Retrieval Experiments Using PIRCS

K. L. Kwok
2002 NTCIR Conference on Evaluation of Information Access Technologies  
We participated in the monolingual Chinese, English-Chinese cross-language and multilingual retrieval tasks using our PIRCS retrieval system. For monolingual retrieval, documents were represented with bigram indexing and with short-word indexing, both supplemented with single characters. The two resulting retrieval lists were obtained separately and, for some submissions, combined into the final result. For cross-lingual and multilingual retrieval, only short-word indexing was used. We performed retrieval with two types of queries: queries from ... l sections of a topic, and queries from the description section only. The best monolingual mean average precision under relaxed assessment is about 0.41 for long queries and about 0.36 for short, description-only queries. These values are much lower than those for NTCIR-2 and may indicate that the NTCIR-3 environment is more difficult. For cross-lingual retrieval, we employed the query translation approach and concatenated the outputs of MT software and dictionary translation into one Chinese query. Results were also much inferior to those observed in NTCIR-2, reaching only about 56% of monolingual effectiveness for long queries and 44% for short queries under relaxed judgment. Post-judgment experiments show that monolingual retrieval with short-word indexing can be improved by employing a segmentation dictionary derived from the corpus itself. For cross-lingual retrieval, bigram indexing should also have been used in combination with short-word indexing; doing so raises cross-language effectiveness to 69% (long queries) and 52% (short queries) of monolingual.
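The two Chinese representations described above can be illustrated with a minimal sketch. It assumes that "with single characters" means every character of the text is also kept as an index term, and it uses greedy forward maximal matching against a word list as a stand-in segmenter; PIRCS's actual segmentation procedure may differ. The functions `bigram_terms` and `short_word_terms` and the toy `lexicon` are hypothetical; in the post-judgment experiments the lexicon would be a segmentation dictionary derived from the corpus itself.

```python
def bigram_terms(text: str) -> list[str]:
    """Bigram indexing (assumed form): every overlapping pair of adjacent
    characters, plus every single character."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return bigrams + list(text)


def segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximal matching (a hypothetical stand-in segmenter):
    at each position take the longest lexicon word, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in lexicon:
                tokens.append(cand)
                i += n
                break
    return tokens


def short_word_terms(text: str, lexicon: set[str]) -> list[str]:
    """Short-word indexing (assumed form): multi-character words found by
    segmentation, plus every single character of the text."""
    words = [t for t in segment(text, lexicon) if len(t) > 1]
    return words + list(text)


# Toy lexicon for illustration only; not from the paper.
lexicon = {"检索", "系统"}
print(bigram_terms("检索系统"))
print(short_word_terms("信息检索系统", lexicon))
```

Characters not covered by any dictionary word (here 信 and 息) still reach the index as single characters, which is one reason a better corpus-specific dictionary can help: it turns more of those runs into meaningful multi-character terms.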
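The abstract does not specify how the bigram and short-word retrieval lists were combined. The following sketch assumes one common approach: rescale each list's scores to [0, 1] and take a weighted sum, with a document absent from a list contributing 0 from that list. The function names, the `weight=0.5` default, and the sample runs are hypothetical illustrations, not values from the paper.

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one run's retrieval scores to [0, 1] so runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 1.0 for d, s in scores.items()}


def combine(bigram_run: dict[str, float], word_run: dict[str, float],
            weight: float = 0.5, k: int = 1000) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores (an assumed fusion rule).
    Returns the top-k (doc_id, score) pairs, best first."""
    a, b = minmax(bigram_run), minmax(word_run)
    fused = {d: weight * a.get(d, 0.0) + (1 - weight) * b.get(d, 0.0)
             for d in a.keys() | b.keys()}
    # Sort by descending score, breaking ties by document id.
    return sorted(fused.items(), key=lambda x: (-x[1], x[0]))[:k]


# Hypothetical scores for one query from the two indexing methods.
bigram_run = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
word_run = {"d2": 8.0, "d4": 4.0, "d1": 0.0}
print(combine(bigram_run, word_run))
```

A document ranked moderately high by both representations (d2 here) can overtake one ranked first by only a single representation, which is the usual motivation for combining the lists rather than choosing one.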
dblp:conf/ntcir/Kwok02 fatcat:dtoaohfoffdt5fn3ebnbaovtki