Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System at NTCIR-6

Masaki Murata, Jong-Hoon Oh, Qing Ma, Hitoshi Isahara
2007 NTCIR Conference on Evaluation of Information Access Technologies  
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. It uses Robertson's 2-Poisson model and Rocchio's formula, both of which are known to be effective. Characteristics of newspapers such as locational information are used. We present our application of Fujita's method, where longer terms are used in retrieval by the system but de-emphasized relative to the emphasis on the shortest terms. This allows us to use ... h compound and single-word terms. The statistical test used in expanding queries through an automatic feedback process is described. The method gives us terms that have been statistically shown to be related to the top-ranked documents obtained in the first retrieval. We also use a numerical term, QIDF, which is an IDF term for queries. QIDF decreases the scores for stop words that occur in many queries. It can be very useful for foreign languages for which we cannot determine stop words. We also use web-based unknown word translation for bilingual information retrieval. We participated in two monolingual information retrieval tasks (Korean and Japanese) and five bilingual information retrieval tasks (Chinese-Japanese, English-Japanese, Japanese-Korean, Korean-Japanese, and English-Korean) at NTCIR-6. We obtained good results in all the tasks.
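The QIDF idea described above can be illustrated with a minimal sketch. The abstract does not give the formula, so the form below, log(N / qf) by analogy with ordinary IDF, is an assumption, and the function name `qidf_weights` and the sample queries are hypothetical.

```python
import math
from collections import Counter


def qidf_weights(queries):
    """Return an assumed QIDF-style weight per term.

    queries: list of tokenized queries (each a list of terms).
    Weight = log(N / qf), where N is the number of queries and qf is the
    number of queries containing the term. This form is an assumption
    modeled on standard IDF, not the formula from the paper.
    """
    n = len(queries)
    qf = Counter()
    for query in queries:
        qf.update(set(query))  # count each term at most once per query
    return {term: math.log(n / count) for term, count in qf.items()}


# Hypothetical query set: "find" and "about" behave like stop words here.
sample_queries = [
    ["find", "documents", "about", "earthquake"],
    ["find", "documents", "about", "election"],
    ["find", "articles", "about", "typhoon"],
]
weights = qidf_weights(sample_queries)
```

Under this assumed form, a term that appears in every query receives weight 0, while a term unique to one query receives the largest weight. No stop-word list is needed, which matches the abstract's point about languages for which stop words cannot be determined.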
dblp:conf/ntcir/MurataOMI07 fatcat:i3mx247wrnfwhkbbqa3vcfkkvy