Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval at NTCIR-4

Masaki Murata, Qing Ma, Hitoshi Isahara
2004 NTCIR Conference on Evaluation of Information Access Technologies  
Our information retrieval system exploits many characteristics of the information and applies many sophisticated techniques. Robertson's 2-Poisson model and Rocchio's formula, both of which are known to be effective, have been applied in the system. Characteristics of newspapers, such as locational information, were also applied. We present our application of Fujita's method, where longer terms are used in retrieval by the system but de-emphasized relative to the emphasis on the ... est terms; this allows us to use both compound and single-word terms. The statistical test used in expanding queries through an automatic feedback process is described. The method gives us terms that have been statistically confirmed to be related to the top-ranked documents obtained in the first retrieval. We also used a numerical term, QIDF, which is an IDF term for queries. It lowers the scores of stop words that occur in many queries, which can be very useful for foreign languages whose stop words we cannot examine. We participated in three tasks (Korean, Japanese, and English) of monolingual information retrieval at NTCIR-4. We obtained relatively high precision in all the tasks in which we participated. In particular, we obtained the best precision in Korean description-based monolingual information retrieval.
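The QIDF idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes a form analogous to ordinary IDF, log(N / qf), where N is the number of queries and qf is the number of queries containing the term. The function name `qidf_weights` and the sample queries are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def qidf_weights(queries):
    """Return an assumed QIDF weight per term: log(N / qf).

    N is the number of queries and qf is the number of queries that
    contain the term. This formula is an assumption made by analogy
    with document IDF, not the definition used in the paper.
    """
    n_queries = len(queries)
    query_freq = Counter()
    for terms in queries:
        # Count each term once per query, as IDF counts once per document.
        query_freq.update(set(terms))
    return {term: math.log(n_queries / qf) for term, qf in query_freq.items()}


# Hypothetical tokenized queries; "documents" and "about" appear in all of them.
queries = [
    ["documents", "about", "earthquake", "damage"],
    ["documents", "about", "election", "results"],
    ["documents", "about", "stock", "prices"],
]
weights = qidf_weights(queries)
```

Under this assumed formula, a term that appears in every query gets weight 0, so query boilerplate is suppressed without a hand-built stop-word list, while a term that appears in only one query keeps the largest weight.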
dblp:conf/ntcir/MurataMI04 fatcat:h5nrsla2w5arrh3bn2f2j2klje