Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval [chapter]

Jaap Kamps, Christof Monz, Maarten de Rijke, Börkur Sigurbjörnsson
2004 Lecture Notes in Computer Science  
We investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding, and contrast them with language-independent approaches, such as character n-gramming. To reap the benefits of more than one type of approach, we also consider the effectiveness of combining both types. We focus on document retrieval in nine European languages: Dutch, English, Finnish, French, German, Italian, Russian, Spanish, and ... We look at four cross-lingual information retrieval tasks: monolingual, bilingual, multilingual, and domain-specific retrieval. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).

System Description

Retrieval Approach. All retrieval runs used FlexIR, an information retrieval system developed at the University of Amsterdam [5]. The main goal underlying FlexIR's design is to facilitate flexible experimentation with a wide variety of retrieval components and techniques. FlexIR is implemented in Perl and supports many types of preprocessing, scoring, indexing, and retrieval tools.

Retrieval Model. FlexIR supports several retrieval models, including the standard vector space model, language models, and probabilistic models. All runs ...
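To make the language-independent approach concrete, the following is a minimal sketch of character n-gramming. It is not FlexIR's tokenizer (FlexIR is written in Perl, and its n-gram settings are not given here). It assumes a hypothetical n of 4, that n-grams do not cross word boundaries, and that words shorter than n are kept whole. The usage lines show why no language-specific decompounder is required: every 4-gram of the German word "Warnung" also appears among the 4-grams of the compound "Hochwasserwarnung", so the two can match.

```python
import re
from typing import List


def char_ngrams(text: str, n: int = 4) -> List[str]:
    """Split text into overlapping character n-grams, word by word.

    Assumptions (illustrative, not taken from the paper): n defaults to 4,
    n-grams never span two words, and words of length <= n are kept whole.
    """
    tokens: List[str] = []
    # \w+ matches Unicode word characters, so accented letters stay in words.
    for word in re.findall(r"\w+", text.lower()):
        if len(word) <= n:
            tokens.append(word)
        else:
            # Slide a window of width n across the word.
            tokens.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return tokens


if __name__ == "__main__":
    print(char_ngrams("Warnung"))  # ['warn', 'arnu', 'rnun', 'nung']
    # All 4-grams of the simple word occur inside the compound.
    print(set(char_ngrams("Warnung")) <= set(char_ngrams("Hochwasserwarnung")))  # True
```

Keeping n-grams inside word boundaries is one common choice. Letting n-grams span the space between words is another, and it yields a larger index.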
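The abstract mentions combining language-dependent and language-independent runs but does not say how the combination is done. The sketch below is therefore an assumption. It uses weighted CombSUM: each run's scores are min-max normalized to [0, 1], then summed per document with a per-run weight. The run contents and the 0.5/0.5 weights are hypothetical toy values, not CLEF results.

```python
from typing import Dict, List, Tuple

Run = Dict[str, float]  # document id -> retrieval score


def min_max_normalize(run: Run) -> Run:
    """Rescale one run's scores to [0, 1] so runs become comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores are tied: give every document the same top score.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def combine_runs(runs: List[Tuple[Run, float]]) -> List[Tuple[str, float]]:
    """Weighted CombSUM (assumed method): sum weighted normalized scores.

    A document missing from a run contributes 0 for that run.
    Returns (doc, score) pairs, best first; ties are broken by doc id.
    """
    fused: Dict[str, float] = {}
    for run, weight in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    stemmed_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}  # hypothetical stemming run
    ngram_run = {"d2": 0.8, "d3": 0.6, "d4": 0.2}     # hypothetical 4-gram run
    ranking = combine_runs([(stemmed_run, 0.5), (ngram_run, 0.5)])
    print([doc for doc, _ in ranking])  # ['d2', 'd1', 'd3', 'd4']
```

Normalization matters here because the two runs score on very different scales. Without it, the stemming run's larger raw scores would dominate the sum.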
doi:10.1007/978-3-540-30222-3_14 fatcat:cte52lxclndylkv7hk7g2mtvai