Scalable Multilingual Information Access [chapter]

Paul McNamee, James Mayfield
2003 Lecture Notes in Computer Science  
The third Cross-Language Evaluation Forum workshop (CLEF-2002) provides the unprecedented opportunity to evaluate retrieval in eight different languages using a uniform set of topics and assessment methodology. This year the Johns Hopkins University Applied Physics Laboratory participated in the monolingual, bilingual, and multilingual retrieval tasks. We contend that information access in a plethora of languages requires approaches that are inexpensive in developer and run-time costs. In this paper we describe a simplified approach that seems suitable for retrieval in many languages; we also show how good retrieval is possible over many languages, even when translation resources are scarce, or when query-time translation is infeasible. In particular, we investigate the use of character n-grams for monolingual retrieval, pre-translation expansion as a technique to mitigate errors due to limited translation resources, and translation of document representations to an interlingua for computationally efficient retrieval against multiple languages.
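As an illustration of the character n-gram idea named in the abstract, the following is a minimal sketch of how text might be split into overlapping character n-grams for indexing. The function name `char_ngrams`, the default length `n=4`, the lowercasing, and the space padding at word boundaries are all assumptions made for this example; the abstract does not state the n-gram length or normalization the authors used.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of `text`.

    Hypothetical helper for illustration only. Lowercasing, whitespace
    collapsing, and padding with one space on each side are assumptions,
    not details taken from the paper.
    """
    # Collapse runs of whitespace and lowercase the text.
    normalized = " ".join(text.lower().split())
    if not normalized:
        return []
    # Pad so that n-grams can mark the start and end of the text.
    padded = f" {normalized} "
    if len(padded) < n:
        # Text shorter than one n-gram: keep it as a single term.
        return [padded]
    # Slide a window of width n across the padded string.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    for gram in char_ngrams("Die Katze"):
        print(repr(gram))
```

Because the n-grams span word boundaries and need no stemmer or stopword list, the same routine can be applied unchanged to each language, which is one way to keep developer cost low in the sense the abstract describes.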
doi:10.1007/978-3-540-45237-9_17 fatcat:6wihlivr3jf4zkcx4u3puphjey