A Poor Man's Approach to CLEF [chapter]

Arjen P. de Vries
2001 Lecture Notes in Computer Science  
The Mirror DBMS [dV99] aims specifically at supporting both data management and content management in a single system. Its design separates the retrieval model from the specific techniques used for implementation, which allows more flexibility to experiment with a variety of retrieval models. Because the design is based on database techniques, it is intended to support this flexibility without a major penalty in the efficiency and scalability of the system. The support for information retrieval in our system is presented in detail in [dVH99], [dV98], and [dVW99].

The primary goal of our participation in CLEF is to acquire experience with supporting Dutch users. We also want to investigate whether we can obtain reasonable performance without requiring expensive (but high-quality) resources. We do not expect to obtain impressive results with our system, but hope to obtain a baseline from which we can develop our system further. We decided to submit runs for all four target languages, but our main interest is in the bilingual Dutch-to-English runs.

Pre-processing

We have used only 'off-the-shelf' tools for stopping, stemming, compound-splitting (only for Dutch) and translation. All our tools are available for free, without usage restrictions for research purposes.

Stopping and stemming

Moderately sized stoplists, of comparable coverage, were made available by University of Twente (see also Table 1). We used the stemmers provided by Muscat, an open source search engine. The Muscat software includes stemmers for all five languages, as well as Spanish and Portuguese. The stemming algorithms are based on the Porter stemmer.
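The stopping and stemming step can be illustrated with a minimal sketch. It does not use the Muscat stemmers or the University of Twente stoplists: the stoplist, the suffix rules, and the function names `tokenize`, `toy_stem`, and `preprocess` are all hypothetical, and the suffix stripper is a deliberately crude stand-in for a Porter-style stemmer.

```python
import re

# Hypothetical mini stoplist; the stoplists used in the paper are not reproduced here.
STOPLIST = {"the", "a", "of", "and", "in", "is", "for"}

# Toy suffix rules, tried in this order (longest first).
# This is NOT the Porter algorithm, only an illustration of suffix stripping.
SUFFIXES = ("ations", "ation", "ings", "ing", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply stopping first, then stemming, to every remaining token."""
    return [toy_stem(t) for t in tokenize(text) if t not in STOPLIST]


if __name__ == "__main__":
    print(preprocess("The stemming of queries and documents"))
```

In a retrieval setting the same function would be applied to both documents and queries, so that index terms and query terms are conflated in the same way.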
doi:10.1007/3-540-44645-1_14 fatcat:d46zs7qc25dadknlrsrfyjqxca