P. Bollmann, F. Jochum, U. Reiner, V. Weissmann, H. Zuse
1985 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '85  
Extended Abstract
The LIVE-project (Leistungsbewertung von Information Retrieval Verfahren) at the Technische Universitaet Berlin, West Germany, is concerned with the evaluation of information retrieval systems. Two fields are mainly under investigation. One is the methodological foundations of retrieval experiments. Many authors /1/ point out that a number of problems in this area remain to be solved; a summary of these problems can be found in /2/. Results of the LIVE-project in this area concern three topics. Measurement-theoretical criteria for the application of similarity and evaluation measures in retrieval experiments have been considered and developed; some of the results can be found in /3/, /4/, /5/. Further work has been done on the application of statistical principles of experimental design in information retrieval. In particular, control structures of factors in retrieval tests and some aspects of statistical models for experimentation in information retrieval have been investigated. Some of these results can be found in /6/, /7/, /8/, /9/.
The other topic of the LIVE-project, which is the main content of this paper, is the conduct of a retrieval experiment in co-operation with FIZ4 (FachInformationsZentrum 4), an information service center for mathematics, physics and energy in Karlsruhe, West Germany. Of the many databases offered by FIZ4, the LIVE-project uses the physics database in its retrieval experiment. FIZ4 uses the information retrieval system GRIPS (General Relation based Information Processing System), which was developed by DIMDI (Deutsches Institut für Medizinische Dokumentation und Information). The query language of GRIPS is an extended Boolean language /10/. Besides the operators 'and', 'or' and 'not', the GRIPS retrieval language contains thesaurus operators to extend the query, as well as truncation and context operators for free-text and Boolean searching. For a given query, GRIPS partitions the document collection into two sets: the retrieved and the non-retrieved documents. The retrieved documents are presented to the user in the reverse order of their registration in the database. Assuming that this temporal order is not correlated with relevance, the set of retrieved documents is treated as an unordered set.
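A minimal sketch of the Boolean partitioning step, in Python. It assumes, purely for illustration, that each document is a set of descriptors, that a query is a nested tuple such as `("and", "plasma", ("not", "laser"))`, and that document ids increase with registration time; the names `Document`, `matches` and `partition` are hypothetical. It covers only plain 'and', 'or' and a unary 'not', not the thesaurus, truncation or context operators of the actual GRIPS language.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union

# Hypothetical query form: a descriptor string, or a tuple
# ("and", q1, q2, ...), ("or", q1, q2, ...), ("not", q).
Query = Union[str, tuple]


@dataclass(frozen=True)
class Document:
    doc_id: int                  # assumed to increase with registration time
    descriptors: FrozenSet[str]


def matches(query: Query, descriptors: FrozenSet[str]) -> bool:
    """Evaluate a plain Boolean query against one document's descriptors."""
    if isinstance(query, str):
        return query in descriptors
    op, *args = query
    if op == "and":
        return all(matches(a, descriptors) for a in args)
    if op == "or":
        return any(matches(a, descriptors) for a in args)
    if op == "not":
        (arg,) = args
        return not matches(arg, descriptors)
    raise ValueError(f"unknown operator: {op!r}")


def partition(query: Query, collection: List[Document]) -> Tuple[List[Document], List[Document]]:
    """Split the collection into retrieved and non-retrieved documents."""
    retrieved = [d for d in collection if matches(query, d.descriptors)]
    not_retrieved = [d for d in collection if not matches(query, d.descriptors)]
    # Present the retrieved set in reverse order of registration (newest first).
    retrieved.sort(key=lambda d: d.doc_id, reverse=True)
    return retrieved, not_retrieved
```

Because the presentation order depends only on registration time, an evaluation that treats `retrieved` as an unordered set loses nothing under the independence assumption stated above.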
In the LIVE-project we did not use the usual evaluation measures, such as the recall-precision graph, because their values are difficult to interpret. Instead, evaluation viewpoints were defined in co-operation with FIZ4. An example of an evaluation viewpoint is: 'A user wants exactly five relevant documents and wants to get as few nonrelevant documents as possible. A retrieval result R1 is better than another one, R2, if the user gets fewer nonrelevant documents with R1 than with R2.' For this evaluation viewpoint, the 'expected search length' of Cooper /11/ is an appropriate evaluation measure. Since the viewpoint defined above only specifies an order of preference among retrieval results, the measure may only be used as an ordinal scale. To use it as an interval scale, we assume that every additional nonrelevant document the user retrieves causes the same additional amount of unproductive labour. Several other viewpoints and evaluation measures were defined in a similar way and applied in the retrieval experiment. Under the assumption that an evaluation measure is an interval scale, averaging is done by calculating the arithmetic mean. The levels of the experimental factor /8/ were the following similarity measures: the inner product measure, the cosine measure, the overlap measure, the Jaccard coefficient and the Euclidean distance. The number of documents retrieved by GRIPS, the number of descriptors in the queries, and the generality and topic of the documents were used as situative factors /8/. The retrieval experiment is not yet completely finished, but several results have already been obtained. For example, averaged over 81 queries and for the viewpoint defined above, ranking with the inner product measure does not indicate a significant improvement over the GRIPS output. With the Euclidean distance measure, it seems that on average the user has to inspect fewer nonrelevant documents.
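Cooper's expected search length has a closed form for a weak ordering, that is, an output split into levels that are examined one after another, with the documents inside a level examined in random order. If j nonrelevant documents lie in the levels examined completely, and the final level holds r relevant and i nonrelevant documents of which s more relevant ones are still needed, the expected number of nonrelevant documents examined is j + i·s/(r + 1). The unordered GRIPS output is the special case of a single level. A minimal Python sketch follows; the function name and the (relevant, nonrelevant) encoding of levels are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from typing import Sequence, Tuple

# Illustrative encoding: one level of a weak ordering as (relevant, nonrelevant).
Level = Tuple[int, int]


def expected_search_length(levels: Sequence[Level], wanted: int) -> Fraction:
    """Expected number of nonrelevant documents examined before `wanted`
    relevant ones are found, assuming random order within each level."""
    nonrel_before = 0        # j: nonrelevant documents in fully examined levels
    still_needed = wanted    # relevant documents not yet found
    for relevant, nonrelevant in levels:
        if relevant >= still_needed:
            # Final level: j + i * s / (r + 1), computed exactly.
            return nonrel_before + Fraction(nonrelevant * still_needed, relevant + 1)
        still_needed -= relevant
        nonrel_before += nonrelevant
    raise ValueError("output contains fewer relevant documents than wanted")
```

For a hypothetical unordered set of 20 retrieved documents, 6 of them relevant, the five-document viewpoint gives `expected_search_length([(6, 14)], 5)` = 14 · 5 / 7 = 10. Per-query values can then be averaged with `statistics.mean`, which is only meaningful under the interval-scale assumption described above.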
This means an improvement over the unordered retrieved set of the GRIPS output. For more details of the so-called 'two-level retrieval process' and further experimental results, we refer to the long version of this paper.
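The second level of the 'two-level retrieval process', re-ranking the set retrieved by GRIPS with one of the five similarity measures, might look roughly like the Python sketch below. It assumes binary (unweighted), non-empty descriptor sets for queries and documents, which the abstract does not specify, and all names are hypothetical. To keep ties exact, each measure is reduced to a rational rank key in which larger means more similar: the cosine is compared via its square and the Euclidean distance via its negated squared value, both of which preserve the ranking. Documents with equal keys form one level of a weak ordering.

```python
from fractions import Fraction
from itertools import groupby
from typing import Callable, Dict, FrozenSet, List

# Assumption: queries and documents are non-empty sets of descriptors
# (binary indexing); the abstract does not say how they were represented.
Descs = FrozenSet[str]


def inner_product(q: Descs, d: Descs) -> Fraction:
    return Fraction(len(q & d))


def cosine_squared(q: Descs, d: Descs) -> Fraction:
    # cosine = |q & d| / sqrt(|q| * |d|); squaring keeps the order and is exact.
    return Fraction(len(q & d) ** 2, len(q) * len(d))


def overlap(q: Descs, d: Descs) -> Fraction:
    return Fraction(len(q & d), min(len(q), len(d)))


def jaccard(q: Descs, d: Descs) -> Fraction:
    return Fraction(len(q & d), len(q | d))


def neg_squared_euclidean(q: Descs, d: Descs) -> Fraction:
    # For 0/1 vectors the Euclidean distance is sqrt(|q ^ d|); a smaller
    # distance means more similar, so negate the squared distance.
    return Fraction(-len(q ^ d))


MEASURES: Dict[str, Callable[[Descs, Descs], Fraction]] = {
    "inner_product": inner_product,
    "cosine": cosine_squared,
    "overlap": overlap,
    "jaccard": jaccard,
    "euclidean": neg_squared_euclidean,
}


def rank_levels(query: Descs, retrieved: Dict[int, Descs], measure: str) -> List[List[int]]:
    """Re-rank a Boolean result set into levels of tied documents, best first."""
    key = MEASURES[measure]

    def score(item):
        return key(query, item[1])

    ordered = sorted(retrieved.items(), key=score, reverse=True)
    return [sorted(doc_id for doc_id, _ in group) for _, group in groupby(ordered, key=score)]
```

Once each document carries a relevance judgment, counting relevant and nonrelevant documents per level yields the weak-ordering input that a measure such as Cooper's expected search length is defined on.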
