Evaluating semantic indexing techniques through cross-language fingerprinting

Eduard Hoenkamp, Sander van Dijk
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
Users in search of on-line document sources are usually looking for content, not words. Hence, IR researchers generally agree that search techniques should be geared toward the meaning underlying documents rather than toward the text itself. The most visible examples of such techniques are Latent Semantic Analysis (LSA), and the Hyperspace Analog to Language (HAL). If these techniques really uncover semantic dependencies, then they should be applicable across languages. We investigated this
more » ... g electronic versions of three kinds of translated material: a novel, a popular treatise about cosmology, and a data base of technical specifications. We used the analogy of fingerprinting used in forensics to establish if individuals are related. Genetic fingerprinting uses enzymes to split the DNA and then compare the resulting band patterns. Likewise, in our research we use queries to split a document into fragments. If a search technique really isolates fragments related to the query, then a document and its translation should have similar band patterns. In this paper we (1) present the fingerprinting technique, (2) introduce the material used, and (3) report preliminary results of an evaluation for two semantic indexing techniques.
doi:10.1145/1076034.1076152 dblp:conf/sigir/HoenkampD05 fatcat:rqbtqn7kcrfnpl5frxa27r7kmm