MorphoSaurus in ImageCLEF 2006: The Effect of Subwords On Biomedical IR [chapter]

Philipp Daumke, Jan Paetzold, Kornel Marko
2007 Lecture Notes in Computer Science  
Here we describe the subword approach we used in the 2006 ImageCLEF Medical Image Retrieval task. It is based on the assumption that neither fully inflected nor automatically stemmed words constitute the appropriate granularity for lexicalized content description. We therefore introduce subwords as morphologically meaningful word units. Subwords are organized in language-specific lexica that were generated partly manually and partly automatically and currently cover six European languages. They are linked together via a multilingual thesaurus. Using subwords instead of full words significantly reduces the number of lexical entries needed to sufficiently cover a specific language and domain. A further benefit of the approach is its independence from the underlying retrieval system, which makes it usable with any search engine. In this year's test runs we combined MorphoSaurus with the open-source search engine Lucene and achieved precision gains of up to 25% over the baseline in a monolingual setting, along with promising results in a multilingual scenario.
doi:10.1007/978-3-540-74999-8_80 fatcat:dndb6uhz2revficb6xgeaacqju
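The abstract does not spell out how MorphoSaurus segments words, so the following is only a toy sketch of the general idea, under stated assumptions. It splits a word into entries of a subword lexicon (here by choosing the segmentation with the fewest pieces) and maps each piece to a language-independent class identifier, standing in for the multilingual thesaurus links. The four-entry lexicon, the `#nephr`/`#inflamm` identifiers, and the functions `segment` and `to_classes` are all hypothetical and are not MorphoSaurus's actual lexicon, thesaurus format, or algorithm.

```python
from functools import lru_cache

# Hypothetical toy lexicon: subword -> interlingual class identifier.
# None marks inflectional/linking segments that carry no content and are dropped.
LEXICON = {
    "nephr": "#nephr",      # English/Latin stem
    "niere": "#nephr",      # German stem, same class
    "itis": "#inflamm",
    "entzünd": "#inflamm",
    "ung": None,            # German nominal suffix (ignored)
    "n": None,              # German linking element (ignored)
}


def segment(word, lexicon=LEXICON):
    """Split word into lexicon subwords using the fewest pieces; None if impossible."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Best segmentation of word[i:] as a tuple of subwords, or None if none exists.
        if i == len(word):
            return ()
        candidates = []
        for j in range(i + 1, len(word) + 1):
            piece = word[i:j]
            if piece in lexicon:
                rest = best(j)
                if rest is not None:
                    candidates.append((piece,) + rest)
        return min(candidates, key=len) if candidates else None

    return best(0)


def to_classes(word, lexicon=LEXICON):
    """Map a word to the class identifiers of its content-bearing subwords."""
    parts = segment(word, lexicon)
    if parts is None:
        return None
    return [lexicon[p] for p in parts if lexicon[p] is not None]


if __name__ == "__main__":
    print(segment("nephritis"))            # ('nephr', 'itis')
    print(to_classes("nephritis"))         # ['#nephr', '#inflamm']
    print(to_classes("Nierenentzündung"))  # ['#nephr', '#inflamm']
```

In this sketch the English term "nephritis" and the German "Nierenentzündung" reduce to the same identifiers, so an index built over identifiers rather than surface words could match a query in one language against documents in the other. Because the identifiers are plain tokens, they could in principle be fed to any search engine, which mirrors the system independence claimed above.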