A Fusion Approach to XML Structured Document Retrieval

Ray R. Larson
Information Retrieval (Boston), 2005
Introduction

XML has emerged as a lingua franca of the WWW and is rapidly replacing other formats as the preferred form for information ranging from protocol exchange messages to full documents and databases. With this rapid growth, and the conversion of information resources to XML, comes an increasing need for effective search and retrieval of XML documents and their constituent elements. The XML retrieval problem (as formulated for the Initiative for the Evaluation of XML Retrieval, or INEX) (... Fuhr et al., 2002) is to retrieve not only complete documents, but also the component parts of those documents that may contain relevant information. Thus, an effective XML retrieval system must handle the retrieval and ranking of both full documents and components derived from the document structure.

In this research, and in the Cheshire II system used for the research, we define a document component, or simply component, as a contiguous segment of an XML document that represents some part of the XML document tree structure and is composed of one or more XML document elements (i.e., spans of data beginning with a start tag and ending with the corresponding end tag).

In the research reported here, we examine the application of data fusion methods to the XML retrieval problem. The basic notion of "data fusion" or "meta-search" approaches to IR is quite simple and intuitively appealing. Early observations by researchers examining different algorithms and query combination methods (Croft, 2000; Shaw and Fox, 1994; Belkin et al., 1995) indicated that no single retrieval algorithm could be shown to be consistently better than any other for all types of searches, and therefore that some combination of different search strategies should be more effective than any single strategy.
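This excerpt does not describe which fusion method Larson actually uses, so what follows is only a minimal Python sketch under stated assumptions. It links the two ideas above. It treats every XML element as a candidate component, which is a simplification of the Cheshire II definition. It scores components with two hypothetical toy rankers and then combines the two ranked runs with CombMNZ, one of the classic combination rules from Shaw and Fox (1994). All function names (`components`, `tf_score`, `coverage_score`, `min_max`, `comb_mnz`) and the sample document are illustrative. They are not taken from the paper or from Cheshire II.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def components(xml_text):
    """Return {path: lowercased text} for every element of the document.

    Simplifying assumption: each element (start tag through matching end tag,
    descendants included) is one candidate component, so the root element
    stands for the full document.
    """
    root = ET.fromstring(xml_text)
    result = {}

    def walk(elem, path):
        result[path] = " ".join(elem.itertext()).lower()
        seen = defaultdict(int)  # XPath-style positional index per child tag
        for child in elem:
            seen[child.tag] += 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return result


def tf_score(text, query):
    """Hypothetical ranker 1: raw count of query-term occurrences."""
    words = text.split()
    return float(sum(words.count(t) for t in query))


def coverage_score(text, query):
    """Hypothetical ranker 2: fraction of distinct query terms present."""
    words = set(text.split())
    return sum(t in words for t in query) / len(query)


def min_max(scores):
    """Rescale one run's {key: score} map to [0, 1] so runs are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: (1.0 if hi > 0 else 0.0) for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def comb_mnz(runs):
    """CombMNZ: sum of an item's normalized scores across runs, multiplied by
    the number of runs in which that normalized score is nonzero.
    Ties are broken by path so the ranking is deterministic."""
    fused = {}
    for key in set().union(*runs):
        scores = [run.get(key, 0.0) for run in runs]
        hits = sum(s > 0 for s in scores)
        fused[key] = sum(scores) * hits
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Usage on a toy document (hypothetical data, not from INEX).
doc = """<article>
  <title>XML retrieval</title>
  <sec><p>Fusion of ranked lists for XML components.</p></sec>
  <sec><p>Unrelated text about databases.</p></sec>
</article>"""
query = ["xml", "fusion"]

comps = components(doc)
runs = [min_max({p: f(t, query) for p, t in comps.items()})
        for f in (tf_score, coverage_score)]
for path, score in comb_mnz(runs)[:3]:
    print(f"{score:.2f}  {path}")
# 4.00  /article[1]
# 3.33  /article[1]/sec[1]
# 3.33  /article[1]/sec[1]/p[1]
```

On this toy input the whole article ranks first because it contains every query term. The matching section and its only paragraph tie just below it. Choosing among such nested, overlapping components is exactly the component-ranking problem described above. CombMNZ is used here only because it is a well-known fusion rule. The multiplier rewards components that several rankers agree on, which is the intuition behind combining search strategies.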
doi:10.1007/s10791-005-0749-0