Is CORI Effective for Collection Selection? An Exploration of Parameters, Queries, and Data

Daryl J. D'Souza, Justin Zobel, James A. Thom
2004 Australasian Document Computing Symposium  
We have explored the behaviour of CORI for a range of data sets and parameter values.  ...  Coupled with the observation that even CORI with optimal parameters is usually less effective than other methods, we conclude that the use of CORI as a benchmark collection selection method is inappropriate  ...  In an early CORI paper, values for these parameters were chosen by exploration on a particular collection (Callan et al. 1995).  ... 
dblp:conf/adcs/DSouzaZT04 fatcat:3c3g542fi5hozehmosem4suvmq

Improving Shard Selection for Selective Search [chapter]

Mon Shih Chuang, Anagha Kulkarni
2017 Lecture Notes in Computer Science  
We thus investigate three new approaches for the shard ranking problem, and three techniques to estimate how many of the top shards should be searched for a query (shard rank cutoff estimation).  ...  The ability to identify the relevant shards for the query, directly impacts Selective Search performance.  ...  For both, PK2 and PK3, the number of data points (M) over which the mean and the standard deviation are computed is a tunable parameter.  ... 
doi:10.1007/978-3-319-70145-5_3 fatcat:vuwlyidtknho5c66p24rkzvxgi

Collection selection for managed distributed document databases

Daryl D'Souza, James A. Thom, Justin Zobel
2004 Information Processing & Management  
A method for choosing collections that has been widely investigated is the use of a selection index, which captures broad information about each collection and its documents.  ...  In a distributed document database system, a query is processed by passing it to a set of individual collections and collating the responses.  ...  However, since CORI has been widely reported as an effective collection-selection method, its poor performance in these experiments is deeply surprising.  ... 
doi:10.1016/s0306-4573(03)00008-6 fatcat:ygdpyr7r7rfs5is7vdr4otoq6e

Server selection methods in personal metasearch: a comparative empirical study

Paul Thomas, David Hawking
2009 Information retrieval (Boston)  
Server selection is an important subproblem in distributed information retrieval (DIR) but has commonly been studied with collections of more or less uniform size and with more or less homogeneous content  ...  We then explore the effect of collection size variations using four partitionings of the TREC ad hoc data used in many other DIR experiments.  ...  Acknowledgements We thank the anonymous reviewers for their helpful comments. References  ... 
doi:10.1007/s10791-009-9094-z fatcat:siw5wbhexne73nwgqbrol2ii5q

Relevant document distribution estimation method for resource selection

Luo Si, Jamie Callan
2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03  
Prior research under a variety of conditions has shown the CORI algorithm to be one of the most effective resource selection algorithms, but the range of database sizes studied was not large.  ...  We also show how to acquire database size estimates in uncooperative environments as an extension of the query-based sampling used to acquire resource descriptions.  ...  Any opinions, findings, conclusions, or recommendations expressed in this paper are the authors', and do not necessarily reflect those of the sponsor.  ... 
doi:10.1145/860435.860490 dblp:conf/sigir/SiC03 fatcat:yr36u4724fdhpmbmdks7rde7qq

Report on the CLEF-IP 2013 Experiments: Multilayer Collection Selection on Topically Organized Patents

Anastasia Giachanou, Michail Salampasis, Maya Satratzemi, Nikolaos Samaras
2013 Conference and Labs of the Evaluation Forum  
For source selection, we tested CORI and a new collection selection method, the Multilayer method. We also tested CORI and SSL results merging algorithms.  ...  We run experiments using different combinations of the number of collections requested and documents retrieved from each collection.  ...  We plan to explore further this line of work with exploring modifications to the Multilayer and to make it more effective for patent search.  ... 
dblp:conf/clef/GiachanouSSS13 fatcat:4f3tfeosczh4tmvpkmrpg5ddhi

Two-stage statistical language models for text database selection

Hui Yang, Minjie Zhang
2006 Information retrieval (Boston)  
As the number and diversity of distributed Web databases on the Internet exponentially increase, it is difficult for user to know which databases are appropriate to search.  ...  Experimental results demonstrate that such a language modeling approach is competitive with current state-of-the-art database selection approaches.  ...  We thank Peter Eklund for his help and advices in the written expression of this paper.  ... 
doi:10.1007/s10791-005-5719-z fatcat:7uy6vfi7frdlbagtutgcl5msk4

Resource Selection for Federated Search on the Web [article]

Dong Nguyen, Thomas Demeester, Dolf Trieschnigg, Djoerd Hiemstra
2016 arXiv pre-print
Third, we provide an empirical comparison of several popular resource selection methods and find that these methods are not readily suitable for resource selection on the web.  ...  Challenges include the sparse resource descriptions and extremely skewed sizes of collections.  ...  Acknowledgements This research was supported by the Dutch national program COMMIT and the Folktales as Classifiable Texts (FACT) project, which is part of the CATCH programme funded by the Netherlands  ... 
arXiv:1609.04556v1 fatcat:47p6m6xwqfa7pkg4nus5tu542a

Classification-aware hidden-web text database selection

Panagiotis G. Ipeirotis, Luis Gravano
2008 ACM Transactions on Information Systems  
An important step in the metasearching process is database selection, or determining which databases are the most relevant for a given user query.  ...  The second algorithm uses "shrinkage," a statistical technique for improving parameter estimation in the face of sparse data, to enhance the database content summaries with category-specific words.  ...  Their main conclusion is that CORI is robust and performs better than other database selection algorithms for a variety of data sets.  ... 
doi:10.1145/1344411.1344412 fatcat:vb4ea42nn5bjzfsgecbpo5otjy

Efficient distributed selective search

Yubin Kim, Jamie Callan, J. Shane Culpepper, Alistair Moffat
2016 Information retrieval (Boston)  
are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied.  ...  By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated.  ...  Shane Culpepper is the recipient of an Australian Research Council DECRA Research Fellowship (DE140100275).  ... 
doi:10.1007/s10791-016-9290-6 fatcat:3rl5hm5rivbcbjgjznzkzcff7m

Agreement Based Source Selection for the Multi-Domain Deep Web Integration

Manishkumar Jha, Raju Balakrishnan, Subbarao Kambhampati
2011 International Conference on Management of Data  
For open collections like the deep web, the source selection must be sensitive to trustworthiness and importance of sources.  ...  One immediate challenge in searching the deep web databases is source selection, i.e., selecting the most relevant web databases for answering a given query.  ...  Query Similarity Based Measures CORI: CORI is a query-based relevance measure. Source statistics for CORI were collected using highest document frequency terms from the sample crawl data.  ... 
dblp:conf/comad/JhaBK11 fatcat:75qceeszrzgrtpxqlxzdxyjcwe

Search result diversification in resource selection for federated search

Dzung Hong, Luo Si
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
Prior research in resource selection for federated search mainly focused on selecting a small number of information sources that are most relevant to a user query.  ...  Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI and Big Document.  ...  CONCLUSION AND FUTURE WORK Resource selection is an important research problem in federated search for selecting a small number of relevance sources for a given query.  ... 
doi:10.1145/2484028.2484091 dblp:conf/sigir/HongS13 fatcat:h2fb6wckznaanbfpihshke7hpe

Load-Balancing and Caching for Collection Selection Architectures

Diego Puppin, Fabrizio Silvestri, Raffaele Perego, Ricardo Baeza-Yates
2007 Proceedings of the 2nd International ICST Conference on Scalable Information Systems  
In this paper, we analyze the relationship between the collection selection strategy, the effect on load balancing and on the caching subsystem, by exploring the design-space of a distributed search engine  ...  In particular, we propose a strategy to perform collection selection in a load-driven way, and a novel caching policy able to incrementally refine the effectiveness of the results returned for each subsequent  ...  ReDDE, presented in [25], is an improvement over CORI for collections that are uncooperative or of different sizes.  ... 
doi:10.4108/infoscale.2007.892 dblp:conf/infoscale/PuppinSPB07 fatcat:grrj6d6zpjhktoiu3z73b33zsi

ShRkC: Shard Rank Cutoff Prediction for Selective Search [chapter]

Anagha Kulkarni
2015 Lecture Notes in Computer Science  
However, a related important task of identifying how many of the top ranked relevant shards should be searched for the query, so as to balance the competing objectives of effectiveness and efficiency,  ...  The central premise for the proposed solution is that the number of top shards searched should be dependent on: 1. the query, 2. the given ranking of shards, and 3. on the type of search need being served  ...  Section 6 explores the effects of parameter tuning on search performance.  ... 
doi:10.1007/978-3-319-23826-5_32 fatcat:pvvzgtobj5el7m7qrkdzoidcwi