
Distributed Multisearch and Resource Selection for the TREC Million Query Track

Christopher T. Fallen, Gregory B. Newby, Kylie McCormick
2008 Text Retrieval Conference  
A distributed information retrieval system with resource-selection and result-set merging capability was used to search subsets of the GOV2 document corpus for the 2008 TREC Million Query Track. ... The sensitivity of Multisearch retrieval performance to variations in the resource selection algorithm is discussed. ... Although the ranked resource lists were pre-computed for all queries before using the Multisearch system to generate TREC ...
dblp:conf/trec/FallenNM08 fatcat:ms5dnyrtxbdjhduzmnr76zx3f4

Collection Selection Based on Historical Performance for Efficient Processing

Christopher T. Fallen, Gregory B. Newby
2007 Text Retrieval Conference  
A Grid Information Retrieval (GIR) simulation was used to process the TREC Million Query Track queries. ... TREC Million Query participant median scores. ... Introduction: One goal of the ARSC multisearch experiment for the 2007 TREC Million Query Track is to estimate a practical upper bound of the number of document collections that can be independently searched ...
dblp:conf/trec/FallenN07 fatcat:gpox7fcznvd5rlc27zp4qqj4bu
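The snippet names collection selection "based on historical performance" but does not say how that performance is measured. The following is a minimal sketch, assuming historical performance is simply the mean score each collection earned on past queries. The functions `historical_scores` and `select_collections` are hypothetical names, and this is not the authors' method. It ranks collections by that mean and keeps the top k, which is the cutoff an upper-bound experiment would vary.

```python
from collections import defaultdict
from statistics import mean


def historical_scores(history):
    """Average past score per collection.

    history: iterable of (collection_id, score) pairs from earlier queries.
    Assumption: "historical performance" = mean past score (hypothetical).
    """
    by_coll = defaultdict(list)
    for coll, score in history:
        by_coll[coll].append(score)
    return {coll: mean(scores) for coll, scores in by_coll.items()}


def select_collections(history, k):
    """Return the k collections with the highest mean historical score."""
    scores = historical_scores(history)
    # Sort by descending mean score; break ties by collection id for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]


history = [("a", 0.25), ("b", 1.0), ("a", 0.75), ("c", 0.125)]
print(select_collections(history, 2))  # ['b', 'a']
```

Raising k searches more collections per query. Plotting retrieval quality against k is one way to look for the practical upper bound the abstract mentions.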

Partitioning the Gov2 Corpus by Internet Domain Name: A Result-set Merging Experiment

Christopher T. Fallen, Gregory B. Newby
2006 Text Retrieval Conference  
To study the MultiSearch problem and complete the Ad Hoc Task of the 2006 TREC Terabyte Track, the Gov2 collection was divided according to web domain and for each topic, the results from each domain were  ...  The mean average precision scores of the results from two different merge algorithms applied to the domain-divided Gov2 collection and a randomized domain-divided collection are compared with a 2-way analysis  ...  GIR spans several major themes: distributed indexing, transport methods for queries and result sets, human interface, and methods for query persistence.  ... 
dblp:conf/trec/FallenN06 fatcat:3tprywqq2fgjfmq75yvs22ys6u
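The abstract describes splitting Gov2 by web domain, searching each part, and merging the per-domain results, but the snippet does not name the two merge algorithms. The following sketch illustrates the idea under stated assumptions. It uses a hypothetical rule that takes a document's domain to be the last two hostname labels (e.g. `nasa.gov`). For merging it uses min-max score normalization, a common merge method that may or may not be one of the two the paper compared. All function names are illustrative.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    """Hypothetical domain rule: keep the last two hostname labels."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def partition_by_domain(docs):
    """docs: iterable of (doc_id, url). Returns {domain: [doc_id, ...]}."""
    parts = defaultdict(list)
    for doc_id, url in docs:
        parts[domain_of(url)].append(doc_id)
    return dict(parts)


def minmax_merge(result_sets, k=10):
    """Merge per-domain ranked lists after min-max normalizing each one.

    result_sets: {domain: [(doc_id, score), ...]}.
    """
    merged = []
    for results in result_sets.values():
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for doc_id, s in results:
            # A list whose scores are all equal gets normalized score 1.0.
            norm = (s - lo) / span if span else 1.0
            merged.append((norm, doc_id))
    # Highest normalized score first; ties broken by doc_id.
    merged.sort(key=lambda t: (-t[0], t[1]))
    return [doc_id for _, doc_id in merged[:k]]
```

Normalizing each list to [0, 1] makes scores from differently sized domains comparable before interleaving. The trade-off is that the top document of a weak domain is ranked level with the top document of a strong one.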

Lucene for n-grams using the CLUEWeb Collection

Gregory B. Newby, Christopher T. Fallen, Kylie McCormick
2009 Text Retrieval Conference  
Indexing the Category "B" subset of the ClueWeb collection was accomplished by a divide and conquer method, working across the separate ClueWeb subsets for 1, 2 and 3-grams. ... The ARSC team made modifications to the Apache Lucene engine to accommodate "go words," taken from the Google Gigaword vocabulary of n-grams. ... Acknowledgements: This work was supported in part by a grant of HPC resources from the Arctic Region Supercomputing Center and the DoD High Performance Computing Modernization Program. ...
dblp:conf/trec/NewbyFM09 fatcat:ncth6q4xu5fcrp5wm7ykc7ojly
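"Go words" invert the usual stop-word idea. Instead of discarding listed terms, only 1-, 2- and 3-grams that appear in a fixed vocabulary are kept as index terms. The following is a language-neutral sketch of that filtering step, not the ARSC Lucene modification. It assumes lowercase whitespace tokenization, a small hypothetical go-word set standing in for the Gigaword-derived vocabulary, and illustrative function names.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def go_word_terms(text, go_words, max_n=3):
    """Keep only the 1..max_n-grams that appear in the go-word vocabulary.

    Assumption: lowercase + whitespace tokenization (hypothetical).
    """
    tokens = text.lower().split()
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(g for g in ngrams(tokens, n) if g in go_words)
    return terms


go_words = {"climate", "change", "climate change", "the arctic region"}
print(go_word_terms("Climate change in the Arctic region", go_words))
# ['climate', 'change', 'climate change', 'the arctic region']
```

Restricting the index to a known n-gram vocabulary bounds the number of distinct terms, which is what makes indexing 2- and 3-grams over a web-scale collection tractable.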