58,815 Hits in 7.3 sec

Top subset retrieval on large collections using sorted indices

Paul Ferguson, Alan F. Smeaton, Cathal Gurrin, Peter Wilkins
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
We give results based upon the TREC Terabyte dataset showing improvements that these indices give in terms of effectiveness and efficiency.  ...  We present performance details using different forms of sorted indices and show how utilising only the top subset of each of these indices can affect system performance as measured in terms of MAP and  ...  If the documents are sorted naively we cannot attempt to take only a top subset of documents for each term and hope to find a large portion of relevant documents.  ... 
doi:10.1145/1076034.1076147 dblp:conf/sigir/FergusonSGW05 fatcat:6kad7bhdp5b2hejwq7uitk2y2y
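The snippet's key point is that truncating posting lists only works when each term's list is sorted so that its strongest documents come first. A minimal sketch of that idea follows, assuming an index whose postings are (doc_id, weight) pairs sorted by descending per-term weight; the function names and the additive scoring are illustrative assumptions, not the authors' implementation.

```python
import heapq
from collections import defaultdict

def build_sorted_index(postings):
    """Sort each term's postings by descending weight (ties broken by doc id).

    postings: dict mapping term -> list of (doc_id, weight) pairs (hypothetical layout).
    """
    return {term: sorted(plist, key=lambda p: (-p[1], p[0]))
            for term, plist in postings.items()}

def top_subset_retrieval(index, query_terms, subset_size, k):
    """Score documents using only the first `subset_size` postings of each term."""
    scores = defaultdict(float)
    for term in query_terms:
        # The list is weight-sorted, so this prefix holds the term's strongest documents.
        for doc_id, weight in index.get(term, [])[:subset_size]:
            scores[doc_id] += weight
    # Highest accumulated score first; ties go to the smaller doc id.
    return heapq.nlargest(k, scores.items(), key=lambda item: (item[1], -item[0]))

postings = {"terabyte": [(1, 2), (2, 9), (3, 5)], "track": [(2, 1), (3, 8), (4, 7)]}
index = build_sorted_index(postings)
print(top_subset_retrieval(index, ["terabyte", "track"], subset_size=2, k=2))  # → [(3, 13.0), (2, 9.0)]
print(top_subset_retrieval(index, ["terabyte", "track"], subset_size=3, k=2))  # → [(3, 13.0), (2, 10.0)]
```

With the top subset of two postings per term, document 2 loses the small contribution from "track", which is the kind of effectiveness cost the paper measures against the reduced processing.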

Dublin City University at the TREC 2005 Terabyte Track

Paul Ferguson, Cathal Gurrin, Alan F. Smeaton, Peter Wilkins
2005 Text Retrieval Conference  
Our runs for TREC in all tasks were primarily focussed on the application of "Top Subset Retrieval" to the Terabyte Track.  ...  This retrieval utilises different types of sorted inverted indices so that fewer documents are processed in order to reduce query times, and is done in a way that minimises loss of effectiveness in terms  ...  The explanation for this would in large part be due to the use of a rather small top subset of 100,000 documents for each term.  ... 
dblp:conf/trec/FergusonGSW05 fatcat:n7mixhxo3jcnrjzgga6fbn44rm

Relevance feedback revisited

Donna Harman
1992 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '92  
In order to run a user experiment on a large document collection, experiments were performed at NIST to complete some of the missing links found in using the probabilistic retrieval model.  ...  Researchers have found relevance feedback to be effective in interactive information retrieval, although few formal user experiments have been made.  ...  The 1988 paper produced a large set of terms for query expansion by merging all non-common terms from the relevant documents retrieved in the top ten retrieved documents, and then sorted these terms, using  ... 
doi:10.1145/133160.133167 dblp:conf/sigir/Harman92 fatcat:hgp5fy6rkrg7pjvyi4weyy7qee

An Improved Retrievability-Based Cluster-Resampling Approach for Pseudo Relevance Feedback

Shariq Bashir
2016 Computers  
In our experiments we observed that when a collection contains retrieval bias, highly retrievable documents of clusters are frequently retrieved at top positions for most of the queries, and these drift  ...  The standard approach works quite well if the retrieval bias of the retrieval model does not affect the retrievability of documents.  ...  We examine this via the retrieval probability of subsets within the top 500 documents (i.e., how many documents of a subset are retrieved within the top 500 documents). indicates results are better than  ... 
doi:10.3390/computers5040029 fatcat:khcbofc2sbastbwguszancrh4m

Partitioning the Gov2 Corpus by Internet Domain Name: A Result-set Merging Experiment

Christopher T. Fallen, Gregory B. Newby
2006 Text Retrieval Conference  
The mean average precision scores of the results from two different merge algorithms applied to the domain-divided Gov2 collection and a randomized domain-divided collection are compared with a 2-way analysis  ...  To study the MultiSearch problem and complete the Ad Hoc Task of the 2006 TREC Terabyte Track, the Gov2 collection was divided according to web domain and for each topic, the results from each domain were  ...  as d_j, by comparing results retrieved from collection indexes defined by G_d and indexes defined by a partition G_r with the same subset size distribution as G_d but with pages assigned to each subset  ... 
dblp:conf/trec/FallenN06 fatcat:3tprywqq2fgjfmq75yvs22ys6u
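The snippet does not name the two merge algorithms that were compared. Purely as a hypothetical illustration of result-set merging across domain partitions, one common baseline rescales each partition's scores to [0, 1] with min-max normalization and then sorts the union; the function names and the choice of normalization are assumptions, not the authors' method.

```python
def min_max_normalize(results):
    """Rescale one partition's (doc_id, score) list to the range [0, 1]."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every document as equally strong.
        return [(d, 1.0) for d, _ in results]
    return [(d, (s - lo) / (hi - lo)) for d, s in results]

def merge_result_sets(per_collection, k):
    """Merge normalized per-partition results into one ranked list of length <= k.

    Assumes partitions are disjoint (each page belongs to exactly one domain).
    """
    merged = []
    for results in per_collection.values():
        merged.extend(min_max_normalize(results))
    merged.sort(key=lambda item: (-item[1], item[0]))
    return merged[:k]

per_domain = {
    "nih.gov": [("a", 12.0), ("b", 8.0), ("c", 4.0)],
    "nasa.gov": [("x", 3.0), ("y", 1.0)],
}
print(merge_result_sets(per_domain, k=3))  # → [('a', 1.0), ('x', 1.0), ('b', 0.5)]
```

Normalizing first keeps a partition with large raw scores from crowding out the others, at the cost of treating each partition's top hit as equally strong.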

Improving Retrievability and Recall by Automatic Corpus Partitioning [chapter]

Shariq Bashir, Andreas Rauber
2010 Lecture Notes in Computer Science  
Experiments are validated on 1.2 million patents of the TREC Chemical Retrieval Track.  ...  We furthermore show that this classification can be used to improve overall retrievability of documents by treating these classes as separate document corpora, combining individual retrieval results.  ...  On top of this, research indicates (again on datasets of limited size) that it may be possible to identify for a given retrieval system, which documents are likely to show high or low retrievability based  ... 
doi:10.1007/978-3-642-16175-9_5 fatcat:bcwiiy77wjhjbmkxtrq6xugzqy

On the relationship between query characteristics and IR functions retrieval bias

Shariq Bashir, Andreas Rauber
2011 Journal of the American Society for Information Science and Technology  
Commonly, random queries are used for approximating documents' retrievability due to the prohibitively large query space and time involved in processing all queries.  ...  Additionally, a cumulative retrievability score of documents over all queries is used for analyzing the bias of retrieval functions.  ...  Correlation is low on lower and higher subsets of length and vocabulary, indicating a large difference between the two functions at these two extremes.  ... 
doi:10.1002/asi.21549 fatcat:7p2rlobz7vd67nntwsdwy45leq
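The cumulative retrievability score mentioned in the snippet counts, for each document, how many queries retrieve it within a rank cutoff. A minimal sketch follows, assuming every query is weighted equally and using a hypothetical cutoff parameter; the retrievability literature commonly summarizes the resulting distribution with a Gini coefficient, and the standard Gini formula is used here as that assumed summary.

```python
def cumulative_retrievability(run, all_docs, cutoff):
    """r(d) = number of queries that rank d within the top `cutoff` results.

    run: dict mapping query id -> ranked list of doc ids (each query weighted equally).
    """
    r = {d: 0 for d in all_docs}  # documents never retrieved keep a score of 0
    for ranking in run.values():
        for doc_id in ranking[:cutoff]:
            r[doc_id] = r.get(doc_id, 0) + 1
    return r

def gini(values):
    """Standard Gini coefficient: 0 means perfectly even, values near 1 mean concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)

run = {"q1": [1, 2, 3], "q2": [2, 3, 4], "q3": [2, 1, 5]}
r = cumulative_retrievability(run, all_docs=range(1, 7), cutoff=2)
print(r)                           # → {1: 2, 2: 3, 3: 1, 4: 0, 5: 0, 6: 0}
print(round(gini(r.values()), 4))  # → 0.6111
```

A higher Gini value indicates that a retrieval function concentrates exposure on fewer documents, which is one way to compare the bias of two functions.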

Active Bucket Categorization for High Recall Video Retrieval

O. de Rooij, M. Worring
2013 IEEE transactions on multimedia  
MediaTable provides options for sorting on any type of metadata in the collection, selecting results from the sorted result using various visualizations, and adding these to buckets.  ...  Basic user interactions are defined by sort and select operations. First, users sort the table, and thereby the collection, on any of its columns.  ... 
doi:10.1109/tmm.2013.2237894 fatcat:hdvncthlbrg4tjbm4elr2gp7i4

Quantifying retrieval bias in Web archive search

Thaer Samar, Myriam C. Traub, Jacco van Ossenbruggen, Lynda Hardman, Arjen P. de Vries
2017 International Journal on Digital Libraries  
However, previous studies have shown that these systems can induce a bias, known as the retrievability bias, on the accessibility of documents in community-collected collections (such as TREC collections  ...  Assuming queries are not inherently temporal in nature, the analysis is based on the timestamps of documents in the search results returned using the retrieval model for all queries.  ...  Part of the analysis work was carried out on the Dutch national e-infrastructure with the support of the SURF Foundation.  ... 
doi:10.1007/s00799-017-0215-9 fatcat:em6giqh775cwtooo7vmripllju

Evaluation of Term Ranking Algorithms for Pseudo-Relevance Feedback in MEDLINE Retrieval

Sooyoung Yoo, Jinwook Choi
2011 Healthcare Informatics Research  
the 100 top-ranked retrieved documents.  ...  The OHSUMED test collection, which is a subset of the MEDLINE database, was used as a test corpus. Various ranking algorithms were tested in combination with different term re-weighting algorithms.  ...  This indicates that RSV and LCA performed better than the other algorithms at ranking the most useful terms near the top of the list using the default parameter settings.  ... 
doi:10.4258/hir.2011.17.2.120 pmid:21886873 pmcid:PMC3155169 fatcat:o4xqvp7udvgkza3m2lrzgldzii

A three-phase mapreduce-based algorithm for searching biomedical document databases

Milana Grbić
The algorithm's performance is tested on different subsets of the large, well-known PubMed biomedical document database.  ...  Retrieving information from large document databases has been a focus of scientific research in recent years.  ...  Information retrieval is finding information (e.g. texts or documents) from a large collection of data that satisfies specific information queries [1]. Information retrieval has many applications  ... 
doi:10.7251/ijeec1901001g fatcat:jaoih327ebgzfl2v7uqpdzphpa

Leveraging visual concepts and query performance prediction for semantic-theme-based video retrieval

Stevan Rudinac, Martha Larson, Alan Hanjalic
2012 International Journal of Multimedia Information Retrieval  
Retrieval is performed using a coherence-based query performance prediction framework.  ...  The proposed retrieval approach is data driven, requires no prior training and relies exclusively on the analysis of the video collection and of the different result lists returned for the given query text.  ...  Coherence indicator The coherence indicator [8] is used to select the results list with the highest coherence among the top-N retrieved results.  ... 
doi:10.1007/s13735-012-0018-0 fatcat:d4c2pofvangchacpgv7tcvcjwi

Experiments on Genomics Ad Hoc Retrieval

Miguel E. Ruiz
2005 Text Retrieval Conference  
For this purpose we used a distributed retrieval approach and divided the large collection into five non-overlapping subcollections.  ...  We participated in the Genomics ad hoc retrieval task. Our approach used the SMART system for indexing the large collection of MEDLINE documents.  ...  We also want to thank Susan Humphrey from NLM for providing us with the gene expansion used in this work and Alan Aronson from NLM for providing MetaMap.  ... 
dblp:conf/trec/Ruiz05 fatcat:oeth6wagafc7jjhfzy3ioaylse

Assessing Efficiency-Effectiveness Tradeoffs in Multi-Stage Retrieval Systems Without Using Relevance Judgments [article]

Charles L. A. Clarke, J. Shane Culpepper, Alistair Moffat
2015 arXiv pre-print
Large-scale retrieval systems are often implemented as a cascading sequence of phases -- a first filtering step, in which a large set of candidate documents are extracted using a simple technique such  ...  as Boolean matching and/or static document scores; and then one or more ranking steps, in which the pool of documents retrieved by the filter is scored more precisely using dozens or perhaps hundreds of  ...  Introduction The purpose of an information retrieval system is well-defined: given a query q, and a large collection D of documents, identify and present a small subset of the collection by identifying  ... 
arXiv:1506.00717v1 fatcat:v5k7hleeivhonl6lrmtscizqym
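The cascade described in the snippet can be sketched in two stages: a cheap filter that extracts a candidate pool, then a costlier scorer applied only to that pool. The sketch below assumes conjunctive (AND) Boolean matching ordered by a static document score, and stands in a hypothetical precomputed score table for the expensive ranker; all names are illustrative, not the authors' code.

```python
def boolean_filter(docs, query_terms, static_score, pool_size):
    """Stage 1: conjunctive Boolean match, then keep the pool_size best by static score.

    docs: dict mapping doc id -> set of terms (hypothetical toy index).
    """
    candidates = [d for d, terms in docs.items() if all(t in terms for t in query_terms)]
    candidates.sort(key=lambda d: (-static_score[d], d))
    return candidates[:pool_size]

def rerank(pool, scorer, k):
    """Stage 2: apply the (notionally expensive) scorer only to the filtered pool."""
    return sorted(pool, key=lambda d: (-scorer(d), d))[:k]

docs = {1: {"a", "b"}, 2: {"a"}, 3: {"a", "b", "c"}, 4: {"a", "b"}}
static_score = {1: 0.5, 2: 0.9, 3: 0.2, 4: 0.7}
fine_score = {1: 3.0, 3: 9.0, 4: 1.0}  # hypothetical output of a costlier ranker

pool = boolean_filter(docs, ["a", "b"], static_score, pool_size=2)
print(pool)                               # → [4, 1]
print(rerank(pool, fine_score.get, k=2))  # → [1, 4]
```

Document 3 would score highest under the second-stage ranker but never reaches it because the filter's pool is too small, which is the efficiency-effectiveness tradeoff the paper sets out to assess.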

Document Retrieval on Repetitive Collections [article]

Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén
2014 arXiv pre-print
Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings.  ...  We also design new methods that offer superior time/space trade-offs, particularly on repetitive collections.  ...  on large collections.  ... 
arXiv:1404.4909v2 fatcat:cn2rng2jrzd4tawh5xv2jfxsb4
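The snippet defines document retrieval as finding the most important documents in which a pattern appears. As a point of reference only, the naive baseline below scans every document and ranks by occurrence count, assuming "importance" means term frequency; it is not the authors' method, which concerns index structures with better time/space trade-offs, and its cost per query grows with the total length of the collection.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of a non-empty pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count, start = 0, 0
    while True:
        pos = text.find(pattern, start)
        if pos == -1:
            return count
        count += 1
        start = pos + 1  # advance by one so overlapping matches are counted

def top_k_documents(collection, pattern, k):
    """Return up to k (doc_id, frequency) pairs for documents containing pattern."""
    hits = []
    for doc_id, text in enumerate(collection):
        tf = count_occurrences(text, pattern)
        if tf > 0:
            hits.append((doc_id, tf))
    hits.sort(key=lambda h: (-h[1], h[0]))  # most frequent first, then lower doc id
    return hits[:k]

collection = ["abracadabra", "cadabra", "banana"]
print(top_k_documents(collection, "abra", k=2))  # → [(0, 2), (1, 1)]
print(top_k_documents(collection, "ana", k=2))   # → [(2, 2)]
```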
Showing results 1-15 out of 58,815 results