
Document allocation policies for selective searching of distributed indexes

Anagha Kulkarni, Jamie Callan
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
A thorough study of the tradeoff between search cost and search accuracy in a sharded index environment is performed using three large TREC collections.  ...  Random, source-based and topic-based document-to-shard allocation policies are studied in the context of selective search.  ...  These non-uniform distributions are an artifact of the topical distribution of the documents in the collections and thus unlikely to be uniform for most collections.  ... 
doi:10.1145/1871437.1871497 dblp:conf/cikm/KulkarniC10 fatcat:mvf4qve6frbjnd2yzmuvw3okvq

Balancing Precision and Recall with Selective Search

Mon-Shih Chuang, Anagha Kulkarni
2017 Symposium on Information Management and Big Data  
Toward this goal we investigate two new shard selection approaches, and conduct a series of experiments that lead to three new findings: 1.  ...  If the relevant documents for a query are spread across less than 10% of the shards then Selective Search can successfully balance precision and recall.  ...  This is an order of magnitude smaller than the search space of Exhaustive Search (50+ million documents).  ... 
dblp:conf/simbig/ChuangK17 fatcat:7gckweeja5e6ta5tyim5paouuq

Scalable Term Selection for Text Categorization

Jingyang Li, Maosong Sun
2007 Conference on Empirical Methods in Natural Language Processing  
between the specificity and the exhaustivity of the term subset.  ...  In text categorization, term selection is an important step for the sake of both categorization accuracy and computational efficiency.  ...  Acknowledgement The research is supported by the National Natural Science Foundation of China under grant number 60573187, 60621062 and 60520130299.  ... 
dblp:conf/emnlp/LiS07 fatcat:765n32e32jc6pfmxixlzhksbou

Selective Search

Anagha Kulkarni, Jamie Callan
2015 ACM Transactions on Information Systems  
This article investigates and extends an alternative: selective search, an approach that partitions the dataset based on document similarity to obtain topic-based shards, and searches only a few shards  ...  The experimental results demonstrate that selective search's effectiveness is on par with that of exhaustive search, and the corresponding search costs are substantially lower with the former.  ...  On an average, the density of relevant documents in the search space of exhaustive search is 7·10⁻⁶, 1·10⁻⁶, and 2·10⁻⁷ for Gov2, ClueWeb09-B, ClueWeb09-English, respectively.  ... 
doi:10.1145/2738035 fatcat:fistpgm5abemdeecnpiqmt4szi

Taily: shard selection using the tail of score distributions

Robin Aly, Djoerd Hiemstra, Thomas Demeester
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
State-of-the-art shard selection algorithms first query a central index of sampled documents, and their effectiveness is similar to searching all shards.  ...  This paper proposes Taily, a novel shard selection algorithm that models a query's score distribution in each shard as a Gamma distribution and selects shards with highly scored documents in the tail of  ...  in The Netherlands, Ghent University in Belgium, and iMinds (Interdisciplinary institute for Technology), a research institute founded by the Flemish Government.  ... 
doi:10.1145/2484028.2484033 dblp:conf/sigir/AlyHD13 fatcat:xvijdu37mnaz7oy4mjetlhhtbm

Efficient distributed selective search

Yubin Kim, Jamie Callan, J. Shane Culpepper, Alistair Moffat
2016 Information retrieval (Boston)  
are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied.  ...  In this paper we extend the study of selective search into new areas using a fine-grained simulation, examining the difference in efficiency when term-based and sample-based resource selection algorithms  ...  Shane Culpepper is the recipient of an Australian Research Council DECRA Research Fellowship (DE140100275).  ... 
doi:10.1007/s10791-016-9290-6 fatcat:3rl5hm5rivbcbjgjznzkzcff7m

Page 38 of Library & Information Science Abstracts Vol. , Issue 5 [page]

1992 Library & Information Science Abstracts  
The effect of variations in indexing exhaustivity on retrieval performance in a vector space retrieval system was investigated by using a term weight threshold to construct different document representations  ...  Influence of Exhaustivity 92/2888 The effect of indexing exhaustivity on retrieval performance. Robert Burgin. Information Processing & Management, 27 (6) 1991, 623-628. tables. refs.  ... 

Applications Of Informetrics To Information Retrieval Research

Dietmar Wolfram
2000 Informing Science  
A non-technical overview of two primary areas of study within the discipline of information science, information retrieval (IR) and informetrics, is presented.  ...  Informetric properties of IR systems as the basis for understanding IR system structure and generalizing human information seeking in electronic environments are discussed.  ...  With knowledge of the index term distribution of a database, one can predict the minimum index lookup time for entries.  ... 
doi:10.28945/581 fatcat:mzfnkhluxbhvnikeozz74zl6ym

Comparing Distributed Indexing: To MapReduce or Not?

Richard McCreadie, Craig Macdonald, Iadh Ounis
2009 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval  
In this work, we investigate distributed indexing paradigms, in particular within the auspices of the MapReduce programming framework.  ...  In particular, we describe two indexing approaches based on the original MapReduce paper, and compare these with a standard distributed IR system, the MapReduce indexing strategy used by the Nutch IR platform  ...  Figure 3 presents an example for a distributed setting MapReduce indexing paradigm of 200 documents.  ... 
dblp:conf/sigir/McCreadieMO09a fatcat:zyiaux74cff73lvk5v7s7j3cze

Ontology-Based Specific and Exhaustive User Profiles for Constraint Information Fusion for Multi-agents

Xiaohui Tao, Yuefeng Li, Raymond Y.K. Lau, Shlomo Geva
2010 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology  
When searching information from a distributed Web environment, information is retrieved by multi-agents on the client site and fused on the broker site.  ...  By search using specific and exhaustive user profiles, information fusion techniques no longer rely on the statistics provided by agents.  ...  Acknowledgements The work presented in this paper was partially supported by Grant DP0988007 from the Australian Research Council. References  ... 
doi:10.1109/wi-iat.2010.76 dblp:conf/webi/TaoLLG10 fatcat:6jwyewpzufaixbjpmpteviijmu

Theoretical evaluation of XML retrieval

Tobias Blanke
2012 SIGIR Forum  
Thorough with the ep-gr metric and generalised quantisation; 8.2 INEX 2005 Focussed with the nxCG metric, strict quantisation and rank 10; 8.3 INEX 2005 Focussed with the nxCG metric, strict quantisation  ...  and rank 50; 8.4 INEX 2005 Focussed with the nxCG metric, generalised quantisation and rank 50.  ...  We have to show that assuming map(A) ≡ {φ} and map(B) ≡ {φ}, also rsv(A, B) > 0, where A and B are sets of index terms. The latter is the case if there is an index term both part of A and B.  ... 
doi:10.1145/2215676.2215689 fatcat:grnfihrombgttb6p6wy4k36vum

Bibliometrics, Information Retrieval and Natural Language Processing: Natural Synergies to Support Digital Library Research

Dietmar Wolfram
2016 ACM/IEEE Joint Conference on Digital Libraries  
Similarly, techniques developed for IR and database technology have made the investigation of large-scale bibliometric phenomena feasible.  ...  Historically, researchers have not fully capitalized on the potential synergies that exist between bibliometrics and information retrieval (IR).  ...  Aspects of IR system content that lend themselves to bibliometric modeling include index term frequency distributions, indexing exhaustivity or term assignment, term co-occurrence frequency distributions  ... 
dblp:conf/jcdl/Wolfram16 fatcat:6wsjhieehvhlbdyhjhqshhpokq

From keywords to keyqueries

Tim Gollub, Matthias Hagen, Maximilian Michel, Benno Stein
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
Keyqueries are defined implicitly by the index and the retrieval model of a reference search engine: keyqueries for a document are the minimal queries that return the document in the top result ranks.  ...  To determine the keyqueries for a document, we present an exhaustive search algorithm along with effective pruning strategies.  ...  These documents appear in the top results for significantly more of the basic index terms than other documents.  ... 
doi:10.1145/2484028.2484181 dblp:conf/sigir/GollubHMS13 fatcat:4kefpe37brcwdnd7trktwjdg54

Lightweight Random Indexing for Polylingual Text Classification

Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani
2016 The Journal of Artificial Intelligence Research  
We analyse RI in terms of space and time efficiency, and propose a particular configuration of it (that we dub Lightweight Random Indexing – LRI).  ...  By running experiments on two well known public benchmarks, Reuters RCV1/RCV2 (a comparable corpus) and JRC-Acquis (a parallel one), we show LRI to outperform (both in terms of effectiveness and efficiency  ...  Acknowledgements Fabrizio Sebastiani is on leave from Consiglio Nazionale delle Ricerche, Italy.  ... 
doi:10.1613/jair.5194 fatcat:ck77um62dfaslexp3jvapo7taa

Specificity Aboutness in XML Retrieval [chapter]

Tobias Blanke, Mounia Lalmas
2009 Lecture Notes in Computer Science  
XML retrieval deals with retrieving those document components that specifically answer a query, and filters are a method of delivering the most focused answers.  ...  This paper presents a theoretical methodology to evaluate filters in XML retrieval. Theoretical evaluation is concerned with the formal investigation of qualitative properties of retrieval models.  ...  Acknowledgements Mounia Lalmas is currently funded by Microsoft Research/Royal Academy of Engineering.  ... 
doi:10.1007/978-3-642-04417-5_16 fatcat:xlkgru237fcedkutoapwmb3z7q