Static score bucketing in inverted indexes

Chavdar Botev, Nadav Eiron, Marcus Fontoura, Ning Li, Eugene Shekita
2005 Proceedings of the 14th ACM international conference on Information and knowledge management - CIKM '05  
This heuristic, however, increases the cost of index generation and requires complex index build algorithms. In this paper, we study a new index organization based on static score bucketing.  ...  Maintaining strict static score order of inverted lists is a heuristic used by search engines to improve the quality of query results when the entire inverted lists cannot be processed.  ...  INDEXING WITH STATIC SCORE BUCKETING In previous work, Long and Suel [6] have proposed an inverted lists organization that is based on a static rank order of the postings.  ... 
doi:10.1145/1099554.1099642 dblp:conf/cikm/BotevEFLS05 fatcat:wclbl5afs5dgdotmlo2tra7d3q

Embellishing text search queries to protect user privacy

HweeHwa Pang, Xuhua Ding, Xiaokui Xiao
2010 Proceedings of the VLDB Endowment  
In this paper, we identify the privacy risks arising from semantically related search terms within a query, and from recurring high-specificity query terms in a search session.  ...  We also provide an accompanying retrieval scheme that enables the search engine to compute the encrypted document relevance scores from only the genuine search terms, yet remain oblivious to their distinction  ...  The similarity scoring model with the inverted index implementation are used extensively in modern document retrieval systems. They also form the foundation of Web search engines.  ... 
doi:10.14778/1920841.1920918 fatcat:rcr6id53tva33cyirgugr45574

Evaluation of a bedside test of utricular function – the bucket test – in older individuals

Daniel Q. Sun, M. Geraldine Zuniga, Marcela Davalos-Bichara, John P. Carey, Yuri Agrawal
2014 Acta Oto-Laryngologica  
Dizziness Handicap Index (DHI), in 51 older individuals aged 70-95 years.  ...  Results: Bucket test scores are correlated in both magnitude and direction with utricle-selective tap-evoked oVEMP asymmetry ratios, but not with sound-evoked cVEMP asymmetry ratios, which are saccule-selective  ...  Table II Association between bucket test outcome and clinical variables. The values given in bold denote statistical significance. AR, asymmetry ratio; DHI, Dizziness Handicap Index.  ... 
doi:10.3109/00016489.2013.867456 pmid:24460151 pmcid:PMC4285154 fatcat:jviww2sgkfaqhmzwywwhrr7sqa

Cache-conscious performance optimization for similarity search

Maha Alabduljalil, Xun Tang, Tao Yang
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
Because of data sparsity, accessing feature vectors in memory for runtime comparison in the second stage, incurs significant overhead due to the presence of memory hierarchy.  ...  All-pairs similarity search can be implemented in two stages. The first stage is to partition the data and group potentially similar vectors.  ...  This work is supported in part by NSF IIS-1118106/0905084 and Kuwait University Scholarship.  ... 
doi:10.1145/2484028.2484077 dblp:conf/sigir/AlabduljalilTY13 fatcat:ptk7klfakbhbnpqwzdmdcs2wxi

Streaming similarity search over one billion tweets using parallel locality-sensitive hashing

Narayanan Sundaram, Aizana Turmukhametova, Nadathur Satish, Todd Mostak, Piotr Indyk, Samuel Madden, Pradeep Dubey
2013 Proceedings of the VLDB Endowment  
We show that this is an order of magnitude faster than existing indexing schemes, such as inverted indexes.  ...  In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH) designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports high-throughput  ...  ACKNOWLEDGEMENTS This work was supported by a grant from Intel, as a part of the Intel Science and Technology Center in Big Data (ISTC-BD).  ... 
doi:10.14778/2556549.2556574 fatcat:z7c2qdi2lvewlkamfphubde7ky

Improved techniques for result caching in web search engines

Qingqing Gan, Torsten Suel
2009 Proceedings of the 18th international conference on World wide web - WWW '09  
Finally, using the same approach, we also obtain performance gains for the related problem of inverted list caching.  ...  Query processing is a major cost factor in operating large web search engines. In this paper, we study query result caching, one of the main techniques used to optimize query processing performance.  ...  Acknowledgements: We thank Xiaojun Hei for collaboration in the early stages of this work, and Keith Ross and Dan Rubenstein for valuable discussions of caching under Zipfian distributions.  ... 
doi:10.1145/1526709.1526768 dblp:conf/www/GanS09 fatcat:twihgvjmgretthjptzbqxyw3x4

Accelerating instant question search with database techniques

Takeharu Eda, Toshio Uchiyama, Katsuji Bessho, Norifumi Katafuchi, Alice Chen, Ryoji Kataoka
2011 Proceedings of the 20th international conference companion on World wide web - WWW '11  
In this paper, we propose a user-support tool for composing questions in such services.  ...  Distributed question answering services, like Yahoo Answer and Aardvark, are known to be useful for end users and have also opened up numerous topics ranging in many research fields.  ...  When the number of buckets becomes too large, the buckets are projected into another set of buckets.  ... 
doi:10.1145/1963192.1963290 dblp:conf/www/EdaUBKCK11 fatcat:k43wwxmbpfe5tm4iax5bmqcoie

Compressing term positions in web indexes

Hao Yan, Shuai Ding, Torsten Suel
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
We focus on techniques for compressing term positions in web search engine indexes.  ...  This has led to a lot of research on how to improve query throughput, using techniques such as massive parallelism, caching, early termination, and inverted index compression.  ...  Inverted Index Compression Many different inverted index compression techniques have been proposed in the literature [28].  ... 
doi:10.1145/1571941.1571969 dblp:conf/sigir/YanDS09 fatcat:2focpoagrjf3hm4qpnx5nvqkma

Performance of query processing implementations in ranking-based text retrieval systems using inverted indices

B. Barla Cambazoglu, Cevdet Aykanat
2006 Information Processing & Management  
To our knowledge, six of these techniques are not discussed in any other publication before.  ...  Similarity calculations and document ranking form the computationally expensive parts of query processing in ranking-based text retrieval.  ...  An inverted index is composed of two parts: a set of inverted lists and an index into these lists.  ... 
doi:10.1016/j.ipm.2005.06.004 fatcat:v5xx6y2255arleq2u2vzm57hyu

Temporal Spatial-Keyword Top-k publish/subscribe

Lisi Chen, Gao Cong, Xin Cao, Kian-Lee Tan
2015 2015 IEEE 31st International Conference on Data Engineering  
Users are interested in receiving up-to-date tweets such that their locations are close to a user specified location and their texts are interesting to users.  ...  The TaSK query takes into account text relevance, spatial proximity, and recency of geo-textual objects in evaluating its relevance with a geo-textual object.  ...  ACKNOWLEDGMENT This work is supported in part by a grant awarded by a Singapore MOE AcRF Tier 2 Grant (ARC30/12).  ... 
doi:10.1109/icde.2015.7113289 dblp:conf/icde/ChenCCT15 fatcat:x7m4j4tkujgizgbi27nswq4jja

External-Memory Multimaps [chapter]

Elaine Angelino, Michael T. Goodrich, Michael Mitzenmacher, Justin Thaler
2011 Lecture Notes in Computer Science  
For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used  ...  The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key.  ...  In this case, the multimap could be viewed as providing a dynamic functionality for a classic static data structure, known as an inverted file or inverted index (e.g., see Knuth [11]).  ... 
doi:10.1007/978-3-642-25591-5_40 fatcat:cmzvr5xyhjad3icmh4a4fnm5my

External-Memory Multimaps [article]

Elaine Angelino, Michael T. Goodrich, Michael Mitzenmacher, Justin Thaler
2011 arXiv   pre-print
For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used  ...  The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key.  ...  In this case, the multimap could be viewed as providing a dynamic functionality for a classic static data structure, known as an inverted file or inverted index (e.g., see Knuth [11]).  ... 
arXiv:1104.5533v2 fatcat:ghuoeyxt2jcbzmpkzndu6zqz7i

External-Memory Multimaps

Elaine Angelino, Michael T. Goodrich, Michael Mitzenmacher, Justin Thaler
2013 Algorithmica  
For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used  ...  The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key.  ...  In this case, the multimap could be viewed as providing a dynamic functionality for a classic static data structure, known as an inverted file or inverted index (e.g., see Knuth [11]).  ... 
doi:10.1007/s00453-013-9770-7 fatcat:3cqa6u3njng6leyrn3gsud74je

Maguro, a system for indexing and searching over very large text collections

Knut Magne Risvik, Trishul Chilimbi, Henry Tan, Karthik Kalyanaraman, Chris Anderson
2013 Proceedings of the sixth ACM international conference on Web search and data mining - WSDM '13  
Maguro is part of the serving stack in Bing and allows us to scale the index significantly better.  ...  A long tail distribution of content calls for different trade-offs in the design space for good efficiency across the entire index range.  ...  In addition, we would like to thank Qi Lu, Harry Shum, and Chad Walters for their support throughout the project.  ... 
doi:10.1145/2433396.2433486 dblp:conf/wsdm/RisvikCTKA13 fatcat:d2uz2xu7hvetlo4mjwdcsq63i4

Relevance Matters: Capitalizing on Less (Top-k Matching in Publish/Subscribe)

Mohammad Sadoghi, Hans-Arno Jacobsen
2012 2012 IEEE 28th International Conference on Data Engineering  
The efficient processing of large collections of Boolean expressions plays a central role in major data intensive applications ranging from user-centric processing and personalization to real-time data  ...  Finally, the performance of BE*-Tree is proven through a comprehensive experimental comparison against state-of-the-art index structures for matching Boolean expressions.  ...  In contrast, a scalable top-k model, but based on a static and flat structure, with a generic scoring function, which also takes the event into consideration, is introduced in k-index [2] .  ... 
doi:10.1109/icde.2012.38 dblp:conf/icde/SadoghiJ12 fatcat:nzt236ufv5drfpeubtklphvy7m