2,583 Hits in 5.2 sec

Performance of compressed inverted list caching in search engines

Jiangong Zhang, Xiaohui Long, Torsten Suel
2008 Proceeding of the 17th international conference on World Wide Web - WWW '08  
We focus on two techniques, inverted index compression and index caching, which play a crucial role in web search engines as well as other high-performance information retrieval systems.  ...  Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges.  ...  inverted list caching on real search engine query traces (AOL and Excite).  ... 
doi:10.1145/1367497.1367550 dblp:conf/www/ZhangLS08 fatcat:x6yixi5obbe47mh76luelarrqu

Modeling Static Caching in Web Search Engines [chapter]

Ricardo Baeza-Yates, Simon Jonassen
2012 Lecture Notes in Computer Science  
In this paper we model a two-level cache of a Web search engine, such that given memory resources, we find the optimal split fraction to allocate for each cache, results and index.  ...  The final result is very simple and requires computing just five parameters that depend on the input data and the performance of the search engine.  ...  Caching index term lists: As the search engine evaluates a particular query, it may decide to store in memory the inverted lists of the involved query terms.  ... 
doi:10.1007/978-3-642-28997-2_37 fatcat:7oen45tyhjh35o2g2fg4t46pfq
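The split-fraction optimization this abstract describes can be sketched in miniature. The hit-rate curves and cost constants below are hypothetical placeholders for illustration, not the paper's five measured parameters:

```python
# Toy sketch of picking the split between a results cache and a posting-list
# cache under a fixed memory budget, in the spirit of the two-level model
# described above. The hit-rate curves and cost constants are hypothetical
# placeholders, NOT the paper's actual five-parameter model.

def total_cost(split, budget, result_hit, list_hit, c_result, c_list, c_miss):
    """Expected cost per query for a given split of the memory budget.

    split      -- fraction of the budget given to the results cache
    result_hit -- hit rate as a function of results-cache size (hypothetical)
    list_hit   -- hit rate as a function of list-cache size (hypothetical)
    c_*        -- per-query cost of a results hit, a list hit, a full miss
    """
    h_r = result_hit(split * budget)
    h_l = list_hit((1.0 - split) * budget)
    # A results-cache hit is cheapest; failing that, a list-cache hit still
    # avoids index access; a full miss pays the whole evaluation cost.
    return h_r * c_result + (1.0 - h_r) * (h_l * c_list + (1.0 - h_l) * c_miss)

def best_split(budget, result_hit, list_hit, c_result, c_list, c_miss, steps=100):
    """Scan candidate split fractions and keep the cheapest one."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda s: total_cost(
        s, budget, result_hit, list_hit, c_result, c_list, c_miss))
```

A grid scan stands in here for the paper's closed-form result: with concave hit-rate curves the expected cost has a single interior minimum, so even this brute-force version finds a sensible split.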

Three-Level Caching for Efficient Query Processing in Large Web Search Engines

Xiaohui Long, Torsten Suel
2006 World wide web (Bussum)  
To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning  ...  This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists.  ...  Thus, search engines need to score only those documents that occur in the intersection of the inverted lists.  ... 
doi:10.1007/s11280-006-0221-0 fatcat:tmer4oaw7zfwlcvqojhtfk5zjm
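The intersection step this snippet mentions (scoring only documents that occur in every term's inverted list) is, in its simplest form, a linear merge of sorted posting lists. A generic sketch, not the paper's implementation:

```python
def intersect(list_a, list_b):
    """Linear merge of two sorted posting lists (ascending document ids).

    Returns only the document ids present in both lists -- the documents a
    conjunctive query actually needs to score. Caching such intersections
    for frequently co-occurring term pairs avoids recomputing this merge.
    """
    result = []
    i = j = 0
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            result.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    return result
```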

Three-level caching for efficient query processing in large Web search engines

Xiaohui Long, Torsten Suel
2005 Proceedings of the 14th international conference on World Wide Web - WWW '05  
To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning  ...  This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists.  ...  Thus, search engines need to score only those documents that occur in the intersection of the inverted lists.  ... 
doi:10.1145/1060745.1060785 dblp:conf/www/LongS05 fatcat:xbodm26ml5fk5lsrqzhxe7wmti

Performance improvements for search systems using an integrated cache of lists + intersections

Gabriel Tolosa, Esteban Feuerstein, Luca Becchetti, Alberto Marchetti-Spaccamela
2017 Information retrieval (Boston)  
Previous studies show that the use of caching techniques is crucial in search engines, as it helps reduce query response times and processing workloads on search servers.  ...  We also represent the data in cache in both raw and compressed forms and evaluate the differences between them using different configurations of cache sizes.  ...  In the case of industry-scale web search engines, the entire inverted index is usually stored in main memory [Dean (2009)] .  ... 
doi:10.1007/s10791-017-9299-5 fatcat:nerda4pzs5fovjlmajwfia7wzq

Batch query processing for web search engines

Shuai Ding, Josh Attenberg, Ricardo Baeza-Yates, Torsten Suel
2011 Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11  
In this paper, we motivate and discuss the problem of batch query processing in search engines, identify basic mechanisms for improving the performance of such queries, and provide a preliminary experimental  ...  Large web search engines are now processing billions of queries per day. Most of these queries are interactive in nature, requiring a response in fractions of a second.  ...  One important technique for optimizing performance in search engines is caching.  ... 
doi:10.1145/1935826.1935858 dblp:conf/wsdm/DingABS11 fatcat:3sjgnc5bwvdsjdelcqycp46k7u

Index compression is good, especially for random access

Stefan Büttcher, Charles L. A. Clarke
2007 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management - CIKM '07  
In fact, we demonstrate that, in some cases, random access into a term's postings list may be realized more efficiently if the list is stored in compressed form instead of uncompressed.  ...  In this paper, we show that index compression does not harm random access performance.  ...  engine can carry out random access operations into the inverted lists stored in the index.  ... 
doi:10.1145/1321440.1321546 dblp:conf/cikm/ButtcherC07 fatcat:ojufxbxyhneyjn23mbcrmnp4se

Text vs. space

Maria Christoforaki, Jinru He, Constantinos Dimopoulos, Alexander Markowetz, Torsten Suel
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
We feel that previous work has often focused on the spatial aspect at the expense of performance considerations in text processing, such as inverted index access, compression, and caching.  ...  Important examples include local search engines such as Google Local and location-based search services for smart phones.  ...  Acknowledgments This research was supported by NSF Grant IIS-0803605, "Efficient and Effective Search Services over Archival Webs".  ... 
doi:10.1145/2063576.2063641 dblp:conf/cikm/ChristoforakiHDMS11 fatcat:f5kaghcwnzcrlgxkcvjlp344zy

Entry Pairing in Inverted File [chapter]

Hoang Thanh Lam, Raffaele Perego, Nguyen Thoi Minh Quan, Fabrizio Silvestri
2009 Lecture Notes in Computer Science  
We apply our technique: (i) to compact a compressed inverted file built on an actual Web collection of documents, and (ii) to increase capacity of an in-memory posting list.  ...  Experiments showed that in the first case our approach can improve the compression ratio by up to 7.7%, while we measured a saving from 12% up to 18% in the size of the posting cache.  ...  For testing the term pairing technique on cached posting lists we use a query log of 51 million queries collected in 2003 by the Brazilian search engine TodoBr (the same source of the WBR99 collection  ... 
doi:10.1007/978-3-642-04409-0_50 fatcat:kdx7jiahjvab5eysdcn567hlie

Compression of inverted indexes for fast query evaluation

Falk Scholer, Hugh E. Williams, John Yiannis, Justin Zobel
2002 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02  
In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better  ...  We conclude that fast byte-aligned codes should be used to store integers in inverted lists.  ...  Inverted indexes are used to evaluate queries in all practical search engines [14] . Compression of these indexes has three major benefits for performance.  ... 
doi:10.1145/564414.564416 fatcat:izdv7qxxcjgyzk4uvrcwk276dm
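The byte-aligned codes this abstract recommends are commonly realized as variable-byte (vbyte) encoding. A minimal sketch using one common convention (low-order 7-bit groups first, high bit marking the final byte of each integer); the paper's exact code layout may differ:

```python
def vbyte_encode(numbers):
    """Variable-byte encode a list of non-negative integers.

    Each integer is split into 7-bit groups, emitted low-order first; the
    high bit of a byte is set only on the final byte of each integer.
    """
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte: 7 payload bits
            n >>= 7
        out.append(n | 0x80)      # terminator byte with the high bit set
    return bytes(out)

def vbyte_decode(data):
    """Decode a vbyte-encoded byte string back into integers."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:  # terminator: finish the current integer
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:         # continuation: accumulate 7 more payload bits
            n |= b << shift
            shift += 7
    return numbers
```

Because every code word is a whole number of bytes, decoding needs no bit-level shifting across byte boundaries, which is the source of the speed advantage the abstract reports over bitwise codes.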

Performance Improvements for Search Systems Using an Integrated Cache of Lists+Intersections [chapter]

Gabriel Tolosa, Luca Becchetti, Esteban Feuerstein, Alberto Marchetti-Spaccamela
2014 Lecture Notes in Computer Science  
In this study we propose and evaluate a static cache that works simultaneously as list and intersection cache, offering a more efficient way of handling cache space.  ...  Simulations using two datasets and a real query log reveal that the proposed approach improves overall performance in terms of total processing time, achieving savings of up to 40% in the best case.  ...  They also present a framework for the analysis of the tradeoff between caching query results and caching posting lists. In [24] inverted index compression and list caching techniques are explored.  ... 
doi:10.1007/978-3-319-11918-2_22 fatcat:tsclp3mbrrf7tmirvy7dakqaem

Efficient query processing in distributed search engines

Simon Jonassen
2012 SIGIR Forum  
Third, we elaborate on caching in Web search engines in two independent contributions. First, we present an analytical model that finds the optimal split in a static memory-based two-level cache.  ...  Web search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints.  ...  Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors, and do not necessarily reflect the views of the funding agencies.  ... 
doi:10.1145/2492189.2492201 fatcat:uwasxhngrfgntemkhawyv3te64

Application-Specific Disk I/O Optimisation for a Search Engine

Xiangfei Jia, Andrew Trotman, Richard O'Keefe, Zhiyi Huang
2008 2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies  
In this paper we provide a solution for application-specific I/O for optimising a search engine. It shows a 28% improvement when compared to the general-purpose I/O optimisation of Linux.  ...  However, application level I/O optimisation can achieve better performance since an application has a better knowledge of how to optimise disk I/O for the application.  ...  (Figure 3, "Overall Comparison": performance of the caching, prefetching and scheduling algorithms for the search engine)  ...  In the Linux kernel, prefetching is often  ... 
doi:10.1109/pdcat.2008.61 dblp:conf/pdcat/JiaTOH08 fatcat:hytjid45kfgw3gmvxu7n3mzqlu

Document Reordering is Good, Especially for e-Commerce

Vishnusaran Ramaswamy, Roberto Konow, Andrew Trotman, Jon Degenhardt, Nick Whyte
2017 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval  
Document id reordering is a well-known technique in web search for improving index compressibility and reducing query processing time.  ...  We explore and evaluate the benefits of document id reordering for large-scale e-Commerce search.  ...  Either way, in order to improve efficiency the inverted index is typically loaded into main memory when the search engine starts up, and parts of it are compressed in order to reduce the memory footprint  ... 
dblp:conf/sigir/RamaswamyKTDW17 fatcat:ltzsasocnnewpbmudf2ko5zxuq
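The compressibility gain from document id reordering comes from smaller d-gaps: when similar documents receive adjacent ids, the deltas between consecutive ids in a posting list shrink and need fewer bytes under a byte-aligned code. A toy illustration with hypothetical id assignments:

```python
def d_gaps(doc_ids):
    """Convert a sorted posting list into deltas between consecutive ids."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def vbyte_size(gaps):
    """Bytes needed to variable-byte encode each positive integer,
    at 7 payload bits per byte."""
    return sum(max(1, -(-g.bit_length() // 7)) for g in gaps)

# Hypothetical id assignments for the same six documents: scattered ids
# before reordering, clustered ids after similar documents are renumbered
# to be adjacent. The clustered list yields gaps of 1 and compresses to
# roughly half the size.
before = [3, 1000, 2500, 9000, 15000, 30000]
after = [3, 4, 5, 6, 7, 8]
```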
Showing results 1 — 15 out of 2,583 results