
Improving the performance of pipelined query processing with skipping—and its comparison to document-wise partitioning

Simon Jonassen, Svein Erik Bratsberg
2013 World wide web (Bussum)  
In this paper, we evaluate the effect of inverted index skipping on the performance of pipelined query processing.  ...  However, the query processing latency and scalability with respect to the collections size are the main challenges associated with this method.  ...  This work was supported by the iAd Centre and funded by the Norwegian University of Science and Technology and the Research Council of Norway.  ... 
doi:10.1007/s11280-013-0260-2 fatcat:czrimud3ifd2tkm75bkt2u3tby

Efficient query processing in distributed search engines

Simon Jonassen
2012 SIGIR Forum  
Subsequently, we present several skipping extensions to pipelined query processing, which as we show can improve the query processing performance and/or the quality of results.  ...  The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness).  ...  This work was supported by the iAd Centre and funded by the Norwegian University of Science and Technology and the Research Council of Norway. Acknowledgments.  ... 
doi:10.1145/2492189.2492201 fatcat:uwasxhngrfgntemkhawyv3te64

Clustering and load balancing optimization for redundant content removal

Shanzhong Zhu, Alexandra Potapova, Maha Alabduljalil, Xin Liu, Tao Yang
2012 Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion  
This paper discusses our experience in developing a scalable approach with parallel clustering that detects and removes near duplicates incrementally when processing billions of web pages.  ...  The experimental results evaluate the efficiency and accuracy of the incremental clustering, assess the effectiveness of the multidimensional mapping, and demonstrate the impact on online cost reduction  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.  ... 
doi:10.1145/2187980.2187992 dblp:conf/www/ZhuPALY12 fatcat:xmtbky2fx5c2jbifumpw7pvy6m

EgoSet

Xin Rong, Zhe Chen, Qiaozhu Mei, Eytan Adar
2016 Proceedings of the Ninth ACM International Conference on Web Search and Data Mining - WSDM '16  
Empirical evaluation against state-of-the-art baselines shows that our solution, EgoSet, is able to not only capture multiple facets in the input query, but also generate expansions for each facet with  ...  By blending the two resources we are able to produce sparse word ego-networks that are centered on the seed terms and are able to capture semantic equivalence among words.  ...  Acknowledgments This work is partially supported by the National Science Foundation under grant numbers IIS-1054199 and CCF-1048168. We thank our reviewers for very helpful comments and suggestions.  ... 
doi:10.1145/2835776.2835808 dblp:conf/wsdm/RongCMA16 fatcat:yfjg4asinzel3n7ea3rwgh6hle

ZenLDA: Large-scale topic model training on distributed data-parallel platform

Bo Zhao, Hucheng Zhou, Guoqiang Li, Yihua Huang
2018 Big Data Mining and Analytics  
To push the performance to the limit, we also present two approximations, sparse model initialization and "converged" token exclusion, as well as several system level optimizations.  ...  When compared with state-of-art systems, ZenLDA achieves comparable (even better) performance with similar accuracy.  ...  Besides, to improve the training performance, they required system supports such as mini-batch processing [5, 22, 36] and pipeline processing of data prefetching and sampling process [22, 36].  ... 
doi:10.26599/bdma.2018.9020006 dblp:journals/bigdatama/ZhaoZLH18 fatcat:luhcc3xbobhvjbpvety7g73a7m

Fast Compilation and Execution of SQL Queries with WebAssembly [article]

Immanuel Haffner, Jens Dittrich
2021 arXiv   pre-print
In this work, we investigate query execution by compilation to WebAssembly. We are able to compile even complex queries in less than a millisecond to machine code with near-optimal performance.  ...  Our approach provides both low latency and high throughput, is adaptive out of the box, and is straightforward to implement.  ...  With selectivities closer to 0% or 100%, the frequency of branch misprediction declines and performance improves.  ... 
arXiv:2104.15098v2 fatcat:yoic2rp7mzai3k6gscpbockyiu

D4.3 – WP4 Scientific Report and Prototype Description – Y3

Yosef Moatti, Paula Ta Shma, Guy Khazma, Javier López Moratalla, Jacob Roldan, Rogelio Rodriguez, Luis Tomás Bolívar, Marta Patiño, Ainhoa Azqueta, George Makridis, Christos Doulkeridis, Maria Kanakari (+4 others)
2021 Zenodo  
An initial demonstration of the capabilities offered by the data services has been performed during the interim review of the project, in which all the components have been integrated and interacted to  ...  The data services of this environment are naturally at the core of BigDataStack and are covered in this deliverable in terms of design specification as well as in terms of integration and experimentation  ...  One of the major improvements was Data Skipping for Queries with JOINs as Apache Spark 3.0 introduced "dynamic partition pruning".  ... 
doi:10.5281/zenodo.4442344 fatcat:dd3d7d3hofcp5gn7lrmxwmrdr4

Hash-Based Structural Join Algorithms [chapter]

Christian Mathis, Theo Härder
2006 Lecture Notes in Computer Science  
Therefore, it is not possible to design the structural join algorithm.  ...  and element distributions; enable pipelining; and (probably) more.  ...  Note, the strategies in [11, 14] elaborate on partition-based processing schemes, i.e., they assume a small amount of main memory and large input sequences, requiring their partition-wise processing  ... 
doi:10.1007/11896548_14 fatcat:f6mqon3zbza2jlamagypeqilba

Fast, Incremental Inverted Indexing in Main Memory for Web-Scale Collections [article]

Nima Asadi, Jimmy Lin
2013 arXiv   pre-print
In other words, it is not necessary to lay out in-memory data structures such that all postings for a term are contiguous; we can achieve ideal performance with a relatively small amount of effort.  ...  Designing efficient in-memory algorithms requires understanding modern processor architectures and memory hierarchies: in this paper, we explore the issue of postings lists contiguity.  ...  [19] and Indri 3 (v5.1). To ensure a fair comparison with the other systems, we disabled their document parsing phase and used the already parsed documents as input.  ... 
arXiv:1305.0699v1 fatcat:oy5krcryyzf5zfvaptkxwgpqai

Instant loading for main memory databases

Tobias Mühlbauer, Wolf Rödiger, Robert Seilbeck, Angelika Reiser, Alfons Kemper, Thomas Neumann
2013 Proceedings of the VLDB Endowment  
Once data is loaded, updates and queries are efficiently processed with the flexibility, security, and high performance of relational main memory databases.  ...  To analyze such data in traditional disk-based database systems, it needs to be bulk loaded, an operation whose performance largely depends on the wire speed of the data source and the speed of the data  ...  To improve query performance, some approaches, e.g., HAIL [7], propose using binary representations of text files for query processing.  ... 
doi:10.14778/2556549.2556555 fatcat:j2ig6anp2zfo5e5jkla7ucqxvm

Super-Scalar RAM-CPU Cache Compression

M. Zukowski, S. Heman, N. Nes, P. Boncz
2006 22nd International Conference on Data Engineering (ICDE'06)  
We evaluated the performance of PFOR-DELTA with respect to both compression ratio and speed on inverted file data derived from the INEX and TREC document collections, and compared it with the implementation  ...  Here, decompression may be skipped if the query performs the selection directly on the integer code (e.g. on gender=1 instead of gender="FEMALE"), which both needs less I/O and uses a less CPU-intensive  ... 
doi:10.1109/icde.2006.150 dblp:conf/icde/ZukowskiHNB06 fatcat:c3umeb2np5ecfijagmxttucsny

ZenLDA: An Efficient and Scalable Topic Model Training System on Distributed Data-Parallel Platform [article]

Bo Zhao, Hucheng Zhou, Guoqiang Li, Yihua Huang
2015 arXiv   pre-print
and model parallelism are required because of the Big sampling data with up to billions of documents and Big model size with up to trillions of parameters. zenLDA combines both algorithm level improvements  ...  To better fit in distributed data-parallel framework and achieve comparable performance with contemporary systems, we also presented several system level optimizations to push the performance limit. zenLDA  ...  PLDA+ [17] makes use of data placement and pipeline processing to greatly reduce communication time.  ... 
arXiv:1511.00440v1 fatcat:3clon4q5ofatxjrva5unrvajge

Template detection for large scale search engines

Liang Chen, Shaozhi Ye, Xing Li
2006 Proceedings of the 2006 ACM symposium on Applied computing - SAC '06  
In this paper, we propose a novel two-stage template detection method, which combines template detection and removal with the index building process of a search engine.  ...  Second, similar contents sharing the common layout style are detected during the index building process. The blocks with similar layout style and content are identified as templates and deleted.  ...  Since templates occur in many web pages, DF (Document Frequency) of a template word must exceed a certain threshold, thus we can skip the words with low DF.  ... 
doi:10.1145/1141277.1141534 dblp:conf/sac/ChenYL06 fatcat:7msa4x2ebzfvrf5blkkoefkyqe

SQL server column store indexes

Per-Åke Larson, Cipri Clinciu, Eric N. Hanson, Artem Oks, Susan L. Price, Srikumar Rangarajan, Aleksandras Surna, Qingqing Zhou
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
Together they greatly improve performance of typical data warehouse queries, routinely by 10X and in some cases by a 100X or more.  ...  The repertoire of batch mode operators has been expanded, existing operators have been improved, and query optimization has been enhanced.  ...  Improved Index Build The way a column store index is built has been improved to make the process more dynamic and improve the quality of the index.  ... 
doi:10.1145/1989323.1989448 dblp:conf/sigmod/LarsonCHOPRSZ11 fatcat:ars4o7shurclfhchfg3rbz6iay

F1 query

Bart Samwel, Himani Apte, Felix Weigel, David Wilhite, Jiacheng Yang, Jun Xu, Jiexing Li, Zhan Yuan, Craig Chasseur, Qiang Zeng, Ian Rae, John Cieslewicz (+24 others)
2018 Proceedings of the VLDB Endowment  
resources for performant query processing with high throughput and low latency; (iii) it provides high scalability for large data sizes by increasing computational parallelism; and (iv) it is extensible  ...  F1 Query has also significantly reduced the need for developing hard-coded data processing pipelines by enabling declarative queries integrated with custom business logic.  ...  Finally, thank you to the F1 SRE team for amazing F1 Query production support and help in scaling the service to 1000s of users.  ... 
doi:10.14778/3229863.3229871 fatcat:ttatl6drrrg4tex2ol6grsaqqa