
Compression, SIMD, and Postings Lists

Andrew Trotman
2014 Proceedings of the 2014 Australasian Document Computing Symposium - ADCS '14  
The three generations of postings list compression strategies (Variable Byte Encoding, Word Aligned Codes, and SIMD Codecs) are examined in order to test whether or not each truly represented a generational  ...  Some weaknesses of the current SIMD-based schemes are identified and a new scheme, QMX, is introduced to address both space and decoding inefficiencies.  ...  INTRODUCTION Modern search engines usually store their postings list in memory and compressed.  ... 
doi:10.1145/2682862.2682870 dblp:conf/adcs/Trotman14 fatcat:ej226lelurhevcv6iyaum4t24m

A General SIMD-Based Approach to Accelerating Compression Algorithms

Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen
2015 ACM Transactions on Information Systems  
With competitive compression ratios and encoding speeds, our SIMD-based algorithms outperform state-of-the-art non-vectorized algorithms with respect to decoding speeds.  ...  Modern processors equipped with powerful SIMD instruction sets provide us an opportunity for achieving better compression performance.  ...  of posting lists.  ... 
doi:10.1145/2735629 fatcat:2dophhsyobaqhn5xdwp5ldf66y

SIMD compression and the intersection of sorted integers

Daniel Lemire, Leonid Boytsov, Nathan Kurz
2015 Software, Practice & Experience  
To show that it does not have to be so, we (1) vectorize and optimize the intersection of posting lists; (2) introduce the SIMD Galloping algorithm.  ...  Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory.  ...  Moreover, the excluded posting lists had averages of 3.2 postings for GOV2 and 4.5 postings for ClueWeb09.  ... 
doi:10.1002/spe.2326 fatcat:2wgknj4ysjeermg3pzitmrmtfi
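The galloping idea named in the abstract above can be illustrated with a plain scalar sketch. This is not the paper's SIMD Galloping algorithm: for each element of the shorter sorted list, it probes the longer list at exponentially growing offsets and then binary-searches the bracketed range. The function names `gallop` and `galloping_intersect` are hypothetical, chosen for this example, and both lists are assumed to hold sorted, distinct integers (as posting lists do).

```python
from bisect import bisect_left


def gallop(arr, target, start):
    """Smallest index i >= start with arr[i] >= target, or len(arr) if none."""
    n = len(arr)
    if start >= n or arr[start] >= target:
        return start
    # Invariant: arr[prev] < target; probe moves ahead by doubling steps.
    prev, step = start, 1
    probe = start + 1
    while probe < n and arr[probe] < target:
        prev = probe
        step *= 2
        probe = prev + step
    # The answer lies after prev and no later than min(probe, n); bisect that bracket.
    return bisect_left(arr, target, prev + 1, min(probe, n))


def galloping_intersect(small, large):
    """Intersect two sorted lists of distinct integers; small should be the shorter one."""
    result = []
    pos = 0
    for x in small:
        pos = gallop(large, x, pos)
        if pos == len(large):
            break  # every remaining element of small exceeds large[-1]
        if large[pos] == x:
            result.append(x)
            pos += 1
    return result


print(galloping_intersect([3, 7, 20, 50], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40]))
# prints [3, 7, 20]
```

Galloping pays off when one list is much shorter than the other, because each lookup costs time logarithmic in the distance skipped rather than in the full length of the longer list.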

Stream VByte : Faster byte-oriented integer compression

Daniel Lemire, Nathan Kurz, Christoph Rupp
2018 Information Processing Letters  
They are appealing due to their simplicity and engineering convenience. Amazon's varint-G8IU is one of the fastest byte-oriented compression techniques published so far.  ...  It makes judicious use of the powerful single-instruction-multiple-data (SIMD) instructions available in commodity processors.  ...  Boytsov from CMU for sharing the posting list collection.  ... 
doi:10.1016/j.ipl.2017.09.011 fatcat:l4n6dbgvafemljeaprrwmxwrku
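As a rough illustration of the Stream VByte data layout, here is a scalar sketch for unsigned 32-bit integers. It assumes the format as generally described: each group of four integers gets one control byte holding four 2-bit length codes (byte count minus one, first integer in the lowest two bits), while the integers' little-endian bytes, with leading zero bytes dropped, go into a separate data stream. The function names are hypothetical. A SIMD decoder would use each control byte to select a shuffle mask and decode four integers at once; this sketch decodes one integer at a time and only shows the layout.

```python
def _byte_length(v):
    """Number of bytes (1-4) needed for an unsigned 32-bit value."""
    if v < (1 << 8):
        return 1
    if v < (1 << 16):
        return 2
    if v < (1 << 24):
        return 3
    return 4


def stream_vbyte_encode(values):
    """Return (control_bytes, data_bytes) for a list of unsigned 32-bit ints."""
    control = bytearray()
    data = bytearray()
    for i in range(0, len(values), 4):
        key = 0
        for j, v in enumerate(values[i:i + 4]):
            if not 0 <= v < (1 << 32):
                raise ValueError("value out of unsigned 32-bit range")
            n = _byte_length(v)
            key |= (n - 1) << (2 * j)        # 2-bit code for integer j of the group
            data += v.to_bytes(n, "little")  # only the significant bytes
        control.append(key)
    return bytes(control), bytes(data)


def stream_vbyte_decode(control, data, count):
    """Decode `count` integers; the count must be stored separately."""
    out = []
    pos = 0
    for i in range(count):
        n = ((control[i // 4] >> (2 * (i % 4))) & 0b11) + 1
        out.append(int.from_bytes(data[pos:pos + n], "little"))
        pos += n
    return out


ctrl, data = stream_vbyte_encode([1, 300, 70000, 2**32 - 1])
print(list(ctrl), len(data))
# prints [228] 10
```

Keeping the control bytes apart from the data bytes is what lets a decoder learn the lengths of four integers from a single byte, instead of testing a continuation bit in every data byte.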

In Vacuo and In Situ Evaluation of SIMD Codecs

Andrew Trotman, Jimmy Lin
2016 Proceedings of the 21st Australasian Document Computing Symposium - ADCS '16  
The size of a search engine index and the time to search are inextricably related through the compression codec.  ...  This investigation examines this tradeoff using several relatively unexplored SIMD-based codecs including QMX, TurboPackV, and TurboPFor. It uses (the non-SIMD) OPTPFor as a baseline.  ...  The typical approach is to compress each postings list independently.  ... 
doi:10.1145/3015022.3015023 fatcat:2syrr7rj4fghdh5c5tmxby7cry

SIMD-based decoding of posting lists

Alexander A. Stepanov, Anil R. Gangolli, Daniel E. Rose, Ryan J. Ernst, Paramjit S. Oberoi
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
Powerful SIMD instructions in modern processors offer an opportunity for greater search performance. In this paper, we apply these instructions to decoding search engine posting lists.  ...  We start by exploring variable-length integer encoding formats used to represent postings. We define two properties, byte-oriented and byte-preserving, that characterize many formats of interest.  ...  Acknowledgments We thank Thomas London for suggestions on a draft of this paper and Bill Stasior for supporting this research.  ... 
doi:10.1145/2063576.2063627 dblp:conf/cikm/StepanovGREO11 fatcat:x5fwdllgs5azxcvsdlgw4n6jy4

Decoding billions of integers per second through vectorization

D. Lemire, L. Boytsov
2013 Software, Practice & Experience  
For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during  ...  In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions.  ...  Kurz for his insights and for his review of the manuscript. We wish to thank the anonymous reviewers for their valuable comments.  ... 
doi:10.1002/spe.2203 fatcat:3whhroievffhzlos7ol454a4y4

Efficient Top-k Document Retrieval

Antonio Mallia
2019 BCS-IRSG Symposium on Future Directions in Information Access  
Finally, we briefly describe our initial findings and conclude by proposing future directions to follow.  ...  This research was supported by NSF Grant IIS-1718680 and a grant from Amazon.  ...  In particular, compression of posting lists is of utmost importance, since they account for much of the data size and access costs.  ... 
dblp:conf/fdia/Mallia19 fatcat:3hexl5ul4femre2lo3asa7qzv4

Vectorizing Database Column Scans with Complex Predicates

Thomas Willhalm, Ismail Oukid, Ingo Müller, Franz Faerber
2013 Very Large Data Bases Conference  
Compressing the underlying column data format is both an advantage and a challenge, because it reduces the data volume involved in a scan on one hand and introduces the need for decompression during the  ...  with In-List predicate, leading to an overall throughput of 8 billion rows per second and more on a single core.  ...  Acknowledgements We would like to thank Wolfgang Lehner for the valuable discussions and suggestions for the paper and the SAP HANA team for the support in developing and testing our implementation.  ... 
dblp:conf/vldb/WillhalmO0F13 fatcat:4awda4acsjeajdktmb7rsnzz2m
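Scanning compressed column data, as described in the abstract above, typically starts from a bit-packed layout in which every value occupies a fixed number of bits. The scalar sketch below is an assumption for illustration, not the paper's vectorized implementation: it packs values at `bits` bits each (least significant bit first) and evaluates an In-List predicate by unpacking one value at a time, whereas a SIMD scan would unpack and compare many values per instruction. All function names are hypothetical.

```python
def bit_pack(values, bits):
    """Pack non-negative ints into bytes, `bits` bits per value, LSB-first."""
    out = bytearray()
    acc, pending = 0, 0  # bit accumulator and number of bits held in it
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc |= v << pending
        pending += bits
        while pending >= 8:  # flush whole bytes
            out.append(acc & 0xFF)
            acc >>= 8
            pending -= 8
    if pending:
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_one(packed, i, bits):
    """Extract the i-th packed value by reading only the bytes that hold it."""
    first_bit = i * bits
    start = first_bit // 8
    end = (first_bit + bits + 7) // 8
    chunk = int.from_bytes(packed[start:end], "little")
    return (chunk >> (first_bit % 8)) & ((1 << bits) - 1)


def scan_in_list(packed, count, bits, in_list):
    """Row positions whose value is in `in_list` (an In-List predicate)."""
    wanted = set(in_list)
    return [i for i in range(count) if unpack_one(packed, i, bits) in wanted]


column = [5, 0, 7, 3, 5, 1, 6, 2, 5]
print(scan_in_list(bit_pack(column, 3), len(column), 3, [5, 2]))
# prints [0, 4, 7, 8]
```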

Fast integer compression using SIMD instructions

Benjamin Schlegel, Rainer Gemulla, Wolfgang Lehner
2010 Proceedings of the Sixth International Workshop on Data Management on New Hardware - DaMoN '10  
More specifically, we provide SIMD versions of both null suppression and Elias gamma encoding.  ...  We study algorithms for efficient compression and decompression of a sequence of integers on modern hardware.  ...  To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. tual computation.  ... 
doi:10.1145/1869389.1869394 dblp:conf/damon/SchlegelGL10 fatcat:yiqxth2o5jg5zadcqckzeuvlcm
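For reference, scalar Elias gamma coding (one of the schemes the abstract above says is vectorized) writes a positive integer x as floor(log2 x) zero bits followed by the binary form of x, so each code takes 2 * floor(log2 x) + 1 bits. The sketch below uses Python strings of '0'/'1' characters for readability instead of packed machine words; the function names are hypothetical, and it is not the SIMD variant from the paper.

```python
def elias_gamma_encode(values):
    """Concatenate the Elias gamma codes of positive integers as a '0'/'1' string."""
    parts = []
    for x in values:
        if x < 1:
            raise ValueError("Elias gamma encodes positive integers only")
        binary = bin(x)[2:]  # binary digits of x, leading 1 first
        parts.append("0" * (len(binary) - 1) + binary)
    return "".join(parts)


def elias_gamma_decode(bits):
    """Inverse of elias_gamma_encode."""
    out = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # unary prefix: count leading zeros
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))  # the next zeros+1 bits are x
        i += zeros + 1
    return out


print(elias_gamma_encode([1, 2, 5]))
# prints 101000101
```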

Vectorized VByte Decoding [article]

Jeff Plaisance, Nathan Kurz, Daniel Lemire
2017 arXiv   pre-print
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes.  ...  The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag.  ...  Boytsov from CMU for preparing and making available the posting list collection.  ... 
arXiv:1503.07387v4 fatcat:wzm5gytqe5c6xdwhwjfyqcki7m
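The VByte format described in the abstract above (the low 7 bits of each byte carry data, the high bit is a continuation flag) can be sketched in scalar Python as follows. This assumes the least-significant 7-bit group is written first and that a set high bit means more bytes follow; the function names are hypothetical, and this is the plain byte-at-a-time codec, not the paper's vectorized decoder.

```python
def vbyte_encode(values):
    """Encode non-negative ints, 7 data bits per byte, low-order group first."""
    out = bytearray()
    for x in values:
        if x < 0:
            raise ValueError("VByte encodes non-negative integers only")
        while x >= 0x80:
            out.append((x & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            x >>= 7
        out.append(x)  # final byte: continuation flag clear
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    out = []
    value, shift = 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes follow for this integer
        else:
            out.append(value)
            value, shift = 0, 0
    return out


print(vbyte_encode([1, 127, 128, 300]).hex())
# prints 017f8001ac02
```

Some VByte implementations invert the flag so that a set high bit marks the final byte of each integer; the two conventions are not interchangeable.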

VAST-Tree

Takeshi Yamamuro, Makoto Onizuka, Toshio Hitaka, Masashi Yamamuro
2012 Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12  
Next, it applies the adaptive compression technique to save space and harness data parallelism with SIMD instructions to the middle and bottom layers of branch nodes.  ...  Moreover, a processor-friendly compression technique is applied to leaf nodes. The end result is that trees are much more compact and traversal efficiency is high.  ...  Figure 6 : Compression of keys in leaf nodes. Upper array is an uncompressed list of keys, and lower array is a compressed one. VAST-Tree compresses k consecutive keys into a single chunk.  ... 
doi:10.1145/2247596.2247643 dblp:conf/edbt/YamamuroOHY12 fatcat:aowshe4vfvhrbae2sctirwbqqq

Scaling column imprints using advanced vectorization

Lefteris Sidirourgos, Hannes Mühleisen
2017 Proceedings of the 13th International Workshop on Data Management on New Hardware - DAMON '17  
Our findings are very promising for both imprints and for future index design research that would employ advanced vectorization techniques and larger (up to 512-bit) and more (from 16 now to 32) SIMD registers  ...  We also experimentally explore the benefits of stretching imprints to larger bit-vector sizes and blocks of data, using 256-bit SIMD registers.  ...  To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.  ... 
doi:10.1145/3076113.3076120 dblp:conf/damon/SidirourgosM17 fatcat:72tvqqn53zcjjbggmwh67k33ze

Evaluation of GPU/CPU co-processing models for JPEG 2000 packetization

Volker Bruns, Miguel A. Martinez-del-Amor, Heiko Sparenberg
2017 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)  
With the bottom-line goal of increasing the throughput of a GPU-accelerated JPEG 2000 encoder, this paper evaluates whether the post-compression rate control and packetization routines should be carried  ...  Experimental results for compressing a detail-rich UHD sequence to 4 bits/sample indicate speed-ups of 200x for the rate control and 100x for the packetization compared to the single-threaded implementation  ...  Only little focus, however, has been devoted to executing the Post-Compression Rate-Distortion Optimization (PCRD-Opt.) [4] and packetization on a GPU.  ... 
doi:10.1109/mmsp.2017.8122283 dblp:conf/mmsp/BrunsMS17 fatcat:tddjovrfsnfqlb4z5x2ryilqgu

Aird: A computation-oriented mass spectrometry data format enables higher compression ratio and less decoding time [article]

MiaoShan Lu, Shaowei An, Ruimin Wang, Jinyin Wang, Changbin Yu
2020 bioRxiv   pre-print
Unlike storage-oriented formats, which focus more on lossless compression and compression rate, computation-oriented formats focus as much on decoding speed and disk read strategy as compression rate  ...  Here we describe "Aird", an open-source and computation-oriented format with controllable precision, flexible indexing strategies and high compression rate.  ...  To show that it does not have to be so, we (1) vectorize and optimize the intersection of posting lists; (2) introduce the SIMD GALLOPING algorithm.  ... 
doi:10.1101/2020.10.14.338921 fatcat:xekkgdjuljeczavm23hamqiddu