A tunable compression framework for bitmap indices

Gheorghi Guzun, Guadalupe Canahuate, David Chiu, Jason Sawin
2014 2014 IEEE 30th International Conference on Data Engineering  
Due to a trade-off between compression and query efficiency, bitmap compression schemes are aligned using a fixed encoding length size (typically the word length) to avoid explicit decompression during  ...  We present a framework that optimizes compression and query efficiency by allowing bitmaps to be compressed using variable encoding lengths while still maintaining alignment to avoid explicit decompression  ...  VARIABLE ALIGNED LENGTH (VAL) FRAMEWORK Most modern bitmap compression techniques are variants of the Word-Aligned Hybrid (WAH) encoding, which uses w-bit words to align with the underlying CPU architecture  ... 
doi:10.1109/icde.2014.6816675 dblp:conf/icde/GuzunCCS14 fatcat:3z5buqsppzcf7dxaaprv2nanai

Dynamic bitmap index recompression through workload-based optimizations

Fredton Doan, David Chiu, Brasil Perez Lukes, Jason Sawin, Gheorghi Guzun, Guadalupe Canahuate
2013 Proceedings of the 17th International Database Engineering & Applications Symposium on - IDEAS '13  
Previously, we introduced Variable Length Compression (VLC), which uses a general encoding that can achieve better compression than word-aligned schemes.  ...  Numerous hybrid run-length encoding compression schemes have been proposed that greatly compress the index and enable querying without the need to decompress.  ...  One popular hybrid RLE used for bitmap indices is Word-Aligned Hybrid (WAH) [9].  ... 
doi:10.1145/2513591.2513641 dblp:conf/ideas/DoanCLSGC13 fatcat:plxmw7wf2zchjkeeqyfayz72zq

Variable Length Compression for Bitmap Indices [chapter]

Fabian Corrales, David Chiu, Jason Sawin
2011 Lecture Notes in Computer Science  
Today, two pervasive bitmap compression schemes employ a variation of run-length encoding, aligned over bytes (BBC) and words (WAH), respectively.  ...  However, these sorted bitmaps often display patterns of changing run-lengths that are not optimal for a byte nor a word alignment.  ...  The Byte-aligned Bitmap Compression (BBC) [2] and the Word Aligned Hybrid Code (WAH) [20] are commonly used to compress and query bitmaps in databases.  ... 
doi:10.1007/978-3-642-23091-2_32 fatcat:q7imcg6wuvblncruizq4zcm4de

Alternative Method of Audio Searching (AMAS)

Biprajeet Pal
2016 International Journal Of Engineering And Computer Science  
Compression: The Huffman algorithm replaces the input text strings into variable length codes in stage one.  ...  In stage two, the Adaptive Huffman algorithm replaces the fixed length codes into variable length codes. B.  ... 
doi:10.18535/ijecs/v5i11.33 fatcat:ic7mcm34cfgshpowhbraqiaqmi

USING ALIGNMENT FOR MULTILINGUAL TEXT COMPRESSION

EHUD S. CONLEY, SHMUEL T. KLEIN
2008 International Journal of Foundations of Computer Science  
Multilingual text compression exploits the existence of the same text in several languages to compress the second and subsequent copies by reference to the first.  ...  B(x) denotes the variable length binary encoding of x and I(x, y) denotes the variable length binary encoding of the index of x within the dictionary entry y; if y contains only one item, I(x, y) = ε.  ...  A_{S,T}: A word- and phrase-level alignment of the text pair (S, T). Let s_{i,l} denote the word sequence of length l within S beginning at the ith word.  ... 
doi:10.1142/s0129054108005553 fatcat:jdraag27v5fgroltihmnvnsbma

Parallel acceleration of CPU and GPU range queries over large data sets

Mitchell Nelson, Zachary Sorenson, Joseph M. Myre, Jason Sawin, David Chiu
2020 Journal of Cloud Computing: Advances, Systems and Applications  
Bitmaps are usually highly compressible and can be queried directly using fast hardware-supported bitwise logical operations.  ...  One of the first hybrid run-length encoding schemes was Byte-aligned Bitmap Compression (BBC) [4].  ...  [49] explored several similar column ordering techniques for the range query processing of Variable-Aligned Length [27] compressed bitmaps. Chmiel et al.  ... 
doi:10.1186/s13677-020-00191-w fatcat:z6jnxl2vhvam5lqjr4eyjswyla

Annotating Large Genomes With Exact Word Matches

J. Healy
2003 Genome Research  
Our original interest was motivated by oligonucleotide probe design, and we describe a general protocol for defining unique hybridization probes.  ...  Thus we can readily annotate any sequence, including the entire human genome, with the counts of its constituent words.  ...  Figure 1. Our algorithm for rapidly determining the exact word counts in a large string for any length word. (A) Graphically defines the variables and data structures used in the algorithm.  ... 
doi:10.1101/gr.1350803 pmid:12975312 pmcid:PMC403711 fatcat:ji7w5kiwa5hjfiyf6n746uke4e

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units [article]

Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin
2018 arXiv   pre-print
The subword-based hybrid CTC-Attention system obtains 6.8% word error rate (WER) on the test_clean subset without any dictionary or external language model.  ...  The subword units are obtained by the byte-pair encoding (BPE) compression algorithm.  ...  Connectionist Temporal Classification (CTC) CTC provides a method to train RNNs without any prior alignment between input and output sequences of different lengths.  ... 
arXiv:1807.04978v2 fatcat:zhrsucbcxzennkr6kdd4vtqbxe

Concise: Compressed 'n' Composable Integer Set

Alessandro Colantonio, Roberto Di Pietro
2010 Information Processing Letters  
The Word Aligned Hybrid (WAH) bitmap compression trades some space to allow for bitwise operations without first decompressing bitmaps.  ...  However, bitmaps usually use a large storage space, thus requiring compression. Consequently, there is a space-time tradeoff among compression schemes.  ...  In this scenario, the Word Aligned Hybrid (WAH) bitmap compression algorithm is currently recognized as the most efficient one, mainly from a computational perspective [2].  ... 
doi:10.1016/j.ipl.2010.05.018 fatcat:x5c2ucovizeqlhbhhl7v6rx3mu

Efficient genotype compression and analysis of large genetic-variation data sets

Ryan M Layer, Neil Kindlon, Konrad J Karczewski, Aaron R Quinlan
2015 Nature Methods  
GQT's compressed genotype index minimizes decompression for analysis, and performance relative to existing methods improves with cohort size.  ...  Instead we used the Word Aligned Hybrid (WAH) encoding strategy, which represents run length in words not in bits.  ...  Lastly, the bitmap indices are compressed using Word Aligned Hybrid (WAH) encoding [6], which achieves near-optimal compression while still allowing bitwise operations directly on the compressed data  ... 
doi:10.1038/nmeth.3654 pmid:26550772 pmcid:PMC4697868 fatcat:enotd2owmfhuxh6najz55745qi

CONCISE: Compressed 'n' Composable Integer Set [article]

Alessandro Colantonio, Roberto Di Pietro
2010 arXiv   pre-print
The Word Aligned Hybrid (WAH) bitmap compression trades some space to allow for bitwise operations without first decompressing bitmaps.  ...  However, bitmaps usually use a large storage space, thus requiring compression. Nevertheless, there is a space-time tradeoff among compression schemes.  ...  In this scenario, the Word Aligned Hybrid (WAH) bitmap compression algorithm is currently recognized as the most efficient one, mainly from a computational perspective [2].  ... 
arXiv:1004.0403v1 fatcat:hv6kgzkxx5c3lomtfl2gzfrnzy

SECOMPAX: A bitmap index compression algorithm

Yuhao Wen, Zhen Chen, Ge Ma, Junwei Cao, Wenxun Zheng, Guodong Peng, Shiwei Li, Wen-Liang Huang
2014 2014 23rd International Conference on Computer Communication and Networks (ICCCN)  
the state-of-art bitmap index compression algorithm WAH (Word-Aligned-Hybrid), PLWAH (Position List Word Aligned Hybrid) and COMPAX (COMPressed Adaptive indeX).  ...  In this paper, we propose a new bitmap index encoding algorithm named SECOMPAX (Scope-Extended COMPressed Adaptive indeX), which performs better compression ratio and fast encoding speed compared with  ...  length of WAH, COMPAX and SECOMPAX.  ... 
doi:10.1109/icccn.2014.6911838 dblp:conf/icccn/WenCMCZPLH14 fatcat:w4azkuejjvbwla7fg4sehgfmzm

New Trends of Digital Data Storage in DNA

Pavani Yashodha De Silva, Gamage Upeksha Ganegoda
2016 BioMed Research International  
using variable length codons [27].  ...  Here Doig uses a variable codon length through employing more frequent amino acids with shorter codon length while rare acids are represented using a longer codon length.  ... 
doi:10.1155/2016/8072463 pmid:27689089 pmcid:PMC5027317 fatcat:6ymaftkcmbbwtkxwjzz5tlic74

Controllable Text Simplification with Explicit Paraphrasing [article]

Mounica Maddela, Fernando Alva-Manchego, Wei Xu
2021 arXiv   pre-print
However, such systems limit themselves to mostly deleting words and cannot easily adapt to the requirements of different target audiences.  ...  In this paper, we propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting  ...  We also report FKGL (FK), average sentence length (SLen), output length (OLen), compression ratio (CR), self-BLEU (s-BL), percentage of sentence splits (%split), average percentage of new words added to  ... 
arXiv:2010.11004v3 fatcat:fxuw7kfrwvbave6wosdavj7c6a

Efficient genotype compression and analysis of large genetic variation datasets [article]

Ryan M Layer, Neil Kindlon, Konrad J Karczewski, Exome Aggregation Consortium ExAC, Aaron R Quinlan
2015 bioRxiv   pre-print
Speed improvements are achieved by operating directly on a compressed genotype index without decompression. GQT's data compression ratios increase favorably with cohort size and relative analysis performance improves in kind.  ...  Using Word-Aligned Hybrid to directly query compressed data.  ... 
doi:10.1101/018259 fatcat:yhzkfsv3w5d2vpluloow6lh5wa