
Algorithmics on SLP-compressed strings: A survey

Markus Lohrey
2012 Groups - Complexity - Cryptology  
Among others, we study pattern matching for compressed strings, membership problems for compressed strings in various kinds of formal languages, and the problem of querying compressed strings.  ...  Results on algorithmic problems on strings that are given in a compressed form via straightline programs are surveyed.  ...  Using the fact that the number of factors in the LZ77-factorization of an SLP-compressed word eval(A) is bounded by the size of the SLP A (Theorem 6) as well as the results from Section 6 on compressed  ... 
doi:10.1515/gcc-2012-0016 fatcat:o7lrrx3cgvhqrhmf4bsulc7rsa

Engineering Relative Compression of Genomes [article]

Szymon Grabowski, Sebastian Deorowicz
2011 arXiv   pre-print
One of the new successful ideas is augmenting the reference sequence with phrases from the other sequences, making more LZ-matches available.  ...  In this paper we present an LZ77-style compression scheme for relative compression of multiple genomes of the same species.  ...  Acknowledgments We thank Shanika Kuruppu and Simon Puglisi for providing us with their software and tips on how to use it.  ... 
arXiv:1103.2351v1 fatcat:j2paawjhtngzxkl3hdjicfnnfy

Learning and storing the parts of objects: IMF

Ruairi de Frein
2014 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)  
2) devising a coding scheme which compressed the factorization, using a scheme based on [2], resulting in an integrated factorization-compression system.  ...  INTRODUCTION Nonnegative Matrix Factorization (NMF) uncovers feature vectors and vectors of activations of these features in a signal ensemble with structural regularity via non-negativity constraints  ... 
doi:10.1109/mlsp.2014.6958926 dblp:conf/mlsp/Frein14 fatcat:i22ekoaizfdv5ktgxgdnr27atq

Entropy Lower Bounds for Dictionary Compression

Michal Ganczorz, Michael Wagner
2019 Annual Symposium on Combinatorial Pattern Matching  
We show that a wide class of dictionary compression methods (including LZ77, LZ78, grammar compressors as well as parsing-based structures) require $|S| H_k(S) + \Omega(|S| k \log \sigma / \log_\sigma |S|)$ bits to encode  ...  on $k$ and $\sigma$ but not on $|S|$.  ...  As a result, there are many both simple and efficient methods that are commonly used in practice, those include algorithms from Lempel-Ziv family: LZ77 [34] and LZ78 [35], grammar compressors: Re-Pair  ... 
doi:10.4230/lipics.cpm.2019.11 dblp:conf/cpm/Ganczorz19 fatcat:4juqwnmmnze6tgvjvptwuq6uie

Information density, structure and entropy in equilibrium and non-equilibrium systems [article]

Mengjie Zu, Arunkumar Bupathy, Daan Frenkel, Srikanth Sastry
2019 arXiv   pre-print
Our entropy estimates based on bit-wise data compression contain no adjustable scaling factor, and show large quantitative differences with the thermodynamic entropy obtained from equilibrium simulations  ...  However, we cannot rely on the statistical-mechanical expressions for entropy in systems that are far from equilibrium.  ...  Compress the binary strings using the LZ77 algorithm to estimate the compression density $I_j$ of the $j$-th bit.  ... 
arXiv:1912.03876v1 fatcat:vishihynurf7bjh3djyjdgmipq


Hao Jiang, Chunwei Liu, Qi Jin, John Paparrizos, Aaron J. Elmore
2020 Proceedings of the VLDB Endowment  
Using an unsupervised approach, PIDS identifies common patterns in string attributes from relational databases, and uses the discovered pattern to split each attribute into sub-attributes.  ...  We propose PIDS, Pattern Inference Decomposed Storage, an innovative storage method for decomposing string attributes in columnar stores.  ...  In Figure 14, we show a microbenchmark comparing PIDS's sub-attribute extraction algorithm with the widely used regular expression-based algorithm and a state machine-based algorithm based on recent  ... 
doi:10.14778/3380750.3380761 fatcat:uct7fu53hfdppkuhsj7p5pr4ka

A Fully Reversible Data Transform Technique Enhancing Data Compression of SMILES Data [chapter]

Shagufta Scanlon, Mick Ridley
2013 Lecture Notes in Computer Science  
The requirement to efficiently store and process SMILES data used in Chemoinformatics creates a demand for efficient techniques to compress this data.  ...  We develop a transform specific to SMILES data that can be used alongside other general-purpose compressors as a preprocessor and post-processor to improve the compression of SMILES data.  ...  We would like to thank all the authors of publicly available datasets, compression and transformation tools that made this study possible.  ... 
doi:10.1007/978-3-642-40511-2_5 fatcat:glf2ixnbqvf4bga524uanhepq4

Effective asymmetric XML compression

Przemysław Skibiński, Szymon Grabowski, Jakub Swacha
2008 Software, Practice & Experience  
In this work, we describe a fast and fully reversible XML transform which, combined with generally used LZ77-style compression algorithms, allows to attain high compression ratios, comparable to those  ...  The test results show the proposed transform to improve the XML compression efficiency of general purpose compressors on average by 35% in case of gzip and 17% in case of LZMA.  ...  Gzip is based on Deflate, the most widely-used compression algorithm, known for its fast compression and very fast decompression, but limited efficiency.  ... 
doi:10.1002/spe.859 fatcat:rv5ca26rg5fvpprjkxkawqxgme

Effective Construction of Relative Lempel-Ziv Dictionaries

Kewen Liao, Matthias Petri, Alistair Moffat, Anthony Wirth
2016 Proceedings of the 25th International Conference on World Wide Web - WWW '16  
To avoid the complications of string covering algorithms on large collections, we focus on k-mers and their frequencies.  ...  retain only the high-use sections.  ...  This work was funded by the Australian Research Council's Discovery Project scheme (DP140103256), and by the Victorian Life Sciences Computation Initiative (VR0280), on its Peak Computing Facility at the  ... 
doi:10.1145/2872427.2883042 dblp:conf/www/LiaoPMW16 fatcat:vb2snpwtfzbaln2gpo6e54utai

Detecting visually similar Web pages

Teh-Chung Chen, Scott Dick, James Miller
2010 ACM Transactions on Internet Technology  
The finding/matching strategy is the same as the sliding window technique used in LZ77.  ...  It is computed from the lengths of compressed data files, images, strings, etc. using real-world compression algorithms.  ...  We believe by combining other reliable Phishing website features, we can develop a hybrid heuristic Anti-Phishing mechanism based on our NCD similarity clustering core technology with accurate Phishing  ... 
doi:10.1145/1754393.1754394 fatcat:3fsxye6xzrgvnd76ria3lozdze

Analysis and study on text representation to improve the accuracy of the Normalized Compression Distance [article]

Ana Granados
2012 arXiv   pre-print
This thesis focuses on dealing with texts using compression distances.  ...  Broadly speaking, the way in which this is done is exploring the effects that several distortion techniques have on one of the most successful distances in the family of compression distances, the Normalized  ...  Finally, the shortest program that can generate the third string should simply print all the bits of the sequence, because this string cannot be expressed in any regular way.  ... 
arXiv:1205.6376v1 fatcat:stcfnttgxvfqho47tk57bdcu7q

Optimizing LZSS compression on GPGPUs

Adnan Ozsoy, Martin Swany, Arun Chauhan
2014 Future generations computer systems  
We also benchmarked our algorithm in comparison with well known, widely used programs: GZIP and ZLIB.  ...  The two main stages of the algorithm, substring matching and encoding, are studied in detail to fit into the GPU architecture.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.  ... 
doi:10.1016/j.future.2013.06.022 fatcat:64272o7xgjfc3leprc5ftziu7m

Assessing linguistic complexity [chapter]

Patrick Juola
2008 Studies in Language Companion Series  
I focus not only on the mathematical aspects of complexity, but on the psychological ones.  ...  By comparing different measures, one may better understand human language processing and similarly, understanding psycholinguistics may drive better measures.  ...  Further compression analyses: Translation The previous section discussed three positive factors in linguistic complexity in conjunction with an apparent negative aspect, "the complexity/information omitted  ... 
doi:10.1075/slcs.94.07juo fatcat:3joa2d7d5zhwrhwbechimgg3ga

Word Problems and Membership Problems on Compressed Words

Markus Lohrey
2006 SIAM journal on computing (Print)  
Words are compressed using straight-line programs, i.e., context-free grammars that generate exactly one word.  ...  As a by-product of our results on compressed word problems we obtain a fixed deterministic context-free language with a PSPACE-complete compressed membership problem.  ...  Let us mention here the work on compressed pattern matching, see, e.g., [19, 23, 49, 60].  ... 
doi:10.1137/s0097539704445950 fatcat:z7q44rbobfhyjoaqfoavl6al3a


Raffaele Giancarlo, David Sankoff
2004 Journal of Discrete Algorithms  
This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining.  ...  We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.  ...  Our criterion is whether a candidate acronym could be coded more efficiently using a special model than it is using a regular text compression scheme.  ... 
doi:10.1016/j.jda.2004.04.010 fatcat:pvsv6os5erg3hf4s47bcvarara