
Linear-Time Text Compression by Longest-First Substitution

Ryosuke Nakamura, Shunsuke Inenaga, Hideo Bannai, Takashi Funamoto, Masayuki Takeda, Ayumi Shinohara
2009 Algorithms  
We consider grammar-based text compression with longest first substitution (LFS), where non-overlapping occurrences of a longest repeating factor of the input text are replaced by a new non-terminal symbol  ...  We also deal with a more sophisticated version of LFS, called LFS2, that allows better compression. The first linear-time algorithm for LFS2 is also presented.  ...  In this paper, we propose the first linear-time algorithm for text compression by LFS substitution. A key idea is the use of a new data structure called sparse lazy suffix trees.  ... 
doi:10.3390/a2041429 fatcat:deh7c4p6azdkrilktyo62zuay4
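The longest-first substitution (LFS) strategy described in this abstract can be illustrated with a deliberately naive sketch. It is not the paper's linear-time algorithm and uses no sparse lazy suffix trees. It searches for a longest factor of length at least 2 that has two or more non-overlapping occurrences, replaces those occurrences left to right with a fresh non-terminal, and repeats. Several details are assumptions made for illustration: ties go to the first such factor found, occurrences are chosen greedily left to right, and the names `lfs_compress`, `expand`, and `N0`, `N1`, ... are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical naive LFS sketch (cubic time or worse), not the linear-time algorithm.
# Terminals are single input characters; non-terminals are "N0", "N1", ... so the two never collide.
Rules = Dict[str, List[str]]


def nonoverlapping_hits(seq: List[str], pat: List[str]) -> List[int]:
    """Start positions of non-overlapping occurrences of pat, chosen greedily left to right."""
    m, i, hits = len(pat), 0, []
    while i + m <= len(seq):
        if seq[i:i + m] == pat:
            hits.append(i)
            i += m  # jump past this occurrence so the next one cannot overlap it
        else:
            i += 1
    return hits


def longest_repeating_factor(seq: List[str]) -> List[str]:
    """Longest factor of length >= 2 with at least two non-overlapping occurrences, or []."""
    n = len(seq)
    for m in range(n // 2, 1, -1):  # two disjoint copies require m <= n // 2
        for i in range(n - m + 1):
            pat = seq[i:i + m]
            if len(nonoverlapping_hits(seq, pat)) >= 2:
                return pat
    return []


def lfs_compress(text: str) -> Tuple[List[str], Rules]:
    """Repeatedly replace a longest repeating factor by a new non-terminal."""
    seq: List[str] = list(text)
    rules: Rules = {}
    while True:
        pat = longest_repeating_factor(seq)
        if not pat:
            return seq, rules
        nt = f"N{len(rules)}"
        rules[nt] = pat
        hits = set(nonoverlapping_hits(seq, pat))
        out, i = [], 0
        while i < len(seq):
            if i in hits:
                out.append(nt)
                i += len(pat)
            else:
                out.append(seq[i])
                i += 1
        seq = out


def expand(seq: List[str], rules: Rules) -> List[str]:
    """Decompress by recursively replacing each non-terminal with its right-hand side."""
    out: List[str] = []
    for sym in seq:
        out.extend(expand(rules[sym], rules) if sym in rules else [sym])
    return out
```

For example, `lfs_compress("abracadabra")` returns `(['N0', 'c', 'a', 'd', 'N0'], {'N0': ['a', 'b', 'r', 'a']})`.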

Linear-Time Off-Line Text Compression by Longest-First Substitution [chapter]

Shunsuke Inenaga, Takashi Funamoto, Masayuki Takeda, Ayumi Shinohara
2003 Lecture Notes in Computer Science  
In this paper, we present an algorithm that compresses a text based on this longest-first principle, in linear time.  ...  One representative tactic for off-line compression is to substitute the longest repeated factors of a text with a production rule.  ...  This paper, therefore, introduces the first explicit, and complete, linear-time algorithm for text compression with the longest-first substitution.  ... 
doi:10.1007/978-3-540-39984-1_11 fatcat:3n2yxmsaxfdkppwtojgjgkxifq

Simple Linear-Time Off-Line Text Compression by Longest-First Substitution

Ryosuke Nakamura, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
2007 2007 Data Compression Conference (DCC'07)  
We consider grammar-based text compression with longest first substitution, where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol  ...  We also present another type of longest first substitution strategy that allows better compression.  ...  Introduction: In this paper, we consider text compression by longest first substitution (named LFS).  ... 
doi:10.1109/dcc.2007.70 dblp:conf/dcc/NakamuraBIT07 fatcat:m55w6iz4qneslmrqdw7zrabque

Pattern-matching and text-compression algorithms

Maxime Crochemore, Thierry Lecroq
1996 ACM Computing Surveys  
The first linear-time string-matching algorithm was discovered by Morris and Pratt [1970]. It has been improved by Knuth et al. [1976].  ...  TEXT COMPRESSION: The following methods yield two basic data compression algorithms that produce good compression ratios and run in linear time.  ... 
doi:10.1145/234313.234331 fatcat:l35fbobetbdezblncuj2egwflu
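The linear-time matching idea attributed above to Morris and Pratt can be sketched as follows. This is a textbook-style illustration, not code from the survey: it precomputes, for each prefix of the pattern, the length of its longest proper border, and uses that table to avoid re-reading text characters, so a search takes O(n + m) time. The table here is the plain Morris-Pratt border table; the refinement by Knuth et al. additionally skips borders that are followed by the same character as the one that just mismatched. The function names are hypothetical.

```python
from typing import List

# Hypothetical helper names; a Morris-Pratt style matcher for illustration.


def border_table(p: str) -> List[int]:
    """border[i] = length of the longest proper border (prefix that is also a suffix) of p[:i + 1]."""
    border = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = border[k - 1]  # fall back to the next shorter border
        if p[i] == p[k]:
            k += 1
        border[i] = k
    return border


def mp_search(text: str, p: str) -> List[int]:
    """Start positions of all (possibly overlapping) occurrences of a non-empty pattern p."""
    if not p:
        raise ValueError("pattern must be non-empty")
    border = border_table(p)
    hits, k = [], 0  # k = length of the current partial match
    for i, c in enumerate(text):
        while k > 0 and c != p[k]:
            k = border[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = border[k - 1]  # keep going so overlapping occurrences are reported
    return hits
```

For example, `mp_search("abababa", "aba")` returns `[0, 2, 4]`.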

Pattern Matching and Text Compression Algorithms [chapter]

Thierry Lecroq
2014 Computing Handbook, Third Edition  
The first linear-time string-matching algorithm was discovered by Morris and Pratt [1970]. It has been improved by Knuth et al. [1976].  ...  TEXT COMPRESSION: The following methods yield two basic data compression algorithms that produce good compression ratios and run in linear time.  ... 
doi:10.1201/b16812-18 fatcat:4hclurhe3rfjricyfpxe3bggvq

IN-PLACE UPDATE OF SUFFIX ARRAY WHILE RECODING WORDS

MATTHIAS GALLÉ, PIERRE PETERLONGO, FRANÇOIS COSTE
2009 International Journal of Foundations of Computer Science  
Motivated by grammatical inference and data compression applications, we propose an algorithm to update a suffix array while, in the indexed text, some occurrences of a given word are substituted by a new  ...  Experiments confirm a significant execution-time speed-up compared to the construction of the suffix array from scratch at each step of the application.  ...  Motivation: In this paper, we propose an algorithm to efficiently update a suffix array after substituting a word by a new character in the indexed text.  ... 
doi:10.1142/s0129054109007029 fatcat:jfzrj6to2rbetboyfzvgyley5y

40 years of suffix trees

Alberto Apostolico, Maxime Crochemore, Martin Farach-Colton, Zvi Galil, S. Muthukrishnan
2016 Communications of the ACM  
This paper reviews the first 40 years in the life of suffix trees, their many incarnations, and their applications.  ...  in a text in linear time.  ...  Knuth's conjecture, by showing how to find the longest substring common to two files in linear time for a finite alphabet.  ... 
doi:10.1145/2810036 fatcat:lmdh7fgxevcrrgo675vpt32bvy

Editorial

Raffaele Giancarlo, David Sankoff
2004 Journal of Discrete Algorithms  
This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining.  ...  We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.  ...  They reduce execution time to linear by incrementally updating digram counts as substitutions are made, and using a priority queue to keep track of the most common digrams.  ... 
doi:10.1016/j.jda.2004.04.010 fatcat:pvsv6os5erg3hf4s47bcvarara

Adaptive text mining: inferring structure from sequences

I.H. Witten
2004 Journal of Discrete Algorithms  
This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining.  ...  We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.  ...  They reduce execution time to linear by incrementally updating digram counts as substitutions are made, and using a priority queue to keep track of the most common digrams.  ... 
doi:10.1016/s1570-8667(03)00084-4 fatcat:qxwvoohn3vcx7mfisfymq562gi
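The digram-replacement idea summarized in this snippet (repeatedly replacing the most frequent pair of adjacent symbols with a new symbol) can be illustrated with a naive sketch. Unlike the linear-time method described, it recounts every digram from scratch in each round instead of updating counts incrementally and keeping a priority queue, so it runs in quadratic time. Counting only non-overlapping occurrences, breaking ties in favour of the first pair encountered, and the rule names `R0`, `R1`, ... are assumptions; all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical naive digram replacement; recounts all pairs every round (quadratic time).
Pair = Tuple[str, str]


def count_digrams(seq: List[str]) -> Counter:
    """Count adjacent pairs, skipping an occurrence that overlaps the previous one (as in "aaa")."""
    counts: Counter = Counter()
    last: Dict[Pair, int] = {}
    for i in range(len(seq) - 1):
        d = (seq[i], seq[i + 1])
        if last.get(d) == i - 1:
            continue
        counts[d] += 1
        last[d] = i
    return counts


def replace_digram(seq: List[str], d: Pair, nt: str) -> List[str]:
    """Replace non-overlapping occurrences of d, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == d:
            out.append(nt)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def digram_compress(text: str) -> Tuple[List[str], Dict[str, Pair]]:
    seq: List[str] = list(text)
    rules: Dict[str, Pair] = {}
    while True:
        counts = count_digrams(seq)
        if not counts:
            break
        d, c = counts.most_common(1)[0]  # on ties, the first-encountered pair wins
        if c < 2:
            break  # no pair repeats, so a new rule would not save anything
        nt = f"R{len(rules)}"
        rules[nt] = d
        seq = replace_digram(seq, d, nt)
    return seq, rules


def expand(seq, rules) -> List[str]:
    """Decompress by recursively expanding rule symbols."""
    out: List[str] = []
    for sym in seq:
        out.extend(expand(rules[sym], rules) if sym in rules else [sym])
    return out
```

For example, `digram_compress("abababab")` returns `(['R1', 'R1'], {'R0': ('a', 'b'), 'R1': ('R0', 'R0')})`.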

Dictionary matching and indexing with errors and don't cares

Richard Cole, Lee-Ad Gottlieb, Moshe Lewenstein
2004 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing - STOC '04  
For example, for the indexing problem with $n = |t|$ and $m = |p|$, the query time for $k$ substitutions is $O(m + \frac{(c_1 \log n)^k}{k!} + \#\text{matches})$, with a data structure of size $O(n \frac{(c_2 \log n)^k}{k!})$ and a preprocessing time of $O(n \frac{(c_2 \log n)^k}{k!})$, where $c_1, c_2 > 1$ are constants.  ...  Suffix trees can be constructed in linear, $O(n)$, time and space [17, 31, 37, 38] for linear-size alphabets.  ... 
doi:10.1145/1007352.1007374 dblp:conf/stoc/ColeGL04 fatcat:pjs3bjdz4bhynfhc2ommprmaxu

Compression with the tudocomp Framework [article]

Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, Kunihiko Sadakane
2017 arXiv   pre-print
We evaluate its features by a case study on two novel compression algorithms based on the Lempel-Ziv compression schemes that perform well on highly repetitive texts.  ...  We present a framework facilitating the implementation and comparison of text compression algorithms.  ...  We get linear running time with the same argument as for (1). Improved Compression Ratio.  ... 
arXiv:1702.07577v1 fatcat:2vfenbptbvacrd75ygqptaykpy

Speeding Up Pattern Matching by Text Compression [chapter]

Yusuke Shibata, Takuya Kida, Shuichi Fukamachi, Masayuki Takeda, Ayumi Shinohara, Takeshi Shinohara, Setsuo Arikawa
2000 Lecture Notes in Computer Science  
We compare running times to find a pattern in (1) BPE compressed files, (2) Lempel-Ziv-Welch compressed files, and (3) original text files, in various situations.  ...  Thus the BPE compression reduces not only the disk space but also the searching time.  ...  When the keys have overlaps, it replaces the longest possible first occurring key. The running time is linear in the total length of the original and the substituted text.  ... 
doi:10.1007/3-540-46521-9_25 fatcat:7vz3dg2c5jd5fhi77tx4cfdro4

Pattern Matching in Text Compressed by Using Antidictionaries [chapter]

Yusuke Shibata, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa
1999 Lecture Notes in Computer Science  
In this paper we focus on the problem of compressed pattern matching for the text compression using antidictionaries, which is a new compression scheme proposed recently by .  ...  We show an algorithm which preprocesses a pattern of length $m$ and an antidictionary $M$ in $O(m^2 + \|M\|)$ time, and then scans a compressed text of length $n$ in $O(n + r)$ time to find all pattern occurrences,  ...  Since $M$ is a part of the compressed representation of text, the text scanning time is $O(\|M\| + n + r)$, which is linear in the compressed text length $\|M\| + n$, when ignoring $r$.  ... 
doi:10.1007/3-540-48452-3_3 fatcat:bs46l2mixrgv5fizzzxvudcpi4
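The snippet does not spell out the compression scheme itself. In a commonly presented form, antidictionary compression works on binary texts: an antidictionary is a set of words that never occur in the text, so if appending one bit to the text read so far would create a forbidden word as a suffix, the other bit is forced. The encoder omits forced bits, and the decoder, given the original length, restores them. The sketch below assumes that formulation. It checks every antidictionary word at every position instead of using the automaton an efficient implementation would build, and the names `anti`, `forced_bit`, `dca_encode`, and `dca_decode` are hypothetical.

```python
from typing import Optional, Set

# Hypothetical naive antidictionary coder for binary strings (illustration only).


def forced_bit(prefix: str, anti: Set[str]) -> Optional[str]:
    """The only bit that can follow prefix without creating a forbidden word, or None if both can."""
    allowed = [b for b in "01" if not any((prefix + b).endswith(w) for w in anti)]
    if not allowed:
        raise ValueError("no bit can follow this prefix under the antidictionary")
    return allowed[0] if len(allowed) == 1 else None


def dca_encode(x: str, anti: Set[str]) -> str:
    """Keep only the bits of x that the antidictionary does not predict."""
    out = []
    for i, bit in enumerate(x):
        forced = forced_bit(x[:i], anti)
        if forced is None:
            out.append(bit)  # unpredictable bit: must be stored
        elif forced != bit:
            raise ValueError("x contains a word of the antidictionary")
    return "".join(out)


def dca_decode(y: str, n: int, anti: Set[str]) -> str:
    """Rebuild n bits, reading from y only the bits that were not predictable."""
    x, j = "", 0
    while len(x) < n:
        forced = forced_bit(x, anti)
        if forced is None:
            x += y[j]
            j += 1
        else:
            x += forced
    return x
```

For example, with the antidictionary `{"00", "11"}`, every bit after the first is forced, so `dca_encode("01010101", {"00", "11"})` returns `"0"`.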

The greedy approach to dictionary-based static text compression on a distributed system

Sergio De Agostino
2015 Journal of Discrete Algorithms  
The greedy approach to dictionary-based static text compression can be executed by a finite state machine.  ...  Beyond standard large scale, a negative effect on the compression effectiveness is caused by the very small size of the data blocks.  ...  Factors in the input string are substituted by pointers to dictionary copies and such pointers could be either variable or fixed length codewords.  ... 
doi:10.1016/j.jda.2015.05.001 fatcat:wmjjco7p2jh4nkomu6gl2c2y6q
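Greedy parsing with a static dictionary, as discussed in this abstract, replaces at each step the longest dictionary word that is a prefix of the remaining input by a pointer to that word. A minimal sketch follows, assuming pointers are plain list indices and that the dictionary contains every single character of the input, so a parse always exists. It performs a naive longest-match lookup rather than the finite-state-machine or distributed implementation the paper studies, and the function name is hypothetical.

```python
from typing import List

# Hypothetical greedy (longest-match) parser; pointers are indices into the dictionary list.


def greedy_parse(text: str, dictionary: List[str]) -> List[int]:
    """Greedy parse of text into indices of dictionary words."""
    index = {w: i for i, w in enumerate(dictionary)}
    max_len = max(len(w) for w in dictionary)
    pointers, i = [], 0
    while i < len(text):
        # Try the longest candidate factor first, then shorter ones.
        for m in range(min(max_len, len(text) - i), 0, -1):
            k = index.get(text[i:i + m])
            if k is not None:
                pointers.append(k)
                i += m
                break
        else:
            raise ValueError(f"no dictionary word starts at position {i}")
    return pointers
```

With the dictionary `["a", "b", "ab", "aba", "ba"]`, the text `"ababa"` is parsed as `[3, 4]` (aba|ba), and decoding is just `"".join(dictionary[k] for k in pointers)`. Greedy parsing is not always optimal: with `["a", "b", "c", "d", "ab", "bcd"]`, the text `"abcd"` is parsed as ab|c|d, although a|bcd uses one pointer fewer.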

Dictionary-Based Data Compression [chapter]

Travis Gagie, Giovanni Manzini
2016 Encyclopedia of Algorithms  
Here the task consists in performing string matching in a compressed text without decompressing it. For dictionary-based compressors this problem was first raised in 1994 by A. Amir, G.  ...  Algorithmic Issues: One of the reasons for the popularity of dictionary-based compressors is that they admit linear-time, space-efficient implementations.  ...  The query time for suffix arrays is $O(m + \log n)$, achievable by embedding additional lcp (longest common prefix) information into the  ... 
doi:10.1007/978-1-4939-2864-4_108 fatcat:mmpcgzwxa5capiloipjqmxv2ea
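Pattern search on a suffix array, mentioned at the end of this snippet, reduces to two binary searches over the lexicographically sorted suffixes. The sketch below builds the array naively by sorting suffixes and compares up to m characters at each binary-search step, so a query costs O(m log n). Reaching the O(m + log n) bound quoted in the snippet requires the additional lcp information, which is not implemented here. The function names are hypothetical.

```python
from typing import List

# Hypothetical helpers: naive suffix-array construction plus binary-search queries.


def suffix_array(t: str) -> List[int]:
    """Naive construction: sort suffix start positions by the suffixes themselves."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def occurrences(t: str, sa: List[int], p: str) -> List[int]:
    """All start positions of p in t, found by binary search over the suffix array sa."""
    m = len(p)
    # Lower bound: first suffix whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    # Upper bound: first suffix whose length-m prefix is > p.
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes in sa[start:lo] all begin with p; report their positions in text order.
    return sorted(sa[start:lo])
```

For example, `suffix_array("banana")` is `[5, 3, 1, 0, 4, 2]`, and searching it for `"ana"` gives `[1, 3]`.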