Fast Label Extraction in the CDAWG [article]

Djamal Belazzougui, Fabio Cunial
2017 arXiv   pre-print
All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.  ...  This implies a reduction from O(m log log n + occ) to O(m + occ) in the time needed to locate all the occ occurrences of the pattern.  ...  Acknowledgements We thank the anonymous reviewers for simplifying some parts of the paper, for improving its overall clarity, and for suggesting references [11, 12, 14] and the current version of Lemma  ... 
arXiv:1707.08197v2 fatcat:dvm3vafqlra55mdmc25kf3ersq

Fast Label Extraction in the CDAWG [chapter]

Djamal Belazzougui, Fabio Cunial
2017 Lecture Notes in Computer Science  
All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.  ...  of the CDAWG.  ...  Acknowledgements We thank the anonymous reviewers for simplifying some parts of the paper, for improving its overall clarity, and for suggesting references [11, 12, 14] and the current version of Lemma  ... 
doi:10.1007/978-3-319-67428-5_14 fatcat:etggr4qzwbeqpeycmljdl6zqjq

Full-Text Search Using Double-Array CDAWG

Yuma Fujita, Yoshiaki Ichihashi, Shunsuke Kanda, Kazuhiro Morita, Masao Fuketa
2016 International Journal of Future Computer and Communication  
While the method can conduct fast retrieval, there are no application examples for Full-Text Search.  ...  Therefore, we propose a new method using Double-Array CDAWG for high-speed Full-Text Search. Experimental results show the effectiveness of the proposed method.  ...  We propose a new method, a suffix CDAWG using Double-Array, for high-speed Full-Text Search. In some cases, an arc label in a CDAWG is a string.  ... 
doi:10.18178/ijfcc.2016.5.6.478 fatcat:zawtfrs47jh7nfk3ngq3vpsoum

Practical combinations of repetition-aware data structures [article]

Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, Mathieu Raffinot
2016 arXiv   pre-print
Such variants use, respectively, the RLBWT of a string and the RLBWT of its reverse, or just one RLBWT inside a bidirectional index, or just one RLBWT with support for unidirectional extraction.  ...  The main ingredient of our structures is the run-length encoded BWT (RLBWT), which takes space proportional to the number of runs in the Burrows-Wheeler transform of a string.  ...  We thank Miguel Ángel Martínez for providing implementations of the variants described in [17], and Daniel Valenzuela for providing the implementation described in [32].  ... 
arXiv:1604.06002v2 fatcat:ic3vcbw5fjhvzc6poknt3rweue

Flexible Indexing of Repetitive Collections [chapter]

Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, Mathieu Raffinot
2017 Lecture Notes in Computer Science  
We describe practical data structures that support counting and locating all the exact occurrences of a pattern in a repetitive text, by combining the run-length encoded Burrows-Wheeler transform (RLBWT  ...  ) with the boundaries of Lempel-Ziv 77 factors.  ...  We denote by (γ), or equivalently by (u, v), the label of edge γ = (u, v) ∈ E, and we denote by (v) the concatenation of all edge labels in the path from the root to node v ∈ V .  ... 
doi:10.1007/978-3-319-58741-7_17 fatcat:aasw7zjzhfgc5jweiku6wtjugu

String Attractors [article]

Nicola Prezza
2017 arXiv   pre-print
To conclude, we consider generalizations of string attractors to labeled graphs, show that the attractor problem is NP-complete on trees, and provide a logarithmic approximation computable in polynomial  ...  We then apply string attractors to solve efficiently a fundamental problem in the field of compressed computation: we present a universal compressed data structure for text extraction that improves existing  ...  Acknowledgements I would like to thank Alberto Policriti for many fruitful discussions on the topic.  ... 
arXiv:1709.05314v2 fatcat:ipwiyzukeffttfkgyj4fkugjyu

Reducing the space requirement of suffix trees

Stefan Kurtz
1999 Software, Practice & Experience  
The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different type.  ...  Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained.  ...  ACKNOWLEDGEMENTS The author is partially supported by DFG-grant Ku 1257/1-1. Bernhard Balkenhol suggested to further improve preliminary techniques to reduce the space requirement of suffix trees.  ... 
doi:10.1002/(sici)1097-024x(199911)29:13<1149::aid-spe274>3.0.co;2-o fatcat:vlliuhyiqfgwpe2wpq7uvlfeg4

Improved Tweets Polarity Detection using Lexicon-based Features and Caching

2019 VOLUME-8 ISSUE-10, AUGUST 2019, REGULAR ISSUE  
The low dimension data improves the classification efficiency. The experiment shows that the method is improving the overall performance in training and testing of polarity detection  ...  The pre-processing is improved using proper caching of data items to save the time for processing of duplicate items in data sets. The feature selection strategy ensures reduced dimensionality.  ...  For all the tweets in the datasets, the extracted feature vectors are combined with the annotated labels to make input for supervised learning classifiers.  ... 
doi:10.35940/ijitee.a5068.129219 fatcat:fwug7xkyivhjzfbpzq3eazgqwe

Computation over Compressed Structured Data (Dagstuhl Seminar 16431)

Philip Bille, Markus Lohrey, Sebastian Maneth, Gonzalo Navarro, Marc Herbstritt
2017 Dagstuhl Reports  
This report documents the program and the outcomes of Dagstuhl Seminar 16431 "Computation over Compressed Structured Data".  ...  better compression performance than the one obtained with DSM alone, and can offer reasonable edge extraction time.  ...  We plan to combine the recently proposed GLOUDS representation [1] with DSM, a technique used to compress Web and social graphs by exploiting the presence of bicliques and dense subgraphs [2].  ... 
doi:10.4230/dagrep.6.10.99 dblp:journals/dagstuhl-reports/BilleLMN16 fatcat:jel4wyc2gje6thmu5zj7aryofu

Indexing Highly Repetitive String Collections [article]

Gonzalo Navarro
2022 arXiv   pre-print
We conclude with the current challenges in this fascinating field.  ...  In this survey we cover the algorithmic developments that have led to these data structures.  ...  We will use big-O notation for the time complexities, and in many cases for the space complexities as well.  ... 
arXiv:2004.02781v9 fatcat:rceyc6ti5jdfpebkrbbgq3kine

At the roots of dictionary compression: string attractors

Dominik Kempa, Nicola Prezza
2018 Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing - STOC 2018  
This, in particular, includes the full string attractor problem.  ...  A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions.  ...  Ukkonen for the great feedback.  ... 
doi:10.1145/3188745.3188814 dblp:conf/stoc/KempaP18 fatcat:pr67vs6oynhkdlcpbqcwrb5qmu

At the Roots of Dictionary Compression: String Attractors [article]

Dominik Kempa, Nicola Prezza
2018 arXiv   pre-print
A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions.  ...  In particular, our solution matches the lower bound also on LZ77, straight-line programs, collage systems, and macro schemes, and therefore essentially closes (at once) the random access problem for all  ...  Ukkonen for the great feedback.  ... 
arXiv:1710.10964v3 fatcat:gqdet7syxretddidl42oa5hbda

MergedTrie: Efficient textual indexing

Antonio Ferrández, Jesús Peral, Balaraman Ravindran
2019 PLoS ONE  
), especially when working in the fields of Big Data or IoT, which require the handling of very large string dictionaries.  ...  It improves the DT compression by merging both Tries into a single and by segmenting the indexed term into two fixed length parts in order to balance the new Trie.  ...  as occurs when compacted or minimized segments need to be split or merged in the DAWG, ADFA and CDAWG structures. 3.  ... 
doi:10.1371/journal.pone.0215288 pmid:31013282 pmcid:PMC6478299 fatcat:3v3ayp4l6jaaxiigj6jdg43nha

Optimal-Time Text Indexing in BWT-runs Bounded Space [article]

Travis Gagie, Gonzalo Navarro, Nicola Prezza
2017 arXiv   pre-print
We also describe a structure using O(r log(n/r)) space that replaces the text and extracts any text substring of length ℓ in almost-optimal time O(log(n/r) + ℓ log(σ)/w). (...continues...)  ...  One of the earliest indexes for repetitive collections, the Run-Length FM-index, used O(r) space and was able to efficiently count the number of occurrences of a pattern of length m in the text (in loglogarithmic  ...  After finding the candidate node on the z-fast trie, we verify it in O(m log(σ)/w) time by extracting a substring from V. We augment each trie node as done in Theorem 3 with triples occ, i, δ.  ... 
arXiv:1705.10382v4 fatcat:hfrc7jgbffdotaiha672zfl5pe

Sensitivity of string compressors and repetitiveness measures [article]

Tooru Akagi, Mitsuru Funakoshi, Shunsuke Inenaga
2022 arXiv   pre-print
This notion enables one to measure the robustness of compression algorithms in terms of errors and/or dynamic changes occurring in the input string.  ...  We also study the worst-case sensitivity of several grammar compression algorithms including Bisection, AVL-grammar, GCIS, and CDAWG.  ...  The authors thank anonymous referees for pointing out some errors in the earlier version of this work and for their suggestions to improve the paper.  ... 
arXiv:2107.08615v3 fatcat:ichv3ny2qrgdbpexulv6ne3q3e