2,345 Hits in 5.0 sec

FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns [article]

Jin Jie Deng and Wing-Kai Hon and Dominik Köppl and Kunihiko Sadakane
2021 arXiv   pre-print
In this paper, we combine the virtues of a grammar with the RLBWT by building the RLBWT on top of a special grammar based on induced suffix sorting.  ...  Compared to grammar indexes, the size of the RLBWT is often much bigger, but queries like counting the occurrences of long patterns can be done much faster than on any existing grammar index so far.  ...  Acknowledgements This work was supported by JSPS KAKENHI grant numbers JP21K17701 and JP21H05847.  ... 
arXiv:2110.01181v1 fatcat:7dqamlf5jrgrro56vckacwnwki

Subject Index

2005 Journal of Discrete Algorithms  
single copy tandem duplication trees, 362 Text databases Indexing text with approximate q-grams, 157 Time complexity Improving the algorithm of Bafna and Pevzner for the problem of sorting by transpositions  ...  , 431 Subtree isomorphism Constrained tree inclusion, 431 Suffix arrays Constructing suffix arrays in linear time, 126; Space efficient linear time construction of suffix arrays, 143 Suffix sorting  ... 
doi:10.1016/s1570-8667(05)00042-0 fatcat:phytlwupp5hmtddg3ffue5546u

Grammar Compression By Induced Suffix Sorting [article]

Daniel S. N. Nunes and Felipe A. Louza and Simon Gog and Mauricio Ayala-Rincón and Gonzalo Navarro
2020 arXiv   pre-print
A grammar compression algorithm, called GCIS, is introduced in this work. GCIS is based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009.  ...  The proposed solution builds on the factorization performed by SAIS during suffix sorting. A context-free grammar is used to replace factors by non-terminals.  ...  GCIS: GRAMMAR COMPRESSION BY INDUCED SUFFIX SORTING This section introduces the Grammar Compression algorithm by Induced Sorting (GCIS).  ... 
arXiv:2011.12898v1 fatcat:xnmwrqgfljg4ta6p7hgbpet63y

Efficient construction of the extended BWT from grammar-compressed DNA sequencing reads [article]

Diego Diaz-Dominguez and Gonzalo Navarro
2021 arXiv   pre-print
Our technique exploits the string repetitions captured by the grammar to boost the computation of the eBWT.  ...  We rely on a new grammar recently proposed at DCC'21 whose nonterminals serve as building blocks for inducing the eBWT.  ...  Our experimental results showed that algorithms that work on top of grammars can be competitive in practice, and even more efficient.  ... 
arXiv:2102.03961v1 fatcat:i7fs6f6jlffzfn5cieqpye4dlu

The 'Specifier' in an HPSG grammar implementation of Norwegian

Lars Hellan, Dorothee Beermann
2005 Nordic Conference of Computational Linguistics  
background for the adoption of a category 'Specifier' in the analysis of noun phrases, and show how, under certain constraining assumptions, it can be successfully employed in the implementation of an HPSG grammar  ...  | sort | gen but the den-aloof here induces ... | cont | hook | index | sort | gen +  ...  Conclusion: Although there  ...  The weak adjective suffix -e, as in den snille gutten and min snille gutt, induces the following specification on the adjective it gets suffixed to: ...  ... 
dblp:conf/nodalida/HellanB05 fatcat:etucmuocobbo5ockr257c72uay

Grammar-compressed Self-index with Lyndon Words [article]

Kazuya Tsuruta and Dominik Köppl and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda
2020 arXiv   pre-print
We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990).  ...  Based on this SLP, we propose a self-index data structure of O(g) words of space that can be built from a string T in O(n lg n) expected time, retrieving the starting positions of all occurrences of a pattern  ...  the circumstances that T is represented by its Lyndon tree induced by the standard factorization, while P is represented by its Lyndon factors.  ... 
arXiv:2004.05309v2 fatcat:zpzktt64nbh4nfkhn6nbv25xem

IN-PLACE UPDATE OF SUFFIX ARRAY WHILE RECODING WORDS

MATTHIAS GALLÉ, PIERRE PETERLONGO, FRANÇOIS COSTE
2009 International Journal of Foundations of Computer Science  
Motivated by grammatical inference and data compression applications, we propose an algorithm to update a suffix array while in the indexed text some occurrences of a given word are substituted by a new  ...  Compared to other published index update methods, the problem addressed here may require the modification of a large number of distinct positions over the original text.  ...  Fig. 2. Moves induced by substituting GA by C1.  ... 
doi:10.1142/s0129054109007029 fatcat:jfzrj6to2rbetboyfzvgyley5y

Indexing Highly Repetitive String Collections [article]

Gonzalo Navarro
2022 arXiv   pre-print
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities.  ...  The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing.  ...  By poly x we mean any polynomial in x, that is, x^O(1), and polylog x denotes poly(log x). Logarithms will be to the base 2 by default.  ... 
arXiv:2004.02781v9 fatcat:rceyc6ti5jdfpebkrbbgq3kine

Pattern Matching on Grammar-Compressed Strings in Linear Time [article]

Moses Ganardi, Paweł Gawrychowski
2021 arXiv   pre-print
We resolve this open question by presenting an O(n+m) time algorithm that, given a context-free grammar of size n that produces a single string t and a pattern p of length m, decides whether p occurs in  ...  Multiple versions of this basic question have been considered, and by now we know algorithms that are fast both in practice and in theory.  ...  that the queries are sorted by their weights.  ... 
arXiv:2111.05016v1 fatcat:etruqpzm3jf3zmv5uex555w5wu

Fast Label Extraction in the CDAWG [chapter]

Djamal Belazzougui, Fabio Cunial
2017 Lecture Notes in Computer Science  
All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.  ...  The compact directed acyclic word graph (CDAWG) of a string T of length n takes space proportional just to the number e of right extensions of the maximal repeats of T, and it is thus an appealing index  ...  This does not increase the size of the grammar asymptotically. Note that the subgraph induced by the new nonterminals in the modified grammar is the reverse of the compact suffix-link tree of T.  ... 
doi:10.1007/978-3-319-67428-5_14 fatcat:etggr4qzwbeqpeycmljdl6zqjq

Fast Label Extraction in the CDAWG [article]

Djamal Belazzougui, Fabio Cunial
2017 arXiv   pre-print
All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.  ...  between a query of length m and T, using an existing representation of the suffix tree based on the CDAWG.  ...  This does not increase the size of the grammar asymptotically. Note that the subgraph induced by the new nonterminals in the modified grammar is the reverse of the compact suffix-link tree of T.  ... 
arXiv:1707.08197v2 fatcat:dvm3vafqlra55mdmc25kf3ersq

Inducing Suffix and LCP Arrays in External Memory [chapter]

Timo Bingmann, Johannes Fischer, Vitaly Osipov
2013 2013 Proceedings of the Fifteenth Workshop on Algorithm Engineering and Experiments (ALENEX)  
We consider text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory.  ...  Practical tests show that this outperforms the previous best EM suffix sorter [Dementiev et al., ALENEX 2005] by a factor of about two in time and I/O-volume.  ...  Sorting these tuples yields P, whose entries are lexicographically named in N (3) and sorted again by string index, resulting in R (4).  ... 
doi:10.1137/1.9781611972931.8 dblp:conf/alenex/BingmannFO13 fatcat:6qkkdn5mdjf2dpni7szxrnkh3i

Inducing Suffix and LCP Arrays in External Memory

Timo Bingmann, Johannes Fischer, Vitaly Osipov
2016 ACM Journal of Experimental Algorithmics  
We consider text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory.  ...  Practical tests show that this outperforms the previous best EM suffix sorter [Dementiev et al., ALENEX 2005] by a factor of about two in time and I/O-volume.  ...  Sorting these tuples yields P, whose entries are lexicographically named in N (3) and sorted again by string index, resulting in R (4).  ... 
doi:10.1145/2975593 fatcat:6pc5dp5jd5fkvjes2anqpwkmfm

Document Retrieval on Repetitive String Collections [article]

Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén
2017 arXiv   pre-print
We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them.  ...  As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude.  ...  Finally, we thank the reviewers for their useful comments, which helped improve the presentation, and Meg Gagie for correcting our grammar.  ... 
arXiv:1605.09362v3 fatcat:goqxqemkdzfyrgffkbhr44nrja

Document retrieval on repetitive string collections

Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén
2017 Information retrieval (Boston)  
We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them.  ...  As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude.  ...  Grammar-based (Grammar) This index (Claude and Munro 2013) is an adaptation of a grammar-compressed self-index (Claude and Navarro 2012) to document listing.  ... 
doi:10.1007/s10791-017-9297-7 pmid:28596702 pmcid:PMC5445192 fatcat:uiju2twyyvetpgs355f5srn2ui