920 Hits in 5.8 sec

On the number of elements to reorder when updating a suffix array

M. Léonard, L. Mouchard, M. Salson
2012 Journal of Discrete Algorithms  
In this article we focus on the number of elements to be reordered for real-life texts.  ...  Recently new algorithms appeared for updating the Burrows-Wheeler Transform or the suffix array, when the text they index is modified.  ...  In this article, we focus on the number of elements that have to be reordered when one wants to update a Burrows-Wheeler Transform or a suffix array.  ... 
doi:10.1016/j.jda.2011.01.002 fatcat:222rtfwbznhlvhfvd2tbkyhdyy

A four-stage algorithm for updating a Burrows–Wheeler transform

M. Salson, T. Lecroq, M. Léonard, L. Mouchard
2009 Theoretical Computer Science  
Based on this algorithm, we also sketch a method for converting the suffix array of T into the suffix array of T .  ...  We present a four-stage algorithm that updates the Burrows-Wheeler Transform of a text T , when this text is modified.  ...  Rather than using such a conversion for the suffix array, it is possible to update it by using a method similar to our update algorithm.  ... 
doi:10.1016/j.tcs.2009.07.016 fatcat:7epl7gq3lzatnf75hc7jb47qai

Dynamic extended suffix arrays

M. Salson, T. Lecroq, M. Léonard, L. Mouchard
2010 Journal of Discrete Algorithms  
All these constructions are building the suffix array from the text, and any edit operation on the text leads to the construction of a brand new suffix array.  ...  We furthermore explain how this technique can be adapted for maintaining a sample of the Extended Suffix Array, containing a sample of the Suffix Array, a sample of the Inverse Suffix Array and the whole  ...  Acknowledgments The corresponding author would like to thank Gene Myers for being more than a constant source of inspiration.  ... 
doi:10.1016/j.jda.2009.02.007 fatcat:3rwijo4skraghjcclg3eyurkwq

AC-Suffix-Tree: Buffer Free String Matching on Out-of-Sequence Packets

Xinming Chen, Kailin Ge, Zhen Chen, Jun Li
2011 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems  
This novel algorithm associates the classical Aho-Corasick (AC) algorithm with a pattern suffix tree to search patterns with only the state numbers of AC automaton and suffix tree stored.  ...  However, buffering of out-of-sequence packets can become impractical on high speed links due to limited fast memory capacity, especially when the concurrent flows are in large quantity, or extremely disordered  ...  This paper uses the notations in the book Algorithms on Strings [5]: A string is a finite sequence of elements of an alphabet A, which is a finite nonempty set whose elements are called letters.  ... 
doi:10.1109/ancs.2011.14 dblp:conf/ancs/ChenGCL11 fatcat:itmkp6ds3fdt3g5yi4jg22vnum

Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array [article]

German Tischler
2016 arXiv   pre-print
It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree.  ...  We present an external memory algorithm for constructing the 2n bit version of the LCP array which uses O(n σ) bits of additional space in external memory when given a (compressed) BWT with alphabet size  ...  string s is an integer power of a shorter string s ′ s.t. s ′ is not itself an integer power of a shorter string, then the succinct permuted LCP array is constructed using s ′ .  ... 
arXiv:1601.05020v3 fatcat:7vrpmexn25gy3fpfd2xbyykhi4

A Fast Algorithm for the Largest Area First Parsing of Real Strings

Ivan Katanic, Strahil Ristov, Martin Rosenzwei
2020 IEEE Access  
This result is based on the fact that in the real data, the sum of all depths of an LCP-interval tree, over all of the positions in a suffix array of an input string, is only larger than the size of the  ...  The largest area first parsing of a string often leads to the best results in grammar compression for a variety of input data.  ...  ACKNOWLEDGMENT The author would like to thank the authors of [24] for using their suffix array construction code in our software and the anonymous reviewers of the previous versions of this article for  ... 
doi:10.1109/access.2020.3013676 fatcat:pq3cfo72zfhpridkdmu7ogf2o4

GoldenEye: stream-based network packet inspection using GPUs

Qian Gong, Wenji Wu, Phil DeMar
2018 2018 IEEE 43rd Conference on Local Computer Networks (LCN)  
When a batch of packets arrives, GoldenEye sorts packets into flow-reassembled streams and normalizes retransmission through a GPU-implemented reordering module.  ...  For signatures that straddle batch boundaries, GoldenEye couples a small set of metadata with a functionally equivalent minimal regular expression retrieval algorithm to connect the partial matches.  ...  Since the performance of suffix-NFA traversals is proportional to the number of possible initial states, we break the regex of the form A.  ... 
doi:10.1109/lcn.2018.8638115 dblp:conf/lcn/GongWD18 fatcat:jz4dqt5apba4dhi257moe3czcu

A Practical and Scalable Tool to Find Overlaps between Sequences

Maan Haj Rachid, Qutaibah Malluhi
2015 BioMed Research International  
This paper presents a practical and easy-to-implement solution for one of these problems, namely, the all-pairs suffix-prefix problem, using a compact prefix tree.  ...  The evolution of the next generation sequencing technology increases the demand for efficient solutions, in terms of space and time, for several bioinformatics problems.  ...  Acknowledgments This paper was made possible by NPRP Grant no. 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation).  ... 
doi:10.1155/2015/905261 pmid:25961045 pmcid:PMC4417569 fatcat:fo2iepszefbpnoe2cz3akhhgqy

From H&M to Gap for Lightweight BWT Merging [article]

Giovanni Manzini
2016 arXiv   pre-print
Recently, Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014] have proposed a simple and elegant algorithm to merge the Burrows-Wheeler transforms of a family of strings.  ...  In this paper we show that the H&M algorithm can be improved so that, in addition to merging the BWTs, it can also merge the Longest Common Prefix (LCP) arrays.  ...  When we reach an irrelevant block we use such pair to update k_0 and k_1. The array F is not immediately updated: Instead we maintain two global arrays Proof.  ... 
arXiv:1609.04618v1 fatcat:5tnt3mjbube6ln3ieqdr4g3ruy

Efficient Construction of the BWT for Repetitive Text Using String Compression [article]

Diego Díaz-Domínguez, Gonzalo Navarro
2022 arXiv   pre-print
These results make our method stand out as the only one (to our knowledge) that can build the BCR BWT of a collection of 25 human genomes (75 GB) in about 7.3 hours, and using only 27 GB of working memory  ...  Concretely, we build on induced suffix sorting (ISS) and resort to run-length and grammar compression to maintain our intermediate results in compact form.  ...  It holds that O induces a partition over the suffix array of T i (SA i ) as the lexicographical sorting places the elements of each O u ∈ O in a consecutive range of SA i .  ... 
arXiv:2204.05969v1 fatcat:vsdsiyg5vbaxrk475ximlxcl2q

A Closer Look at Lightweight Graph Reordering

Priyank Faldu, Jeff Diamond, Boris Grot
2019 2019 IEEE International Symposium on Workload Characterization (IISWC)  
Our evaluation on 40 combinations of various graph applications and datasets shows that, compared to a baseline with no reordering, DBG yields an average application speed-up of 16.8% vs 11.6% for the  ...  A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections  ...  However, not all elements of the array exhibit high reuse. Elements associated with hot vertices are the ones responsible for large amount of reuse.  ... 
doi:10.1109/iiswc47752.2019.9041948 dblp:conf/iiswc/FalduDG19 fatcat:olh46r3z7jeszpfm2j4xqdz6fm

A Closer Look at Lightweight Graph Reordering [article]

Priyank Faldu, Jeff Diamond, Boris Grot
2020 arXiv   pre-print
A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections  ...  Our evaluation on 40 combinations of various graph applications and datasets shows that, compared to a baseline with no reordering, DBG yields an average application speed-up of 16.8% vs 11.6% for the  ...  However, not all elements of the array exhibit high reuse. Elements associated with hot vertices are the ones responsible for large amount of reuse.  ... 
arXiv:2001.08448v2 fatcat:74eha2ss5vhlhcbpg6pr7wsjje

ERA Revisited: Theoretical and Experimental Evaluation [article]

Matevž Jekovec, Andrej Brodnik
2016 arXiv   pre-print
Both theoretical computer scientists and engineers tackled the problem. In this paper we focus on the fastest practical suffix tree construction algorithm to date, ERA.  ...  We first provide a theoretical analysis of the algorithm assuming the uniformly random text as an input and using the PEM model of computation with respect to the lower bounds.  ...  do 14 Reorder the elements of Buf , P and SA in AA so that Buf is lexicographically sorted.  ... 
arXiv:1609.09654v1 fatcat:on6g7keizbeapclcq6tbb656p4

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

A. J. Cox, M. J. Bauer, T. Jakobi, G. Rosone
2012 Bioinformatics  
In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression  ...  We demonstrate that compression may be greatly improved by a particular reordering of the sequences in the collection and give a novel 'implicit sorting' strategy that enables these benefits to be realised  ...  ACKNOWLEDGEMENT The authors would like to thank Dirk Evers for his support throughout this project.  ... 
doi:10.1093/bioinformatics/bts173 pmid:22556365 fatcat:fz2icliuazaifkskeyvvzgbmce

High-throughput sequence alignment using Graphics Processing Units

Michael C Schatz, Cole Trapnell, Arthur L Delcher, Amitabh Varshney
2007 BMC Bioinformatics  
MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree.  ...  By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the  ...  Acknowledgements The authors would like to thank David Luebke from nVidia Research for providing an early release of CUDA, Julian Parkhill from the Sanger Institute for making the S. suis data available  ... 
doi:10.1186/1471-2105-8-474 pmid:18070356 pmcid:PMC2222658 fatcat:hk457c5yvzekhcb7mwrvu7wmmq