5,972 Hits in 6.4 sec

mkESA: enhanced suffix array construction tool

R. Homann, D. Fleer, R. Giegerich, M. Rehmsmeier
2009 Bioinformatics  
We introduce the tool mkESA, an open source program for constructing enhanced suffix arrays (ESAs), striving for low memory consumption, yet high practical speed. mkESA is a user-friendly program written in portable C99, based on a parallelized version of the Deep-Shallow suffix array construction algorithm, which is known for its high speed and small memory usage.  ...  Linear-time algorithms for suffix array construction have been proposed as well as algorithms that are fast in practice and/or tuned for space efficiency, rendering use of suffix arrays feasible for large  ... 
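An enhanced suffix array pairs the suffix array with extra tables, chiefly the LCP array, which holds the length of the longest common prefix of each pair of lexicographically adjacent suffixes. The Python sketch below computes that table with Kasai et al.'s linear-time method. The naive suffix sort is a stand-in chosen for brevity. It is not mkESA's parallel Deep-Shallow construction, and the function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    # Naive construction for illustration only (not Deep-Shallow).
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.: lcp[r] = LCP(suffix sa[r-1], suffix sa[r]); lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, not rank order
        r = rank[i]
        if r > 0:
            j = sa[r - 1]  # the lexicographically preceding suffix
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1  # moving from suffix i to i+1 loses at most one match
        else:
            h = 0
    return lcp
```

For "banana" the sketch yields SA = [5, 3, 1, 0, 4, 2] and LCP = [0, 1, 3, 0, 0, 2]. Reusing h across iterations is what keeps the total work linear.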
doi:10.1093/bioinformatics/btp112 pmid:19246510 pmcid:PMC2666816 fatcat:zmtflcgj4bc67ioujyeoctg4ai

Scalable Construction of Text Indexes with Thrill

Timo Bingmann, Simon Gog, Florian Kurpicz
2018 2018 IEEE International Conference on Big Data (Big Data)  
In this article, we present five suffix array construction algorithms utilizing the new algorithmic big data batch processing framework Thrill, which allows scalable processing of input sizes on distributed  ...  With the rapid growth of available data, suffix array construction algorithms have to be adapted to advanced computational models such as external memory and distributed computing.  ...  Our research was supported by the Gottfried Wilhelm Leibniz Prize 2012, the German Research Foundation (DFG) SPP 1736 priority programme "Algorithms for Big Data", and the Large-Scale Data Management and  ... 
doi:10.1109/bigdata.2018.8622171 dblp:conf/bigdataconf/BingmannGK18 fatcat:7vpocnaabvdlloqyav3ib6zh6m

Fast BWT in small space by blockwise suffix sorting

Juha Kärkkäinen
2007 Theoretical Computer Science  
The algorithm is based on suffix arrays, but unlike any other algorithm, it can construct the suffix array a small block at a time without storing the rest of the suffix array anywhere.  ...  We present a new space- and time-efficient algorithm for computing the Burrows-Wheeler transform (BWT).  ...  Computing BWT from SA is simple and fast, and a lot of effort has been spent in developing fast and space-efficient algorithms for constructing the suffix array, i.e., for sorting the set of all suffixes  ... 
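The step the snippet calls "simple and fast", computing the BWT from a finished suffix array, fits in one line: BWT[i] is the character that precedes suffix SA[i], taken cyclically. This Python sketch assumes the text ends with a unique sentinel '$' that sorts before every other character. It uses a naive full suffix sort, so it shows none of the paper's blockwise, small-space technique; function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    # Naive full construction; the paper instead builds SA one block at a time.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text: str, sa: list[int]) -> str:
    """BWT[i] = character preceding suffix sa[i], cyclically.

    Assumes `text` ends with a unique, smallest sentinel such as '$'.
    For sa[i] == 0, Python's text[-1] wraps around to that sentinel.
    """
    return "".join(text[pos - 1] for pos in sa)
```

For example, bwt_from_sa("banana$", suffix_array("banana$")) returns "annb$aa".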
doi:10.1016/j.tcs.2007.07.018 fatcat:e44qqb5qpjc3bnm6h6ifpa5byu

Compressed Suffix Arrays for Massive Data [chapter]

Jouni Sirén
2009 Lecture Notes in Computer Science  
We present a fast space-efficient algorithm for constructing compressed suffix arrays (CSA).  ...  We show that the construction algorithm can be parallelized in a symmetric multiprocessor system, and discuss the possibility of a distributed implementation.  ...  The most promising algorithms are the distributed suffix array construction algorithm by Kulla et al.  ... 
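Many compressed suffix arrays store the suffix array implicitly through the function Psi[i] = SA^-1[SA[i] + 1], the rank of the suffix that starts one position later. The snippet does not say which CSA representation the chapter uses, so treating it as Psi-based is an assumption here. The Python sketch below computes Psi from an uncompressed suffix array, which a space-efficient construction deliberately avoids. It only illustrates what is being compressed, and its function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    # Naive construction for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_array(sa: list[int]) -> list[int]:
    """psi[i] = rank of the suffix starting one position after sa[i].

    Assumes the text ends with a unique smallest sentinel, so wrapping
    position n - 1 around to position 0 is well defined.
    """
    n = len(sa)
    isa = [0] * n  # inverse suffix array: isa[pos] = rank of suffix pos
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return [isa[(pos + 1) % n] for pos in sa]
```

For "banana$" this gives Psi = [4, 0, 5, 6, 3, 1, 2]. Within each run of suffixes sharing a first character (ranks 1 to 3 start with 'a', ranks 5 and 6 with 'n'), Psi is increasing, and that monotonicity is what makes it compressible.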
doi:10.1007/978-3-642-03784-9_7 fatcat:bv2v3wzm3jgmdlowvexkurhcaa

An incomplex algorithm for fast suffix array construction

Klaus-Bernd Schürmann, Jens Stoye
2007 Software, Practice & Experience  
Our aim is to provide full text indexing data structures and algorithms for universal usage in text indexing. We present a practical algorithm for suffix array construction.  ...  We achieve very fast construction times for common strings as well as for worst case strings by enhancing our basic algorithms with further techniques.  ...  of the bpr algorithm.  ... 
doi:10.1002/spe.768 fatcat:vwfnt4ucdjdrveyqu7gb5hacmy

Efficient Discovery of Proximity Patterns with Suffix Arrays (Extended Abstract) [chapter]

Hiroki Arimura, Hiroki Asaka, Hiroshi Sakamoto, Setsuo Arikawa
2001 Lecture Notes in Computer Science  
With an index structure, called the virtual suffix tree, for pattern discovery built on the top of the suffix array, the resulting algorithm is simple and fast in practice compared with the previous implementation  ...  We describe an efficient implementation of a text mining algorithm for discovering a class of simple string patterns.  ...  Acknowledgments This work is partially supported by a Grant-in-Aid for Scientific Research on Priority Areas "Discovery Science" from the Ministry of Education, Science, Sports, and Culture in Japan.  ... 
doi:10.1007/3-540-48194-x_14 fatcat:6pwfl6eporehtekg5wdig5pria

Fast Lightweight Suffix Array Construction and Checking [chapter]

Stefan Burkhardt, Juha Kärkkäinen
2003 Lecture Notes in Computer Science  
Additionally, we describe fast and lightweight suffix array checkers, i.e., algorithms that check the correctness of a suffix array.  ...  We describe an algorithm that, for any v ∈ [2, n], constructs the suffix array of a string of length n in O(vn space in addition to the input (the string) and the output (the suffix array).  ...  Discussion We have described a suffix array construction algorithm that combines fast worst case running times with small space consumption.  ... 
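A suffix array checker can avoid comparing whole suffixes. It verifies that the array is a permutation, then checks each adjacent pair by first character and, on a tie, by the claimed ranks of the suffixes one position later. The Python sketch below follows that well-known rank-based idea. Whether it matches the paper's checker in every detail is an assumption, and the function name is hypothetical.

```python
def is_suffix_array(text: str, sa: list[int]) -> bool:
    """Return True iff `sa` is the suffix array of `text`, in O(n) time."""
    n = len(text)
    if len(sa) != n:
        return False
    seen = [False] * n
    for pos in sa:  # 1. sa must be a permutation of 0..n-1
        if not 0 <= pos < n or seen[pos]:
            return False
        seen[pos] = True
    rank = [0] * (n + 1)
    for r, pos in enumerate(sa):
        rank[pos] = r
    rank[n] = -1  # the empty suffix sorts before every non-empty suffix
    # 2. Local checks on adjacent pairs. By induction on suffix length,
    #    these imply that the whole claimed order is correct.
    for a, b in zip(sa, sa[1:]):
        if text[a] > text[b]:
            return False
        if text[a] == text[b] and rank[a + 1] >= rank[b + 1]:
            return False
    return True
```

For example, is_suffix_array("banana", [5, 3, 1, 0, 4, 2]) is True, while swapping the first two entries makes it False.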
doi:10.1007/3-540-44888-8_5 fatcat:gnic25jlz5hgzll3in4ez35weu

Simple Linear Work Suffix Array Construction [chapter]

Juha Kärkkäinen, Peter Sanders
2003 Lecture Notes in Computer Science  
A suffix array represents the suffixes of a string in sorted order.  ...  We introduce the skew algorithm for suffix array construction over integer alphabets that can be implemented to run in linear time using integer sorting as its only nontrivial subroutine: 1. recursively  ...  The result would be some kind of suffix list or padded suffix array that could be converted into a suffix array in logarithmic time.  ... 
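The skew (DC3) recursion sorts the suffixes at positions i mod 3 != 0 by radix-sorting character triples and recursing on their names. It then sorts the mod-0 suffixes using those ranks and merges the two lists with constant-time comparisons. The Python sketch below is a translation modeled on the widely circulated reference implementation of this scheme, not a verbatim copy of the paper's code. Function names are hypothetical, and characters are mapped to integers 1..k so that 0 can serve as padding.

```python
def _radix_pass(idx, keys, offset, k):
    """Stable counting sort of positions `idx` by keys[i + offset] in [0, k]."""
    count = [0] * (k + 1)
    for i in idx:
        count[keys[i + offset]] += 1
    total = 0
    for c in range(k + 1):  # exclusive prefix sums = bucket start offsets
        count[c], total = total, total + count[c]
    out = [0] * len(idx)
    for i in idx:
        out[count[keys[i + offset]]] = i
        count[keys[i + offset]] += 1
    return out


def _skew(s, n, k):
    """Suffix array of s[0:n]; s holds ints in [1, k] followed by three 0s."""
    n0, n1, n2 = (n + 2) // 3, (n + 1) // 3, n // 3
    n02 = n0 + n2
    # Step 1: sample positions i % 3 != 0; "+ (n0 - n1)" adds a dummy
    # mod-1 position when n % 3 == 1.
    sa12 = [i for i in range(n + (n0 - n1)) if i % 3 != 0]
    for off in (2, 1, 0):  # LSD radix sort of the character triples
        sa12 = _radix_pass(sa12, s, off, k)
    # Name the triples; the reduced string holds mod-1 names, then mod-2 names.
    s12 = [0] * (n02 + 3)
    name, prev = 0, None
    for i in sa12:
        triple = (s[i], s[i + 1], s[i + 2])
        if triple != prev:
            name, prev = name + 1, triple
        s12[i // 3 if i % 3 == 1 else i // 3 + n0] = name
    if name < n02:  # names not unique: recurse on the reduced string
        sa12 = _skew(s12, n02, name)
        for r, i in enumerate(sa12):
            s12[i] = r + 1  # s12 now holds the rank of each sample suffix
    else:  # names already unique, so they are the ranks
        sa12 = [0] * n02
        for i in range(n02):
            sa12[s12[i] - 1] = i
    # Step 2: sort mod-0 suffixes by (first char, rank of the suffix after it).
    sa0 = _radix_pass([3 * i for i in sa12 if i < n0], s, 0, k)

    def pos12(t):  # text position of the t-th smallest sample suffix
        return sa12[t] * 3 + 1 if sa12[t] < n0 else (sa12[t] - n0) * 3 + 2

    # Step 3: merge; each comparison is O(1) thanks to the sample ranks.
    sa, p, t = [], 0, n0 - n1  # t skips the dummy suffix if there is one
    while p < n0 and t < n02:
        i, j = pos12(t), sa0[p]
        if sa12[t] < n0:  # mod-1 vs mod-0: (char, rank of next suffix)
            smaller = (s[i], s12[sa12[t] + n0]) <= (s[j], s12[j // 3])
        else:  # mod-2 vs mod-0: (char, char, rank two positions later)
            smaller = ((s[i], s[i + 1], s12[sa12[t] - n0 + 1])
                       <= (s[j], s[j + 1], s12[j // 3 + n0]))
        if smaller:
            sa.append(i)
            t += 1
        else:
            sa.append(j)
            p += 1
    sa.extend(sa0[p:])
    sa.extend(pos12(u) for u in range(t, n02))
    return sa


def skew_suffix_array(text: str) -> list[int]:
    n = len(text)
    if n < 2:
        return list(range(n))
    alphabet = sorted(set(text))
    code = {ch: r + 1 for r, ch in enumerate(alphabet)}  # 0 reserved for padding
    s = [code[ch] for ch in text] + [0, 0, 0]
    return _skew(s, n, len(alphabet))
```

For example, skew_suffix_array("banana") returns [5, 3, 1, 0, 4, 2]. Counting sort is the only nontrivial subroutine, matching the abstract's description, and the recursion runs on a string two thirds the original length.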
doi:10.1007/3-540-45061-0_73 fatcat:5l6a3kvv5fandemcha7yi4gj6e

A Big Data Approach for Sequences Indexing on the Cloud via Burrows Wheeler Transform [article]

Mario Randazzo, Simona E. Rombo
2020 arXiv   pre-print
Here we propose an algorithm for the computation of the Burrows-Wheeler transform relying on Big Data technologies, i.e., Apache Spark and Hadoop.  ...  Our approach is the first that distributes the index computation and not only the input dataset, allowing it to fully benefit from the available cloud resources.  ...  ACKNOWLEDGEMENTS Part of the research presented here has been funded by the MIUR-PRIN research project "Multicriteria Data Structures and Algorithms: from compressed to learned indexes, and beyond", grant  ... 
arXiv:2007.10095v1 fatcat:vqd7mydjgvhspnfi4csvn572aq

Online Suffix Tree Construction for Streaming Sequences [chapter]

Giyasettin Ozcan, Adil Alpkocak
2008 Communications in Computer and Information Science  
In order to overcome these difficulties, first, we present a space efficient node representation approach to be used in Ukkonen suffix tree construction algorithm.  ...  In this study, we present an online suffix tree construction approach where multiple sequences are indexed by a single suffix tree.  ...  In these algorithms, suffix links functioned as shortcuts, which enable fast access to the suffix insertion positions of the tree.  ... 
doi:10.1007/978-3-540-89985-3_9 fatcat:orkbi5h6nrbufpv47tnox4noly

Implementing Suffix Array Algorithm Using Apache Big Table Data Implementation [article]

Piero Giacomelli
2020 arXiv   pre-print
In this paper we will describe a new approach to the well-known suffix-array algorithm using Big Table Data Technology.  ...  We will demonstrate how it is possible to refactor a well-known algorithm by taking advantage of a high-performance distributed datastore, to illustrate the advantages of using datastore cloud  ...  SUFFIX-ARRAY ALGORITHM We are ready now to review how the suffix-array algorithm works.  ... 
arXiv:2003.11124v1 fatcat:oaeqkyvxn5awbpkrtr2nelmqg4

Prospects and limitations of full-text index structures in genome analysis

M. Vyverman, B. De Baets, V. Fack, P. Dawyndt
2012 Nucleic Acids Research  
These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures.  ...  Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs.  ...  The authors also like to acknowledge the members of the Nucleotides to Networks next-generation sequencing discussion group, in particular Yao-Cheng Lin and Lieven Sterck, for their helpful comments in  ... 
doi:10.1093/nar/gks408 pmid:22584621 pmcid:PMC3424560 fatcat:5sfziui7ujhfzcqhcukbi4utjq

Page 2569 of Mathematical Reviews Vol. , Issue 97D [page]

1997 Mathematical Reviews  
Summary: “Suffix trees and suffix arrays are data structures that allow fast search in a large static text.  ...  We propose a new data structure, the augmented suffix array, that allows searching in O(|w| + log log n + k) time and requires about the same memory space as the suffix array.  ... 

SANS: high-throughput retrieval of protein sequences allowing 50% mismatches

J. P. Koskinen, L. Holm
2012 Bioinformatics  
Results: We present a novel word filter, suffix array neighborhood search (SANS), to identify protein sequence similarities in the range of 50-100% identity with sensitivity comparable to BLAST and 10  ...  In contrast to these previous approaches, the complexity of the search is proportional only to the length of the query sequence and independent of database size, enabling fast searching and functional  ...  ALGORITHM Database indexing A suffix array is an array of integers giving the starting positions of suffixes of a text in lexicographic order.  ... 
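The definition quoted above translates directly into code. This Python sketch sorts the starting positions by the suffixes they begin. It is simple but far slower than the dedicated constructions listed elsewhere on this page, so it only makes the definition concrete. The function name is hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Starting positions of all suffixes of `text`, in lexicographic order.

    Straightforward comparison sort: O(n log n) suffix comparisons of up to
    O(n) characters each. Because key= materializes every suffix slice up
    front, memory use is also O(n^2). Illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])
```

For example, suffix_array("banana") returns [5, 3, 1, 0, 4, 2]: "a" (5) < "ana" (3) < "anana" (1) < "banana" (0) < "na" (4) < "nana" (2).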
doi:10.1093/bioinformatics/bts417 pmid:22962464 pmcid:PMC3436844 fatcat:v2ivfdvfynf6fpl3ucxpo5o6xe

Fast parallel skew and prefix-doubling suffix array construction on the GPU

Leyuan Wang, Sean Baxter, John D. Owens
2016 Concurrency and Computation  
The straightforward way to generate a suffix array from a string is to simply sort all suffixes of that string using a comparison-based sorting algorithm.  ...  Their algorithm leverages two nice properties of the suffix array: (1) all the suffixes prefixed by a pattern P occupy a contiguous portion of the suffix array, denoted as a subarray; (2) the subarray  ... 
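Property (1) turns pattern search on a suffix array into two binary searches. Truncated to |P| characters, the suffixes stay in non-decreasing order, so one search finds the first suffix whose prefix is at least P and the other finds the first whose prefix exceeds P. This Python sketch uses a naive construction and hypothetical function names. Each search probes O(log n) suffixes and compares at most |P| characters per probe.

```python
def suffix_array(text: str) -> list[int]:
    # Naive comparison-sort construction, for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def pattern_range(text: str, sa: list[int], pattern: str) -> tuple[int, int]:
    """Return (lo, hi) such that sa[lo:hi] are exactly the suffixes prefixed by pattern."""
    m = len(pattern)

    def prefix(r):  # first m characters of the suffix with rank r
        return text[sa[r]:sa[r] + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first rank whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo
```

For "banana", pattern_range returns (1, 3) for "ana". The contiguous subarray sa[1:3] = [3, 1] lists both occurrences, and hi - lo is the occurrence count.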
doi:10.1002/cpe.3867 fatcat:vfvcxuz2u5cybblaty5zt5fnhm