Advantages of Backward Searching — Efficient Secondary Memory and Distributed Implementation of Compressed Suffix Arrays [chapter]

Veli Mäkinen, Gonzalo Navarro, Kunihiko Sadakane
2004 Lecture Notes in Computer Science  
Furthermore, the regular access pattern of backward searching permits an efficient secondary memory implementation, so that the search can be done with O(m log_B n) disk accesses, being B the disk block  ...  One of the most relevant succinct suffix array proposals in the literature is the Compressed Suffix Array (CSA) of Sadakane [ISAAC 2000].  ...  Conclusions: We have proposed a new implementation of the backward search algorithm for the Compressed Suffix Array (CSA).  ... 
doi:10.1007/978-3-540-30551-4_59 fatcat:exa2icv2prdljl7pphs54pzgge

Prospects and limitations of full-text index structures in genome analysis

M. Vyverman, B. De Baets, V. Fack, P. Dawyndt
2012 Nucleic Acids Research  
Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs.  ...  Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems.  ...  In addition, the authors would also like to thank the editor and the anonymous reviewers for their valuable comments and suggestions to improve the quality of the manuscript.  ... 
doi:10.1093/nar/gks408 pmid:22584621 pmcid:PMC3424560 fatcat:5sfziui7ujhfzcqhcukbi4utjq

Inexact Sequence Mapping Study Cases: Hybrid GPU Computing and Memory Demanding Indexes

José Salavert Torres, Andrés Tomás, Ignacio Medina, Ignacio Blanquer
2014 International Work-Conference on Bioinformatics and Biomedical Engineering  
In this paper we discuss the implementation of backward search methods for inexact mapping in these two study cases.  ...  On the other hand, out-of-core implementations allow to directly access data from secondary memory, which may be useful when mapping against big indexes in systems with low memory configurations.  ...  Nowadays, several mapping techniques are based in backward search methods over Suffix arrays [16] (SA).  ... 
dblp:conf/iwbbio/TorresTMB14 fatcat:emlwq5aysreobn52mqfd7lo3dq

Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays [article]

Simon Gog, Alistair Moffat, J. Shane Culpepper, Andrew Turpin, Anthony Wirth
2013 arXiv   pre-print
The suffix array is an efficient data structure for in-memory pattern search.  ...  Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers.  ...  Abstract: The suffix array is an efficient data structure for in-memory pattern search.  ... 
arXiv:1303.6481v1 fatcat:soh5dytslfezjffzutf6lwfxbi

Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays

Simon Gog, Alistair Moffat, J. Shane Culpepper, Andrew Turpin, Anthony Wirth
2014 IEEE Transactions on Knowledge and Data Engineering  
The suffix array is an efficient data structure for in-memory pattern search.  ...  Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers.  ...  For static data the SB-TREE can be implemented as a uniform partitioning of a suffix array, with an in-memory suffix tree index implemented as a blind tree or bit-blind tree.  ... 
doi:10.1109/tkde.2013.129 fatcat:237566jg3zfatfep2hjilvgj4m

Compression, Indexing, and Retrieval for Massive String Data [chapter]

Wing-Kai Hon, Rahul Shah, Jeffrey Scott Vitter
2010 Lecture Notes in Computer Science  
Several challenges remain, and we focus in this presentation on two in particular: building I/O-efficient search structures when the input data are so massive that external memory must be used, and incorporating  ...  The field of compressed data structures seeks to achieve fast search time, but using a compressed representation, ideally requiring less space than that occupied by the original input data.  ...  However, we are increasingly having to deal with massive data sets that do not easily fit into internal memory and thus must be stored on secondary storage, such as disk drives, or in a distributed fashion  ... 
doi:10.1007/978-3-642-13509-5_24 fatcat:yai4yylbdfhqxm7n65m7vw4lli

Accelerating Maximal-Exact-Match Seeding with Enumerated Radix Trees [article]

Arun Subramaniyan, Jack Wadden, Kush Goliya, Nathan Ozog, Xiao Wu, Satish Narayanasamy, David Blaauw, Reetuparna Das
2020 bioRxiv   pre-print
Furthermore, we prototype an FPGA implementation of ERT on Amazon EC2 F1 cloud and observe 1.6× higher seeding throughput over a 48-thread optimized CPU-ERT implementation. Availability and implementation: https  ...  This is because both BWA-MEM and BWA-MEM2 use a compressed index structure called the FMD-Index, which results in high memory bandwidth requirements for seeding, primarily due to its character-by-character  ...  The locations of these SMEMs in the reference genome (hits) are then determined by a suffix array lookup (can take multiple occurrence table lookups in BWA-MEM, since the suffix array is sampled) and passed  ... 
doi:10.1101/2020.03.23.003897 fatcat:ih3uy7tabjc7zjr774uqez2nva

SeqAn An efficient, generic C++ library for sequence analysis

Andreas Döring, David Weese, Tobias Rausch, Knut Reinert
2008 BMC Bioinformatics  
The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.  ...  Results: To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology.  ...  Acknowledgements We would like to acknowledge all students of the BSc and MSc program in bioinformatics at the FU Berlin who have contributed to SeqAn so far.  ... 
doi:10.1186/1471-2105-9-11 pmid:18184432 pmcid:PMC2246154 fatcat:mkmgd3gkzjavvmoh3tgrjmny24

Distributed text search using suffix arrays

Diego Arroyuelo, Carolina Bonacic, Veronica Gil-Costa, Mauricio Marin, Gonzalo Navarro
2014 Parallel Computing  
We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure.  ...  Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that  ...  Acknowledgments The experimental part of this work has been supported by a HPC-EUROPA2 project (code 228398) with the support of the European Commission - Capacities Area - Research Infrastructures.  ... 
doi:10.1016/j.parco.2014.06.007 fatcat:tnsgzdptqjc63akhihkfw5j2pe

ER-index: a referential index for encrypted genomic databases [article]

Ferdinando Montecuollo, Giovanni Schmid
2019 arXiv   pre-print
We designed and implemented ER-index, a new full-text index in minute space which was optimized for compressing and encrypting collections of genomic sequences, and for performing on them fast pattern-search  ...  Our new index complements the E2FM-index, which was introduced to compress and encrypt collections of nucleotide sequences without relying on a reference sequence.  ...  This backward search gives as result the R^rev suffix array range containing the suffixes prefixing the reverse of S_p; the algorithm chooses the first among them, as they are all equivalent for its purposes  ... 
arXiv:1910.02851v1 fatcat:dibz5cfce5e6lcgbip5b67kolm

Compressed full-text indexes

Gonzalo Navarro, Veli Mäkinen
2007 ACM Computing Surveys  
The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising  ...  Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems.  ...  Acknowledgements We thank Paolo Ferragina, Kunihiko Sadakane, and the anonymous referees for their invaluable help to improve this survey. Their hard work is greatly appreciated.  ... 
doi:10.1145/1216370.1216372 fatcat:cvpuqe5kl5gibfa6xvqvhg3kaa

CSA++: Fast Pattern Search for Large Alphabets [article]

Simon Gog, Alistair Moffat, Matthias Petri
2016 arXiv   pre-print
Commencing with the practical compressed suffix array structure developed by Sadakane, we show that the Elias-Fano code-based approach to document indexing can be adapted to provide new tradeoff options  ...  In this paper we apply recent innovations from the field of inverted indexing and document retrieval to compressed pattern search, including for alphabets into the millions.  ...  In a compressed suffix array, or CSA, the space required is proportional to the compressed size of T.  ... 
arXiv:1605.05404v1 fatcat:5zchtj344baefmmfpoblbbolba

Text Searching: Theory and Practice [chapter]

Ricardo Baeza-Yates, Gonzalo Navarro
2004 Studies in Fuzziness and Soft Computing  
We present the state of the art of the main component of text retrieval systems: the search engine. We outline the main lines of research and issues involved.  ...  We survey the relevant techniques in use today for text searching and explore the gap between theoretical and practical algorithms. The main observation is that simpler ideas are better in practice.  ...  [32] is still one of the best to build the suffix array in secondary memory [20]. Given M main memory, they need O(n^2 log(M)/M) worst-case disk transfer time.  ... 
doi:10.1007/978-3-540-39886-8_30 fatcat:wvt7jqjbn5gebjr3depc3uopxq

Sketching and Sublinear Data Structures in Genomics

Guillaume Marçais, Brad Solomon, Rob Patro, Carl Kingsford
2019 Annual Review of Biomedical Data Science  
We describe these techniques at a high level and give several representative applications of each.  ...  Specifically, we focus on four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full text indices, approximate membership query data structures,  ...  (BIO-1564917, CCF-1750472, and CNS-1763680), and by the US National Institutes of Health (R01HG007104, R01GM122935, and R01HG009937).  ... 
doi:10.1146/annurev-biodatasci-072018-021156 fatcat:zlqdv6ke4vdmvgaaqwvvd53iae

Word-based self-indexes for natural language text

Antonio Fariña, Nieves R. Brisaboa, Gonzalo Navarro, Francisco Claude, Ángeles S. Places, Eduardo Rodríguez
2012 ACM Transactions on Information Systems  
Within this space it supports not only decompression of arbitrary passages, but efficient word and phrase searches.  ...  The inverted index supports efficient full-text searches on natural language text collections. It requires some extra space over the compressed text that can be traded for search speed.  ...  Instead of emulating a classical suffix array search, FM-indexes use a concept called backward search.  ... 
doi:10.1145/2094072.2094073 fatcat:lj4bsjt6wzccdnwwox5qc3qvjm