Advantages of Backward Searching — Efficient Secondary Memory and Distributed Implementation of Compressed Suffix Arrays
[chapter]
2004
Lecture Notes in Computer Science
Furthermore, the regular access pattern of backward searching permits an efficient secondary memory implementation, so that the search can be done with O(m log_B n) disk accesses, being B the disk block ...
One of the most relevant succinct suffix array proposals in the literature is the Compressed Suffix Array (CSA) of Sadakane [ISAAC 2000]. ...
Conclusions We have proposed a new implementation of the backward search algorithm for the Compressed Suffix Array (CSA). ...
doi:10.1007/978-3-540-30551-4_59
fatcat:exa2icv2prdljl7pphs54pzgge
Prospects and limitations of full-text index structures in genome analysis
2012
Nucleic Acids Research
Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. ...
Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. ...
In addition, the authors would also like to thank the editor and the anonymous reviewers for their valuable comments and suggestions to improve the quality of the manuscript. ...
doi:10.1093/nar/gks408
pmid:22584621
pmcid:PMC3424560
fatcat:5sfziui7ujhfzcqhcukbi4utjq
Inexact Sequence Mapping Study Cases: Hybrid GPU Computing and Memory Demanding Indexes
2014
International Work-Conference on Bioinformatics and Biomedical Engineering
In this paper we discuss the implementation of backward search methods for inexact mapping in these two study cases. ...
On the other hand, out-of-core implementations allow data to be accessed directly from secondary memory, which may be useful when mapping against big indexes in systems with low memory configurations. ...
Nowadays, several mapping techniques are based on backward search methods over suffix arrays (SA) [16]. ...
dblp:conf/iwbbio/TorresTMB14
fatcat:emlwq5aysreobn52mqfd7lo3dq
Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays
[article]
2013
arXiv
pre-print
The suffix array is an efficient data structure for in-memory pattern search. ...
Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers. ...
arXiv:1303.6481v1
fatcat:soh5dytslfezjffzutf6lwfxbi
Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays
2014
IEEE Transactions on Knowledge and Data Engineering
The suffix array is an efficient data structure for in-memory pattern search. ...
Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers. ...
For static data the SB-TREE can be implemented as a uniform partitioning of a suffix array, with an in-memory suffix tree index implemented as a blind tree or bit-blind tree. ...
doi:10.1109/tkde.2013.129
fatcat:237566jg3zfatfep2hjilvgj4m
Compression, Indexing, and Retrieval for Massive String Data
[chapter]
2010
Lecture Notes in Computer Science
Several challenges remain, and we focus in this presentation on two in particular: building I/O-efficient search structures when the input data are so massive that external memory must be used, and incorporating ...
The field of compressed data structures seeks to achieve fast search time, but using a compressed representation, ideally requiring less space than that occupied by the original input data. ...
However, we are increasingly having to deal with massive data sets that do not easily fit into internal memory and thus must be stored on secondary storage, such as disk drives, or in a distributed fashion ...
doi:10.1007/978-3-642-13509-5_24
fatcat:yai4yylbdfhqxm7n65m7vw4lli
Accelerating Maximal-Exact-Match Seeding with Enumerated Radix Trees
[article]
2020
bioRxiv
pre-print
Furthermore, we prototype an FPGA implementation of ERT on Amazon EC2 F1 cloud and observe 1.6× higher seeding throughput over a 48-thread optimized CPU-ERT implementation. Availability and implementation: https ...
This is because both BWA-MEM and BWA-MEM2 use a compressed index structure called the FMD-Index, which results in high memory bandwidth requirements for seeding, primarily due to its character-by-character ...
The locations of these SMEMs in the reference genome (hits) are then determined by a suffix array lookup (can take multiple occurrence table lookups in BWA-MEM, since the suffix array is sampled) and passed ...
doi:10.1101/2020.03.23.003897
fatcat:ih3uy7tabjc7zjr774uqez2nva
SeqAn: An efficient, generic C++ library for sequence analysis
2008
BMC Bioinformatics
The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use. ...
Results: To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. ...
Acknowledgements We would like to acknowledge all students of the BSc and MSc program in bioinformatics at the FU Berlin who have contributed to SeqAn so far. ...
doi:10.1186/1471-2105-9-11
pmid:18184432
pmcid:PMC2246154
fatcat:mkmgd3gkzjavvmoh3tgrjmny24
Distributed text search using suffix arrays
2014
Parallel Computing
We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. ...
Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that ...
Acknowledgments The experimental part of this work has been supported by a HPC-EUROPA2 project (code 228398) with the support of the European Commission - Capacities Area - Research Infrastructures. ...
doi:10.1016/j.parco.2014.06.007
fatcat:tnsgzdptqjc63akhihkfw5j2pe
ER-index: a referential index for encrypted genomic databases
[article]
2019
arXiv
pre-print
We designed and implemented ER-index, a new full-text index in minute space which was optimized for compressing and encrypting collections of genomic sequences, and for performing on them fast pattern-search ...
Our new index complements the E2FM-index, which was introduced to compress and encrypt collections of nucleotide sequences without relying on a reference sequence. ...
This backward search gives as result the R rev suffix array range containing the suffixes prefixing the reverse of S p; the algorithm chooses the first among them, as they are all equivalent for its purposes ...
arXiv:1910.02851v1
fatcat:dibz5cfce5e6lcgbip5b67kolm
Compressed full-text indexes
2007
ACM Computing Surveys
The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising ...
Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems. ...
Acknowledgements We thank Paolo Ferragina, Kunihiko Sadakane, and the anonymous referees for their invaluable help to improve this survey. Their hard work is greatly appreciated. ...
doi:10.1145/1216370.1216372
fatcat:cvpuqe5kl5gibfa6xvqvhg3kaa
CSA++: Fast Pattern Search for Large Alphabets
[article]
2016
arXiv
pre-print
Commencing with the practical compressed suffix array structure developed by Sadakane, we show that the Elias-Fano code-based approach to document indexing can be adapted to provide new tradeoff options ...
In this paper we apply recent innovations from the field of inverted indexing and document retrieval to compressed pattern search, including for alphabets into the millions. ...
In a compressed suffix array, or CSA, the space required is proportional to the compressed size of T. ...
arXiv:1605.05404v1
fatcat:5zchtj344baefmmfpoblbbolba
Text Searching: Theory and Practice
[chapter]
2004
Studies in Fuzziness and Soft Computing
We present the state of the art of the main component of text retrieval systems: the search engine. We outline the main lines of research and issues involved. ...
We survey the relevant techniques in use today for text searching and explore the gap between theoretical and practical algorithms. The main observation is that simpler ideas are better in practice. ...
[32] is still one of the best to build the suffix array in secondary memory [20]. Given M main memory, they need O(n^2 log(M)/M) worst-case disk transfer time. ...
doi:10.1007/978-3-540-39886-8_30
fatcat:wvt7jqjbn5gebjr3depc3uopxq
Sketching and Sublinear Data Structures in Genomics
2019
Annual Review of Biomedical Data Science
We describe these techniques at a high level and give several representative applications of each. ...
Specifically, we focus on four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full text indices, approximate membership query data structures, ...
(BIO-1564917, CCF-1750472, and CNS-1763680), and by the US National Institutes of Health (R01HG007104, R01GM122935, and R01HG009937). ...
doi:10.1146/annurev-biodatasci-072018-021156
fatcat:zlqdv6ke4vdmvgaaqwvvd53iae
Word-based self-indexes for natural language text
2012
ACM Transactions on Information Systems
Within this space it supports not only decompression of arbitrary passages, but efficient word and phrase searches. ...
The inverted index supports efficient full-text searches on natural language text collections. It requires some extra space over the compressed text that can be traded for search speed. ...
Instead of emulating a classical suffix array search, FM-indexes use a concept called backward search. ...
doi:10.1145/2094072.2094073
fatcat:lj4bsjt6wzccdnwwox5qc3qvjm