Lossless Seeds for Searching Short Patterns with High Error Rates
[chapter]
2015
Lecture Notes in Computer Science
Experimental tests show that the method is specifically well-suited for short patterns with a large number of errors. ...
For that purpose, we propose a filtration algorithm that is based on a novel type of seeds, combining exact parts and parts with a fixed number of errors. ...
DNA alphabet) with a medium to high error rate (7%-15%). ...
doi:10.1007/978-3-319-19315-1_32
fatcat:d3vvcajxyvf37i66vhbod4h4lm
Seed Design Framework for Mapping SOLiD Reads
[chapter]
2010
Lecture Notes in Computer Science
Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors. ...
The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. ...
Acknowledgments The authors would like to thank Valentina Boeva and Emmanuel Barillot from the Institut Marie Curie at Paris for helpful discussions and for providing the dataset of Saccharomyces cerevisiae ...
doi:10.1007/978-3-642-12683-3_25
fatcat:65fdkd7yjrgkrpcoujaeqjlw7y
Approximate search of short patterns with high error rates using the 01*0 lossless seeds
2016
Journal of Discrete Algorithms
Approximate search of short patterns with high error rates using the 01*0 lossless seeds. Journal of Discrete Algorithms, Elsevier, 2016, 37, pp. ...
However, searching a short pattern in a text with high error rates (10%-20%) under the Levenshtein distance is a task for which few efficient solutions exist. ...
Despite this large body of research on seeding techniques for the approximate string matching problem, we believe that there is still room for improvement for small patterns with high error rates (10- ...
doi:10.1016/j.jda.2016.03.002
fatcat:ntcajlrvujbwjaekcev2akbagu
Designing Efficient Spaced Seeds for SOLiD Read Mapping
2010
Advances in Bioinformatics
Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors. ...
The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic ...
Acknowledgments The authors would like to thank Valentina Boeva and Emmanuel Barillot from the Institut Marie Curie at Paris for helpful discussions and for providing the dataset of Saccharomyces cerevisiae ...
doi:10.1155/2010/708501
pmid:20936175
pmcid:PMC2945724
fatcat:nahdomgtm5bfpf3pk2mrmifh2i
2015 London Stringology Days and London Algorithmic Workshop (LSD & LAW)
2016
Journal of Discrete Algorithms
Special thanks must also go to the authors for their patience and meticulous revisions during this process. ...
We would like to express our gratitude to all the anonymous reviewers for timely and thorough reviewing of the articles. ...
The paper Approximate search of short patterns with high error rates using the 01*0 lossless seeds by C. Vroland, M. Salson, S. Bini, and H. ...
doi:10.1016/j.jda.2016.06.002
fatcat:ro63rejdf5cw5kzl4rtixn7bry
2D-pattern matching image and video compression: theory, algorithms, and experiments
2002
IEEE Transactions on Image Processing
We demonstrate bit rates in the range of 0.25-0.5 bpp for high quality images and data rates in the range of 0.15-0.5 Mbps for a baseline video compression scheme that does not use any prediction or interpolation ...
In this paper, we propose a lossy data compression framework based on an approximate two-dimensional pattern matching (2D-PMC) extension of the Lempel-Ziv lossless scheme. ...
ACKNOWLEDGEMENT It is our privilege to acknowledge valuable discussions with Y. Reznik (RealNetwork Inc.) and I. Kontoyiannis (Purdue University). ...
doi:10.1109/83.988964
pmid:18244634
fatcat:tyrqieamfvcdngmju4gt2aqihi
Pattern mining of cloned codes in software systems
2014
Information Sciences
Their computational complexity is very high and dramatically increases with the software size, thus limiting their applications in practice. ...
In this paper, we propose a novel pattern mining framework for cloned codes in software systems. ...
Acknowledgment This work is supported in part by the Open Projects Program of National Laboratory of Pattern Recognition and the President Fund of GUCAS. ...
doi:10.1016/j.ins.2010.04.022
fatcat:vvlqh62v6jhvxb3fx572nge2ra
Novel Computational Techniques For Mapping And Classifying Next-Generation Sequencing Data
2016
Zenodo
In this thesis, we present novel computational techniques for read mapping and taxonomic classification. With more than a hundred published mappers, read mapping might be considered fully solved. ...
deciphering cancer biology, decoding the evolution of living or extinct species, or understanding human migration patterns and human history in general. ...
To conclude the indexing strategies, let us remark that the traditional seeding techniques work well with the short-read technologies, but become insufficient for long reads with a high rate of sequencing ...
doi:10.5281/zenodo.1045317
fatcat:agbwpocisncwvihn2ij5q3ongq
Short Read Mapping: An Algorithmic Tour
2017
Proceedings of the IEEE
To solve this complex, high-dimensional puzzle, reads must be mapped back to a reference genome to determine their origin. Due to sequencing errors and to genuine differences between the reference genome ...
The advent of ultra-high-throughput next-generation sequencing (NGS) technology in 2007 presented major new challenges for alignment. ...
a high error rate. ...
doi:10.1109/jproc.2015.2455551
pmid:28502990
pmcid:PMC5425171
fatcat:ic6d6z5ggbazdkaotcotdtbo3u
Improved hit criteria for DNA local alignment
2004
BMC Bioinformatics
It specifies a class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method. ...
We provide analytical data as well as experimental results, obtained with the YASS software, supporting both improvements. ...
Acknowledgments We are grateful to Mikhail Roytberg for enlightening discussions, and to Marie-Pierre Etienne, Roman Kolpakov, Gilles Schaeffer and Pierre Valois for their helpful comments at early stages ...
doi:10.1186/1471-2105-5-149
pmid:15485572
pmcid:PMC526756
fatcat:vkj773vw4jcmdixihpzwwjezxq
A Review on Sequence Alignment Algorithms for Short Reads Based on Next-Generation Sequencing
2020
IEEE Access
With recent advances in next-generation sequencing (NGS) technology, large volumes of data have been produced in the form of short reads. ...
This study is a review of the different short read alignment algorithms and NGS platforms that have been developed to date, in order to aid efficient selection of algorithms for reference sequences and ...
simple task of searching for a single character to database searches involving complex patterns [36]. ...
doi:10.1109/access.2020.3031159
fatcat:6so6h6f7qbbfjle7zp6wb6hi2i
Improved long read correction for de novo assembly using an FM-index
[article]
2016
bioRxiv
pre-print
Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. ...
To this end, we describe a novel application of a multi-string Burrows-Wheeler transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. ...
BWT, then searches for a high-weight path between seeds that most closely matches the intervening long read sequence. ...
doi:10.1101/067272
fatcat:n45m76rqbjdt3j57otwqe3mbxu
Lossless filter for multiple repetitions with Hamming distance
2008
Journal of Discrete Algorithms
However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string. ...
Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. ...
Acknowledgement The authors would like to thank the anonymous referee for his careful reading and the very helpful comments that sensibly improved the paper. ...
doi:10.1016/j.jda.2007.03.003
fatcat:r5jfpzdkhndkpiuu2zwbluigbq
Evolution of biosequence search algorithms: a brief survey
[article]
2018
arXiv
pre-print
The paper surveys the evolution of main algorithmic techniques to compare and search biological sequences. ...
Acknowledgements Many thanks go to Karel Břinda for his helpful comments and suggestions.
Bibliography ...
This idea is particularly useful for mapping long reads produced by "third-generation sequencing technologies", such as Pacific Biosciences(TM) or Oxford Nanopore(TM), presenting high error rates dominated ...
arXiv:1808.01038v4
fatcat:uiyjrwvgprgu3nfcu6i47o4wpe
Dynamic Partitioning of Search Patterns for Approximate Pattern Matching using Search Schemes
2021
iScience
Dynamic partitioning of search patterns reduces search space and runtime. Memory interleaving of bit vector representations of the BWT reduces runtime. Avoiding redundancy inherent to the edit distance metric ...
reduces the search space. Our software tool Columba outperforms the state of the art by a factor of 3.5 ...
Additionally, they show that related work on lossless approximate pattern matching by Vroland et al. on 01*0 seeds (Vroland et al., 2016) can also be expressed as search schemes. ...
doi:10.1016/j.isci.2021.102687
fatcat:pgq5qdc75bgkljdj6iikapakfq