Lossless Seeds for Searching Short Patterns with High Error Rates [chapter]

Christophe Vroland, Mikaël Salson, Hélène Touzet
2015 Lecture Notes in Computer Science  
Experimental tests show that the method is specifically well-suited for short patterns with a large number of errors.  ...  For that purpose, we propose a filtration algorithm that is based on a novel type of seeds, combining exact parts and parts with a fixed number of errors.  ...  DNA alphabet) with a medium to high error-rate (7 %-15 %).  ... 
doi:10.1007/978-3-319-19315-1_32 fatcat:d3vvcajxyvf37i66vhbod4h4lm

Seed Design Framework for Mapping SOLiD Reads [chapter]

Laurent Noé, Marta Gîrdea, Gregory Kucherov
2010 Lecture Notes in Computer Science  
Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors.  ...  The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications.  ...  Acknowledgments The authors would like to thank Valentina Boeva and Emmanuel Barillot from the Institut Marie Curie at Paris for helpful discussions and for providing the dataset of Saccharomyces cerevisiae  ... 
doi:10.1007/978-3-642-12683-3_25 fatcat:65fdkd7yjrgkrpcoujaeqjlw7y

Approximate search of short patterns with high error rates using the 01*0 lossless seeds

Christophe Vroland, Mikaël Salson, Sébastien Bini, Hélène Touzet
2016 Journal of Discrete Algorithms  
Approximate search of short patterns with high error rates using the 01*0 lossless seeds. Journal of Discrete Algorithms, Elsevier, 2016, 37, pp.  ...  However, searching a short pattern in a text with high error rates (10%-20%) under the Levenshtein distance is a task for which few efficient solutions exist.  ...  Despite this large body of research on seeding techniques for the approximate string matching problem, we believe that there is still room for improvement for small patterns with high error rates (10-  ... 
doi:10.1016/j.jda.2016.03.002 fatcat:ntcajlrvujbwjaekcev2akbagu

Designing Efficient Spaced Seeds for SOLiD Read Mapping

Laurent Noé, Marta Gîrdea, Gregory Kucherov
2010 Advances in Bioinformatics  
Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors.  ...  The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic  ...  Acknowledgments The authors would like to thank Valentina Boeva and Emmanuel Barillot from the Institut Marie Curie at Paris for helpful discussions and for providing the dataset of Saccharomyces cerevisiae  ... 
doi:10.1155/2010/708501 pmid:20936175 pmcid:PMC2945724 fatcat:nahdomgtm5bfpf3pk2mrmifh2i

2015 London Stringology Days and London Algorithmic Workshop (LSD & LAW)

Jakub Radoszewski, Tomasz Radzik
2016 Journal of Discrete Algorithms  
Special thanks must also go to the authors for their patience and meticulous revisions during this process.  ...  We would like to express our gratitude to all the anonymous reviewers for timely and thorough reviewing of the articles.  ...  The paper Approximate search of short patterns with high error rates using the 01*0 lossless seeds by C. Vroland, M. Salson, S. Bini, and H.  ... 
doi:10.1016/j.jda.2016.06.002 fatcat:ro63rejdf5cw5kzl4rtixn7bry

2D-pattern matching image and video compression: theory, algorithms, and experiments

M. Alzina, W. Szpankowski, A. Grama
2002 IEEE Transactions on Image Processing  
We demonstrate bit rates in the range of 0.25-0.5 bpp for high quality images and data rates in the range of 0.15-0.5 Mbps for a baseline video compression scheme that does not use any prediction or interpolation  ...  In this paper, we propose a lossy data compression framework based on an approximate two dimensional pattern matching (2D-PMC) extension of the Lempel-Ziv lossless scheme.  ...  ACKNOWLEDGEMENT It is our privilege to acknowledge valuable discussions with Y. Reznik (RealNetwork Inc.) and I. Kontoyiannis (Purdue University).  ... 
doi:10.1109/83.988964 pmid:18244634 fatcat:tyrqieamfvcdngmju4gt2aqihi

Pattern mining of cloned codes in software systems

Wei Qu, Yuanyuan Jia, Michael Jiang
2014 Information Sciences  
Their computational complexity is very high and dramatically increases with the software size, thus limiting their applications in practice.  ...  In this paper, we propose a novel pattern mining framework for cloned codes in software systems.  ...  Acknowledgment This work is supported in part by the Open Projects Program of National Laboratory of Pattern Recognition and the President Fund of GUCAS.  ... 
doi:10.1016/j.ins.2010.04.022 fatcat:vvlqh62v6jhvxb3fx572nge2ra

Novel Computational Techniques For Mapping And Classifying Next-Generation Sequencing Data

Karel Břinda, Gregory Kucherov, Valentina Boeva
2016 Zenodo  
In this thesis, we present novel computational techniques for read mapping and taxonomic classification. With more than a hundred of published mappers, read mapping might be considered fully solved.  ...  deciphering cancer biology, decoding the evolution of living or extinct species, or understanding human migration patterns and human history in general.  ...  To conclude the indexing strategies, let us remark that the traditional seeding techniques work well with the short-read technologies, but become insufficient for long reads with a high rate of sequencing  ... 
doi:10.5281/zenodo.1045317 fatcat:agbwpocisncwvihn2ij5q3ongq

Short Read Mapping: An Algorithmic Tour

Stefan Canzar, Steven L. Salzberg
2017 Proceedings of the IEEE  
To solve this complex, high-dimensional puzzle, reads must be mapped back to a reference genome to determine their origin. Due to sequencing errors and to genuine differences between the reference genome  ...  The advent of ultra-high-throughput next-generation sequencing (NGS) technology in 2007 presented major new challenges for alignment.  ...  a high error rate.  ... 
doi:10.1109/jproc.2015.2455551 pmid:28502990 pmcid:PMC5425171 fatcat:ic6d6z5ggbazdkaotcotdtbo3u

Improved hit criteria for DNA local alignment

Laurent Noé, Gregory Kucherov
2004 BMC Bioinformatics  
It specifies a class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method.  ...  We provide analytical data as well as experimental results, obtained with the YASS software, supporting both improvements.  ...  Acknowledgments We are grateful to Mikhail Roytberg for enlightening discussions, and to Marie-Pierre Etienne, Roman Kolpakov, Gilles Schaeffer and Pierre Valois for their helpful comments at early stages  ... 
doi:10.1186/1471-2105-5-149 pmid:15485572 pmcid:PMC526756 fatcat:vkj773vw4jcmdixihpzwwjezxq

A Review on Sequence Alignment Algorithms for Short Reads Based on Next-Generation Sequencing

Jeongkyu Kim, Mingeun Ji, Gangman Yi
2020 IEEE Access  
With recent advances in next-generation sequencing (NGS) technology, large volumes of data have been produced in the form of short reads.  ...  This study is a review of the different short read alignment algorithms and NGS platforms that have been developed to date, in order to aid efficient selection of algorithms for reference sequences and  ...  simple task of searching for a single character to database searches involving complex patterns [36] .  ... 
doi:10.1109/access.2020.3031159 fatcat:6so6h6f7qbbfjle7zp6wb6hi2i

Improved long read correction for de novo assembly using an FM-index [article]

James M Holt, Jeremy R Wang, Corbin D Jones, Leonard McMillan
2016 bioRxiv   pre-print
Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies.  ...  To this end, we describe a novel application of a multi-string Burrows-Wheeler transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads.  ...  BWT, then searches for a high-weight path between seeds that most closely matches the intervening long read sequence.  ... 
doi:10.1101/067272 fatcat:n45m76rqbjdt3j57otwqe3mbxu

Lossless filter for multiple repetitions with Hamming distance

Pierre Peterlongo, Nadia Pisanti, Frédéric Boyer, Alair Pereira do Lago, Marie-France Sagot
2008 Journal of Discrete Algorithms  
However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string.  ...  Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper.  ...  Acknowledgement The authors would like to thank the anonymous referee for his careful reading and the very helpful comments that sensibly improved the paper.  ... 
doi:10.1016/j.jda.2007.03.003 fatcat:r5jfpzdkhndkpiuu2zwbluigbq

Evolution of biosequence search algorithms: a brief survey [article]

Gregory Kucherov
2018 arXiv   pre-print
The paper surveys the evolution of main algorithmic techniques to compare and search biological sequences.  ...  Acknowledgements Many thanks go to Karel Břinda for his helpful comments and suggestions. Bibliography  ...  This idea is particularly useful for mapping long reads produced by "third-generation sequencing technologies", such as Pacific Biosciences(TM) or Oxford Nanopore(TM), presenting high error rates dominated  ... 
arXiv:1808.01038v4 fatcat:uiyjrwvgprgu3nfcu6i47o4wpe

Dynamic Partitioning of Search Patterns for Approximate Pattern Matching using Search Schemes

Luca Renders, Kathleen Marchal, Jan Fostier
2021 iScience  
Dynamic partitioning of search patterns reduces search space and runtime. Memory interleaving of bit vector representations of the BWT reduces runtime. Avoiding redundancy inherent to the edit distance metric  ...  reduces the search space. Our software tool Columba outperforms the state of the art by a factor of 3.5  ...  Additionally, they show that related work on lossless approximate pattern matching by Vroland et al. on 01*0 seeds (Vroland et al., 2016) can also be expressed as search schemes.  ... 
doi:10.1016/j.isci.2021.102687 fatcat:pgq5qdc75bgkljdj6iikapakfq