2,865 Hits in 4.7 sec

Compact Universal k-mer Hitting Sets [chapter]

Yaron Orenstein, David Pellow, Guillaume Marçais, Ron Shamir, Carl Kingsford
2016 Lecture Notes in Computer Science  
We prove that the problem of hitting a given set of L-long sequences is NP-hard and give a heuristic solution that finds a compact universal k-mer set that hits any set of L-long sequences.  ...  The algorithm, called DOCKS (design of compact k-mer sets), works in two phases: (i) finding a minimum-size k-mer set that hits every infinite sequence; (ii) greedily adding k-mers such that together they  ...  Safra Center for Bioinformatics at Tel-Aviv University.  ... 
doi:10.1007/978-3-319-43681-4_21 fatcat:h5tkzy7wtbakxohn3ndxgw5ugq

Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing

Yaron Orenstein, David Pellow, Guillaume Marçais, Ron Shamir, Carl Kingsford, Benjamin J. Raphael
2017 PLoS Computational Biology  
In this work, we create universal k-mer hitting sets that hit any L-long sequence.  ...  Fig 2 compares the sizes of the sets generated by DOCKS, DOCKSany, and DOCKSanyX (for X = 625).  ... 
doi:10.1371/journal.pcbi.1005777 pmid:28968408 pmcid:PMC5645146 fatcat:6ocuwfntarhc7fhnv7bjkx657e

Improving the performance of minimizers and winnowing schemes [article]

Guillaume Marçais, David Pellow, Daniel Bork, Yaron Orenstein, Ron Shamir, Carl Kingsford
2017 bioRxiv   pre-print
By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors.  ...  Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order  ...  Safra Center for Bioinformatics at Tel-Aviv University. R.S. was supported in part by the Israel Science Foundation as part of the ISF-NSFC joint program 2015-2018.  ... 
doi:10.1101/104075 fatcat:tptsyrhl5zdtfeqly7ixx6o6na

Compact and evenly distributed k-mer binning for genomic sequences

Johan Nyström-Persson, Gabriel Keeble-Gagnère, Niamat Zawad, Peter Robinson
2021 Bioinformatics  
Using this tool, we then introduce the universal frequency ordering, a new combination of frequency-sampled minimizers and universal k-mer hitting sets, which yields both evenly distributed binning and  ...  Minimizers - ordered m-mers where m < k - are often used to group k-mers into bins as a first step in such processing.  ...  Acknowledgements J.N.P. was grateful to Shandar Ahmad for discussions and for the invitation to visit Jawaharlal Nehru University, New Delhi, in 2018 through the GIAN programme.  ... 
doi:10.1093/bioinformatics/btab156 pmid:33693556 pmcid:PMC8428581 fatcat:ikvzyktq5zdmxgjtfosvn577je

Improving the performance of minimizers and winnowing schemes

Guillaume Marçais, David Pellow, Daniel Bork, Yaron Orenstein, Ron Shamir, Carl Kingsford
2017 Bioinformatics  
By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors.  ...  Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order  ...  Safra Center for Bioinformatics at Tel-Aviv University. R.S. was supported in part by the Israel Science Foundation as part of the ISF-NSFC joint program 2015-2018.  ... 
doi:10.1093/bioinformatics/btx235 pmid:28881970 pmcid:PMC5870760 fatcat:wgniixvld5ce5czkagr3ohzqgm

Compact and evenly distributed k-mer binning for genomic sequences [article]

Johan T Nyström-Persson, Gabriel Keeble-Gagnère, Niamat Zawad
2020 bioRxiv   pre-print
Using this tool, we then introduce a new combination of frequency counted minimizers and universal k-mer hitting sets, the universal frequency ordering, which yields both evenly distributed binning and  ...  Minimizers - ordered m-mers where m < k - are often used to group k-mers into bins as a first step in such processing.  ...  Acknowledgments J.N.P. is grateful to Shandar Ahmad for discussions and for the invitation to visit Jawaharlal Nehru University, New Delhi, in 2018 through the GIAN programme.  ... 
doi:10.1101/2020.10.12.335364 fatcat:7sldddlwgbhlbfkig467nf7hpu

Sequence-specific minimizers via polar sets [article]

Hongyu Zheng, Carl Kingsford, Guillaume Marçais
2021 bioRxiv   pre-print
Well-established methods to construct efficient minimizers focus on sampling fewer k-mers on a random sequence and use universal hitting sets (sets of k-mers that appear frequently enough) to upper bound  ...  We propose the concept of polar sets, complementary to the existing idea of universal hitting sets.  ...  We do not include existing algorithms for constructing compact universal hitting sets because these methods do not scale to values of k > 14.  ... 
doi:10.1101/2021.02.01.429246 fatcat:ynvgorrkwvc3npetovrcv65b34

Improving the efficiency of de Bruijn graph construction using compact universal hitting sets [article]

Yael Ben-Ari, Ron Ben Shamir, Yaron Orenstein, Lianrong Pu
2020 bioRxiv   pre-print
Recently, a new k-mer order based on a universal hitting set (UHS) was suggested.  ...  The prevailing method to index DNA sequences in those data structures and algorithms is by k-mers (k-long substrings) known as minimizers.  ...  Universal hitting sets A set of k-mers M hits sequence s if there exists a k-mer in M that is a substring in s.  ... 
doi:10.1101/2020.11.08.373050 fatcat:4jcc55ea3fclxmyflhz62bwdeu
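
The "hits" definition quoted in the entry above can be illustrated with a minimal Python sketch. The function names `hits` and `is_universal` are hypothetical, and the brute-force universality check (enumerating every L-long sequence) is only an illustration feasible for tiny k and L; it is not the method used by any of the listed papers.

```python
from itertools import product


def hits(M, s, k):
    """Return True if some k-mer in the set M occurs as a substring of s."""
    return any(s[i:i + k] in M for i in range(len(s) - k + 1))


def is_universal(M, k, L, alphabet="ACGT"):
    """Brute-force check that M hits every L-long sequence over the alphabet.

    Enumerates all len(alphabet)**L sequences, so it is only usable for
    very small L (an assumption made for illustration).
    """
    return all(hits(M, "".join(t), k) for t in product(alphabet, repeat=L))


if __name__ == "__main__":
    print(hits({"CG"}, "ACGT", 2))   # "CG" is a substring of "ACGT"
    print(hits({"GG"}, "ACGT", 2))   # "GG" does not occur in "ACGT"
```

A set passes `is_universal` only when no L-long sequence avoids all of its k-mers, which is the property the universal hitting set entries in this listing refer to.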

Sparse and Skew Hashing of K-Mers [article]

Giulio Ermanno Pibiri
2022 bioRxiv   pre-print
Motivation: A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and supports membership queries.  ...  Results: To tackle this problem, we describe a compressed and associative dictionary for k-mers, that is: a data structure where strings are represented in compact form and each of them is associated to  ...  This holds true for both high-hit workloads (> 70% hits, i.e., k-mers present in the dictionary) and low-hit workloads (< 1% hits).  ... 
doi:10.1101/2022.01.15.476199 fatcat:izagu2egq5bhvbm4r6l4unqt7y

Detecting High Scoring Local Alignments in Pangenome Graphs [article]

Tizian Schulz, Roland Wittler, Sven Rahmann, Faraz Hach, Jens Stoye
2020 bioRxiv   pre-print
Pangenomes may be represented as graphical structures, e.g. compacted colored de Bruijn graphs, which offer a low memory usage and facilitate reference-free sequence comparisons.  ...  Results: We present a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph.  ...  Compacted colored de Bruijn graphs Let k ≥ 2.  ... 
doi:10.1101/2020.09.03.280958 fatcat:daepe6b4dzeovcrqgfnn3yfjie

Simrank: Rapid and sensitive general-purpose k-mer search tool

Todd Z DeSantis, Keith Keller, Ulas Karaoz, Alexander V Alekseyenko, Navjeet NS Singh, Eoin L Brodie, Zhiheng Pei, Gary L Andersen, Niels Larsen
2011 BMC Ecology  
Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies.  ...  However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available.  ...  NS and UK conducted testing with DNA data sets. UK and TD performed the ROC analysis. AA packaged the CPAN distribution. All authors reviewed and approved the manuscript.  ... 
doi:10.1186/1472-6785-11-11 pmid:21524302 pmcid:PMC3097142 fatcat:otgn3jecn5d7pkovse7xttnpma

An Information Theoretic Approach to Macromolecular Modeling: I. Sequence Alignments

Tiba Aynechi, Irwin D. Kuntz
2005 Biophysical Journal  
Average hits for the 15-mer and 20-mer exhaustive sets are shown on the axis ends.  ...  The number of hits to match a given template of length T, where $K = N - T$, against an exhaustive set of sequences becomes $W = \sum_{1}^{K} [C(N, K) \times (A - 1)^{K}]$ (8). This equation (discovered empirically from  ... 
doi:10.1529/biophysj.104.054072 pmid:16254389 pmcid:PMC1366797 fatcat:oioiyxtcafd2lnohnzklmgkv5u

Improving the sensitivity of long read overlap detection using grouped short k-mer matches

Nan Du, Jiao Chen, Yanni Sun
2019 BMC Genomics  
While using k-mer hits for detecting reads' overlaps has been adopted by several existing programs, our method uses a group of short k-mer hits satisfying statistically derived distance constraints to  ...  Grouped k-mer hit was originally designed for homology search. We are the first to apply group hit for long read overlap detection.  ...  Acknowledgements Not applicable Funding Publication of this article was sponsored by NSF Grant IOS-1740874 and City University of Hong Kong.  ... 
doi:10.1186/s12864-019-5475-x fatcat:gkzhwkkqhzgodch6cdtundmt2e

Improved design and analysis of practical minimizers

Hongyu Zheng, Carl Kingsford, Guillaume Marçais
2020 biorxiv/medrxiv  
Minimizers are methods to sample k-mers from a sequence, with the guarantee that similar set of k-mers will be chosen on similar sequences.  ...  It is parameterized by the k-mer length k, a window length w and an order on the k-mers.  ...  Approximately Optimal Universal Hitting Sets One interesting implication of Theorem 3 is on construction and approximation of compact universal hitting sets.  ... 
doi:10.1101/2020.02.07.939025 fatcat:rmutn5rjpncwzpw2dh3efil2sq
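
The entry above describes a minimizer scheme as parameterized by the k-mer length k, a window length w, and an order on the k-mers. The sketch below shows one common reading of that description, assuming that each window covers w consecutive k-mers and that ties are broken by the leftmost position; the function name `minimizers` and the `order` callback are hypothetical and not taken from the paper.

```python
def minimizers(seq, k, w, order=None):
    """Return sorted start positions of the k-mers selected as minimizers.

    seq   -- input sequence (string)
    k     -- k-mer length
    w     -- number of consecutive k-mers per window (assumed convention)
    order -- function mapping a k-mer to a sortable key; defaults to
             plain lexicographic order
    """
    key = order if order is not None else (lambda kmer: kmer)
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        # Pick the smallest k-mer in the window; ties go to the leftmost.
        best = min(range(start, start + w),
                   key=lambda i: (key(seq[i:i + k]), i))
        selected.add(best)
    return sorted(selected)


if __name__ == "__main__":
    # Lexicographic order on the 2-mers of "ACGTACGT", windows of 3 k-mers.
    print(minimizers("ACGTACGT", 2, 3))  # [0, 1, 4]
```

Passing a different `order` function changes which positions are sampled, which is the knob the minimizer-ordering entries in this listing discuss (for example, lexicographic versus randomized or hitting-set-based orders).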

nsearch: An open source C++ library for processing and similarity searching of next-generation sequencing data [article]

Steven Schmid, Timothy Julian, Manu Tamminen
2018 bioRxiv   pre-print
Within nsearch, each k-mer is transformed into a compact binary format (32-bit unsigned integer).  ...  K-mers which contain at least one ambiguous residue are mapped into a special k-mer called AmbiguousKmer.  ... 
doi:10.1101/399782 fatcat:ah6camc2h5bojlajrlzos5wr5y
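
The entry above says that each k-mer is packed into a 32-bit unsigned integer and that k-mers with an ambiguous residue map to a special value. A minimal Python sketch of one way to do this follows, assuming a 2-bits-per-base encoding (so k is at most 16) and a `None` sentinel standing in for the ambiguous k-mer. The base-to-code table, the sentinel, and the name `encode_kmer` are assumptions for illustration, not nsearch's actual API or bit layout.

```python
# Hypothetical 2-bit code per nucleotide (assumed, not nsearch's layout).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}

# Hypothetical sentinel for k-mers containing an ambiguous residue.
AMBIGUOUS = None


def encode_kmer(kmer):
    """Pack a DNA k-mer (k <= 16) into an integer that fits in 32 bits."""
    if len(kmer) > 16:
        raise ValueError("a 32-bit word holds at most 16 bases at 2 bits each")
    value = 0
    for base in kmer:
        code = ENCODE.get(base)
        if code is None:
            # Any residue outside A/C/G/T makes the whole k-mer ambiguous.
            return AMBIGUOUS
        value = (value << 2) | code
    return value


if __name__ == "__main__":
    print(encode_kmer("ACGT"))  # 27, i.e. 0b00011011
    print(encode_kmer("ACNT"))  # None (ambiguous residue N)
```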