Estimating evolutionary distances between genomic sequences from spaced-word matches
2015
Algorithms for Molecular Biology
Herein, we propose a simple estimator d_N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. ...
We thank Matteo Comin, Ruth Kantorovitz, Saurabh Sinha and an unknown WABI reviewer for pointing out an error regarding the covariance of spaced-word matches in the previous version of this manuscript ...
doi:10.1186/s13015-015-0032-x
pmid:25685176
pmcid:PMC4327811
fatcat:4yadekvijrhk3fwfdiskkwgeky
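The idea of turning a match count N into a distance can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the estimator d_N from the paper: it assumes two equal-length sequences of length L without insertions or deletions, a Jukes-Cantor substitution model, a pattern with k match positions, a per-position background match probability q = 0.25, roughly L * p^k homologous matches (p = per-site identity) and roughly L^2 * q^k random background matches. The function names are hypothetical.

```python
import math


def jukes_cantor(mismatch_fraction: float) -> float:
    """Jukes-Cantor correction: substitutions per site from the observed mismatch fraction."""
    arg = 1.0 - (4.0 / 3.0) * mismatch_fraction
    if arg <= 0.0:
        return math.inf  # saturated: the distance cannot be estimated
    return -0.75 * math.log(arg)


def estimate_distance_from_matches(n_matches: float, seq_len: int, k: int, q: float = 0.25) -> float:
    """Hypothetical simplified estimator of the distance from a (spaced-)word match count.

    Assumes E[N] ~ L * p**k (homologous matches) + L**2 * q**k (random background matches).
    """
    background = seq_len * seq_len * q ** k
    homologous = max(n_matches - background, 0.0)
    # Solve L * p**k = homologous for the per-site match probability p.
    p = min((homologous / seq_len) ** (1.0 / k), 1.0)
    return jukes_cantor(1.0 - p)


# Usage: build a synthetic match count for a known identity p = 0.9, then recover the distance.
L, k, p_true = 1_000_000, 12, 0.9
n = L * p_true ** k + L * L * 0.25 ** k
print(round(estimate_distance_from_matches(n, L, k), 4))
```

Subtracting the expected background before solving for p matters because, for short patterns or long sequences, random matches can outnumber homologous ones.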
Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage
[article]
2019
bioRxiv
pre-print
Test runs on simulated reads from bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and for very low sequencing coverage ...
In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads. ...
In both cases, all spaced-word matches between the reads and the genomes or between the reads from the first taxon and the reads from the second taxon are identified and used to estimate the Jukes-Cantor ...
doi:10.1101/550632
fatcat:zjiaeur3uzchbdvvmiv7d5u5by
Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage
2019
BMC Bioinformatics
Test runs on simulated reads from semi-artificial and real-world bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and ...
In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads. ...
If the sequencing coverage is too low and/or the evolutionary distance between the compared sequences is too large, it happens that no spaced-word or k-mer matches are found, and the distance between the ...
doi:10.1186/s12859-019-3205-7
pmid:31842735
pmcid:PMC6916211
fatcat:s2ea23nqgbhbdazsdwb7666xli
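Collecting spaced-word matches between the reads of two taxa can be sketched with a hash index. This is a minimal illustration under assumptions not taken from Read-SpaM: a binary pattern string ('1' = match position, '0' = don't-care position), an in-memory index, and no filtering of random background matches; all function names are hypothetical. It also reproduces the low-coverage case mentioned in the snippet: when no spaced-word matches are found, the mismatch fraction (and hence any Jukes-Cantor distance derived from it) is undefined.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple


def spaced_words(read: str, pattern: str) -> Iterator[Tuple[str, str]]:
    """Yield (characters at match positions, characters at don't-care positions) per window."""
    match_idx = [i for i, c in enumerate(pattern) if c == "1"]
    dc_idx = [i for i, c in enumerate(pattern) if c == "0"]
    for start in range(len(read) - len(pattern) + 1):
        window = read[start:start + len(pattern)]
        yield "".join(window[i] for i in match_idx), "".join(window[i] for i in dc_idx)


def dont_care_pairs(reads_a: List[str], reads_b: List[str], pattern: str) -> List[Tuple[str, str]]:
    """Pair the don't-care segments of all spaced-word matches between two read sets."""
    index: Dict[str, List[str]] = defaultdict(list)
    for read in reads_a:
        for key, dc in spaced_words(read, pattern):
            index[key].append(dc)
    pairs = []
    for read in reads_b:
        for key, dc in spaced_words(read, pattern):
            for other in index.get(key, []):  # .get does not insert new keys
                pairs.append((other, dc))
    return pairs


def mismatch_fraction(pairs: List[Tuple[str, str]]) -> Optional[float]:
    """Fraction of mismatches at don't-care positions; None if there are no matches."""
    total = sum(len(a) for a, _ in pairs)
    if total == 0:
        return None  # no spaced-word matches: the distance cannot be estimated
    mismatches = sum(x != y for a, b in pairs for x, y in zip(a, b))
    return mismatches / total
```

With real low-coverage reads, most matching spaced words come from overlapping read regions, so the number of pairs, not just their mismatch rate, limits how reliable the estimate is.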
Fast and accurate phylogeny reconstruction using filtered spaced-word matches
2017
Bioinformatics
Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. ...
To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold ...
After a set of spaced-word matches has been selected for a pair of genomic sequences as described, we estimate the evolutionary distance between the sequences by considering all don't-care positions of ...
doi:10.1093/bioinformatics/btw776
pmid:28073754
pmcid:PMC5409309
fatcat:zyf4hvk43jeirf2a2bv5uxzj74
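The two steps described in the snippets (discard spaced-word matches whose similarity is below a threshold, then estimate the distance from the don't-care positions of the remaining ones) can be sketched as follows. The +1/-1 scores, the threshold of 0 and the function names are placeholder assumptions, not FSWM's actual scoring scheme; the input is a list of aligned don't-care segments, one pair per spaced-word match.

```python
import math
from typing import List, Optional, Tuple

# Placeholder scores (assumption): +1 for identical nucleotides, -1 otherwise.
MATCH_SCORE, MISMATCH_SCORE = 1, -1


def score(seg_a: str, seg_b: str) -> int:
    """Sum of pairwise scores over the don't-care positions of one spaced-word match."""
    return sum(MATCH_SCORE if x == y else MISMATCH_SCORE for x, y in zip(seg_a, seg_b))


def fswm_like_distance(pairs: List[Tuple[str, str]], threshold: int = 0) -> Optional[float]:
    """Keep matches scoring at least `threshold`, then apply Jukes-Cantor to the survivors."""
    kept = [(a, b) for a, b in pairs if score(a, b) >= threshold]
    sites = sum(len(a) for a, _ in kept)
    if sites == 0:
        return None  # nothing survived the filter
    mismatches = sum(x != y for a, b in kept for x, y in zip(a, b))
    p = mismatches / sites
    arg = 1.0 - 4.0 * p / 3.0
    return -0.75 * math.log(arg) if arg > 0 else math.inf
```

Filtering before counting mismatches is what keeps random background matches, whose don't-care positions agree only by chance, from inflating the estimated distance.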
Sequence Comparison without Alignment: The SpaM approaches
[article]
2019
bioRxiv
pre-print
Unlike most previous alignment-free approaches, our approaches are able to accurately estimate phylogenetic distances between DNA or protein sequences based on stochastic models of molecular evolution. ...
Our approaches are based on spaced word matches ('SpaM'), i.e. on inexact word matches that are allowed to contain mismatches at certain pre-defined positions. ...
To estimate the evolutionary distance between two protein sequences, we consider the pairs of amino acids aligned to each other at the don't-care positions of the selected spaced-word matches, and we are ...
doi:10.1101/2019.12.16.878314
fatcat:5fufoujm7nejjnlylukc3gcyxe
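The snippet on protein sequences is truncated before it names a substitution model. As one standard option (an assumption here, not necessarily what the SpaM tools implement), Kimura's (1983) empirical approximation converts the fraction D of differing amino-acid pairs into a distance, d = -ln(1 - D - 0.2 D^2). The function name is hypothetical.

```python
import math
from typing import Iterable, Tuple


def kimura_protein_distance(pairs: Iterable[Tuple[str, str]]) -> float:
    """Kimura's approximation d = -ln(1 - D - 0.2 * D**2).

    `pairs` holds amino acids aligned to each other at don't-care positions.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no aligned amino-acid pairs")
    diff = sum(a != b for a, b in pairs) / len(pairs)  # observed fraction of differences
    arg = 1.0 - diff - 0.2 * diff * diff
    return -math.log(arg) if arg > 0 else math.inf  # inf once the formula saturates
```

The quadratic term grows with divergence, so the correction matters mainly for distantly related proteomes.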
Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences
[article]
2018
bioRxiv
pre-print
One of these approaches is Filtered Spaced Word Matches. ...
Herein, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM. ...
is a spaced-word match involving one word from each of the two sequences. ...
doi:10.1101/306142
fatcat:5jpp7ae53zhd3dzc3ojzge6eeq
Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences
2018
GigaScience
One of these approaches is Filtered Spaced Word Matches. ...
Here, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM. ...
Such spaced-word matches can be rapidly identified and, after discarding random background matches, the remaining "homologous" spaced-word matches can be used to estimate the phylogenetic distance between ...
doi:10.1093/gigascience/giy148
pmid:30535314
pmcid:PMC6436989
fatcat:4t2w5qwat5cjrbrbc7ph5eota4
The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances
2020
PLoS ONE
We show that the Jukes-Cantor distance between two genome sequences (i.e. the number of substitutions per site that occurred since they evolved from their last common ancestor) can be estimated from the ...
We study the number Nk of length-k word matches between pairs of evolutionarily related DNA sequences, as a function of k. ...
on an earlier version of this manuscript, Andrzej Zielezinski and Wojciech Karlowski for making the AFproject server available, Christoph Bleidorn and Micha Gerth for discussions about the Wolbachia genomes ...
doi:10.1371/journal.pone.0228070
pmid:32040534
pmcid:PMC7010260
fatcat:kbdrrho6cnfrvbbnv4rpyulaii
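N_k itself can be computed by counting k-mers in both sequences. The sketch also fits a straight line to ln N_k against k; this relies on a simplifying assumption not stated in the snippet: once random background matches are negligible, N_k is roughly L * p^k, so the slope approximates ln p for per-site identity p. It illustrates the idea only and is not the paper's exact estimator; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def count_kmer_matches(s1: str, s2: str, k: int) -> int:
    """N_k: number of position pairs (i, j) with s1[i:i+k] == s2[j:j+k]."""
    c1 = Counter(s1[i:i + k] for i in range(len(s1) - k + 1))
    c2 = Counter(s2[j:j + k] for j in range(len(s2) - k + 1))
    return sum(n * c2[w] for w, n in c1.items())  # missing words count as 0


def slope_of_log_counts(counts: Dict[int, float]) -> float:
    """Least-squares slope of ln(N_k) against k (all N_k must be positive)."""
    ks = sorted(counts)
    xs = [float(k) for k in ks]
    ys = [math.log(counts[k]) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Usage with synthetic counts that follow N_k = 1000 * 0.8**k exactly:
synthetic = {k: 1000 * 0.8 ** k for k in range(10, 20)}
p_hat = math.exp(slope_of_log_counts(synthetic))  # close to 0.8 for this synthetic input
```

For real genomes, small k values are dominated by background matches, so a fit of this kind is only meaningful over a range of k where homologous matches dominate.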
Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches
2014
Nucleic Acids Research
Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. ...
Various distance measures can then be defined on sequences based on their different spaced-word composition. ...
The user can choose between the Euclidean distance and the Jensen-Shannon distance to estimate distances between the input sequences based on their spaced-word frequencies.
Figure 2. ...
doi:10.1093/nar/gku398
pmid:24829447
pmcid:PMC4086093
fatcat:p56sepl3tfghfjod7nri227bqy
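The two distances named in the snippet can be sketched on spaced-word frequency profiles. The single pattern, relative frequencies, natural logarithm and function names are assumptions for illustration; the sketch computes the Jensen-Shannon divergence, and whether the tool reports this value or a transformation of it is not stated here.

```python
import math
from collections import Counter
from typing import Dict


def spaced_word_freqs(seq: str, pattern: str) -> Dict[str, float]:
    """Relative frequencies of spaced words: keep only characters at '1' positions of pattern."""
    idx = [i for i, c in enumerate(pattern) if c == "1"]
    words = Counter(
        "".join(seq[s + i] for i in idx) for s in range(len(seq) - len(pattern) + 1)
    )
    total = sum(words.values())
    return {w: n / total for w, n in words.items()} if total else {}


def euclidean(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Euclidean distance between two frequency profiles (absent words count as 0)."""
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in keys))


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence (natural log): mean KL divergence of p and q from their average."""
    js = 0.0
    for w in p.keys() | q.keys():
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            js += 0.5 * pw * math.log(pw / mw)
        if qw > 0:
            js += 0.5 * qw * math.log(qw / mw)
    return js
```

Because a spaced word ignores the characters at '0' positions, two sequences that differ only at those positions produce identical profiles, which is the inexact-matching behaviour the snippet contrasts with exact word matches.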
Multi-SpaM: a Maximum-Likelihood approach to Phylogeny reconstruction based on Multiple Spaced-Word Matches
[article]
2018
arXiv
pre-print
Most of these methods calculate pairwise distances for a set of input sequences, for example from word frequencies, from so-called spaced-word matches or from the average length of common substrings. ...
Results: In this paper, we propose the first word-based approach to tree reconstruction that is based on multiple sequence comparison and Maximum Likelihood. ...
Distance methods, by contrast, infer phylogenies by estimating evolutionary distances for all pairs of input taxa [16]. ...
arXiv:1803.09222v2
fatcat:ucsh6iizjndojewwnuvkapzj5e
Multi-SpaM: A Maximum-Likelihood Approach to Phylogeny Reconstruction Using Multiple Spaced-Word Matches and Quartet Trees
[chapter]
2018
Lecture Notes in Computer Science
Most alignment-free methods calculate 'pairwise' distances between nucleic-acid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining ...
In this paper, we propose the first word-based phylogeny approach that is based on 'multiple' sequence comparison and 'maximum likelihood'. ...
between those matches to estimate the number of substitutions per position between two input sequences. ...
doi:10.1007/978-3-030-00834-5_13
fatcat:eog5wn6e55etjfyc36bi4c344m
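The snippet names neighbor-joining as a typical consumer of pairwise distances. Its central step, choosing which two taxa to join next, can be sketched with the textbook Q-criterion. This only illustrates how a distance matrix feeds tree reconstruction; it is not part of Multi-SpaM, which the snippet describes as a maximum-likelihood approach. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def nj_pick_pair(names: Sequence[str], d: List[List[float]]) -> Tuple[str, str]:
    """Return the pair of taxa that neighbor-joining would join first.

    Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j), with r(i) the sum of row i.
    Ties are broken in favour of the first pair encountered.
    """
    n = len(names)
    r = [sum(row) for row in d]
    best, best_q = (0, 1), float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if q < best_q:
                best, best_q = (i, j), q
    return names[best[0]], names[best[1]]
```

A full implementation would replace the chosen pair by a new node, update the distance matrix and repeat until three nodes remain.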
'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees
2019
NAR Genomics and Bioinformatics
Most alignment-free methods calculate 'pairwise' distances between nucleic-acid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining ...
In this paper, we propose the first word-based phylogeny approach that is based on 'multiple' sequence comparison and 'maximum likelihood'. ...
between those matches to estimate the number of substitutions per position between two input sequences. ...
doi:10.1093/nargab/lqz013
pmid:33575565
pmcid:PMC7671388
fatcat:eu2nciwdsfcwjm55ktan7htgdq
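Multi-SpaM, as its title says, works with quartet trees, i.e. unrooted trees on four taxa with three possible topologies. The snippets describe choosing these by maximum likelihood; the sketch below instead uses the simpler distance-based four-point condition, purely to show what selecting a quartet topology means. It is a stand-in technique, not Multi-SpaM's method, and the dictionary layout and function name are assumptions.

```python
from typing import Dict, FrozenSet, Tuple


def quartet_topology(taxa: Tuple[str, str, str, str], dist: Dict[FrozenSet[str], float]) -> str:
    """Pick the split with the smallest sum of within-pair distances (four-point condition)."""
    w, x, y, z = taxa

    def dist_of(u: str, v: str) -> float:
        return dist[frozenset((u, v))]

    sums = {
        f"{w}{x}|{y}{z}": dist_of(w, x) + dist_of(y, z),
        f"{w}{y}|{x}{z}": dist_of(w, y) + dist_of(x, z),
        f"{w}{z}|{x}{y}": dist_of(w, z) + dist_of(x, y),
    }
    return min(sums, key=sums.get)
```

On additive (tree-like) distances, the true split always has the smallest sum, because only the other two pairings traverse the internal edge twice.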
Alignment-free sequence comparison with spaced k-mers
2013
German Conference on Bioinformatics
In addition, distances calculated with spaced k-mers appear to be statistically more stable than distances based on contiguous k-mers. ACM Subject Classification J.3 ...
Alignment-free methods are increasingly used for genome analysis and phylogeny reconstruction since they circumvent various difficulties of traditional approaches that rely on multiple sequence alignments ...
to estimate the global degree of similarity between sequences by comparing their spaced k-mer composition. ...
doi:10.4230/oasics.gcb.2013.24
dblp:conf/gcb/BodenSHLLM13
fatcat:6zaftkj34fdttizimipr45srlm
Benchmarking of alignment-free sequence comparison methods
2019
Genome Biology
Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. ...
Acknowledgements: We thank Svenja Dörrer for providing benchmarking data sets of unassembled sequencing reads. We also thank the tools' developers B. Haubold, F. Klötzl, P. Kolekar, S. Mirarab, S. ...
Availability of data and materials: All data sets and results discussed in the paper are freely available from our website (http://afproject.org) through the download page (http://afproject.org/download ...
doi:10.1186/s13059-019-1755-7
pmid:31345254
pmcid:PMC6659240
fatcat:gvkqadjhfngihklqouyqhxc65y
Anchor points for genome alignment based on Filtered Spaced Word Matches
[article]
2017
arXiv
pre-print
Herein, we propose to use Filtered Spaced Word Matches to calculate anchor points for genome alignment. ...
Alignment of large genomic sequences is a fundamental task in computational genome analysis. ...
In a previous paper, we used spaced-word matches to estimate phylogenetic distances between genomic sequences [34]. ...
arXiv:1703.08792v1
fatcat:dzmqjby2bva2lkuzqfkqilo4ja
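Anchor points for an alignment must be mutually consistent, that is, collinear in both sequences. A common way to select such a set from candidate matches is chaining; the sketch below is a generic O(n^2) dynamic program over point anchors. The anchor layout, the scores and the function name are assumptions for illustration, not the procedure from the paper, and the sketch ignores anchor lengths and overlaps.

```python
from typing import List, Tuple

Anchor = Tuple[int, int, int]  # (start in seq1, start in seq2, score): hypothetical layout


def best_collinear_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Highest-scoring subset of anchors that increases in both sequences."""
    if not anchors:
        return []
    anchors = sorted(anchors)
    best = [a[2] for a in anchors]  # best chain score ending at anchor i
    prev = [-1] * len(anchors)      # predecessor of anchor i in that chain
    for i, (x_i, y_i, s_i) in enumerate(anchors):
        for j in range(i):
            x_j, y_j, _ = anchors[j]
            if x_j < x_i and y_j < y_i and best[j] + s_i > best[i]:
                best[i] = best[j] + s_i
                prev[i] = j
    i = max(range(len(anchors)), key=best.__getitem__)
    chain = []
    while i != -1:  # walk back through predecessors
        chain.append(anchors[i])
        i = prev[i]
    return chain[::-1]
```

Crossing anchors, such as one that is later in the first sequence but earlier in the second, are excluded because they cannot both be part of a single collinear alignment.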