Estimating evolutionary distances between genomic sequences from spaced-word matches

Burkhard Morgenstern, Bingyao Zhu, Sebastian Horwege, Chris André Leimeister
2015 Algorithms for Molecular Biology  
Estimating evolutionary distances between genomic sequences from spaced-word matches.  ...  Herein, we propose a simple estimator d N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them.  ...  We thank Matteo Comin, Ruth Kantorovitz, Saurabh Sinha and an unknown WABI reviewer for pointing out an error regarding the covariance of spaced-word matches in the previous version of this manuscript  ... 
doi:10.1186/s13015-015-0032-x pmid:25685176 pmcid:PMC4327811 fatcat:4yadekvijrhk3fwfdiskkwgeky

Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage [article]

Anna Katharina Lau, Chris-Andre Leimeister, Burkhard Morgenstern
2019 bioRxiv   pre-print
Test runs on simulated reads from bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and for very low sequencing coverage  ...  In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads.  ...  In both cases, all spaced-word matches between the reads and the genomes or between the reads from the first taxon and the reads from the second taxon are identified and used to estimate the Jukes-Cantor  ... 
doi:10.1101/550632 fatcat:zjiaeur3uzchbdvvmiv7d5u5by

Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage

Anna-Katharina Lau, Svenja Dörrer, Chris-André Leimeister, Christoph Bleidorn, Burkhard Morgenstern
2019 BMC Bioinformatics  
Test runs on simulated reads from semi-artificial and real-world bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and  ...  In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads.  ...  If the sequencing coverage is too low and/or the evolutionary distance between the compared sequences is too large, it happens that no spaced-word or k-mer matches are found, and the distance between the  ... 
doi:10.1186/s12859-019-3205-7 pmid:31842735 pmcid:PMC6916211 fatcat:s2ea23nqgbhbdazsdwb7666xli

Fast and accurate phylogeny reconstruction using filtered spaced-word matches

Chris-André Leimeister, Salma Sohrabi-Jahromi, Burkhard Morgenstern
2017 Bioinformatics  
Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences.  ...  To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold  ...  After a set of spaced-word matches has been selected for a pair of genomic sequences as described, we estimate the evolutionary distance between the sequences by considering all don't-care positions of  ... 
doi:10.1093/bioinformatics/btw776 pmid:28073754 pmcid:PMC5409309 fatcat:zyf4hvk43jeirf2a2bv5uxzj74

Sequence Comparison without Alignment: The SpaM approaches [article]

Burkhard Morgenstern
2019 bioRxiv   pre-print
Unlike most previous alignment-free approaches, our approaches are able to accurately estimate phylogenetic distances between DNA or protein sequences based on stochastic models of molecular evolution.  ...  Our approaches are based on spaced word matches ('SpaM'), i.e. on inexact word matches, that are allowed to contain mismatches at certain pre-defined positions.  ...  To estimate the evolutionary distance between two protein sequences, we consider the pairs of amino acids aligned to each other at the don't-care positions of the selected spaced-word matches, and we are  ... 
doi:10.1101/2019.12.16.878314 fatcat:5fufoujm7nejjnlylukc3gcyxe

Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences [article]

Chris-Andre Leimeister, Jendrik Schellhorn, Svenja Schöbel, Michael Gerth, Christoph Bleidorn, Burkhard Morgenstern
2018 bioRxiv   pre-print
One of these approaches is Filtered Spaced Word Matches.  ...  Herein, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM.  ...  is a spaced-word match involving one word from each of the two sequences.  ... 
doi:10.1101/306142 fatcat:5jpp7ae53zhd3dzc3ojzge6eeq

Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences

Chris-Andre Leimeister, Jendrik Schellhorn, Svenja Dörrer, Michael Gerth, Christoph Bleidorn, Burkhard Morgenstern
2018 GigaScience  
One of these approaches is Filtered Spaced Word Matches.  ...  Here, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM.  ...  Such spaced-word matches can be rapidly identified and, after discarding random background matches, the remaining "homologous" spaced-word matches can be used to estimate the phylogenetic distance between  ... 
doi:10.1093/gigascience/giy148 pmid:30535314 pmcid:PMC6436989 fatcat:4t2w5qwat5cjrbrbc7ph5eota4

The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances

Sophie Röhling, Alexander Linne, Jendrik Schellhorn, Morteza Hosseini, Thomas Dencker, Burkhard Morgenstern, Aaron E. Darling
2020 PLoS ONE  
We show that the Jukes-Cantor distance between two genome sequences (i.e. the number of substitutions per site that occurred since they evolved from their last common ancestor) can be estimated from the  ...  We study the number Nk of length-k word matches between pairs of evolutionarily related DNA sequences, as a function of k.  ...  on an earlier version of this manuscript, Andrzej Zielezinski and Wojchiech Karlowski for making the AFproject server available, Christoph Bleidorn and Micha Gerth for discussions about the Wolbachia genomes  ... 
doi:10.1371/journal.pone.0228070 pmid:32040534 pmcid:PMC7010260 fatcat:kbdrrho6cnfrvbbnv4rpyulaii

Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches

Sebastian Horwege, Sebastian Lindner, Marcus Boden, Klas Hatje, Martin Kollmar, Chris-André Leimeister, Burkhard Morgenstern
2014 Nucleic Acids Research  
Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches.  ...  Various distance measures can then be defined on sequences based on their different spaced-word composition.  ...  The user can choose between the Euclidean distance and the Jensen-Shannon distance to estimate distances between the input sequences based on their spaced-word frequencies. Figure 2.  ... 
doi:10.1093/nar/gku398 pmid:24829447 pmcid:PMC4086093 fatcat:p56sepl3tfghfjod7nri227bqy

Multi-SpaM: a Maximum-Likelihood approach to Phylogeny reconstruction based on Multiple Spaced-Word Matches [article]

Thomas Dencker, Chris-Andre Leimeister, Michael Gerth, Christoph Bleidorn, Sagi Snir, Burkhard Morgenstern
2018 arXiv   pre-print
Most of these methods calculate pairwise distances for a set of input sequences, for example from word frequencies, from so-called spaced-word matches or from the average length of common substrings.  ...  Results: In this paper, we propose the first word-based approach to tree reconstruction that is based on multiple sequence comparison and Maximum Likelihood.  ...  Distance methods, by contrast, infer phylogenies by estimating evolutionary distances for all pairs of input taxa [16] .  ... 
arXiv:1803.09222v2 fatcat:ucsh6iizjndojewwnuvkapzj5e

Multi-SpaM: A Maximum-Likelihood Approach to Phylogeny Reconstruction Using Multiple Spaced-Word Matches and Quartet Trees [chapter]

Thomas Dencker, Chris-André Leimeister, Michael Gerth, Christoph Bleidorn, Sagi Snir, Burkhard Morgenstern
2018 Lecture Notes in Computer Science  
Most alignment-free methods calculate 'pairwise' distances between nucleic-acid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining  ...  In this paper, we propose the first word-based phylogeny approach that is based on 'multiple' sequence comparison and 'maximum likelihood'.  ...  between those matches to estimate the number of substitutions per position between two input sequences.  ... 
doi:10.1007/978-3-030-00834-5_13 fatcat:eog5wn6e55etjfyc36bi4c344m

'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees

Thomas Dencker, Chris-André Leimeister, Michael Gerth, Christoph Bleidorn, Sagi Snir, Burkhard Morgenstern
2019 NAR Genomics and Bioinformatics  
Most alignment-free methods calculate 'pairwise' distances between nucleic-acid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining  ...  In this paper, we propose the first word-based phylogeny approach that is based on 'multiple' sequence comparison and 'maximum likelihood'.  ...  between those matches to estimate the number of substitutions per position between two input sequences.  ... 
doi:10.1093/nargab/lqz013 pmid:33575565 pmcid:PMC7671388 fatcat:eu2nciwdsfcwjm55ktan7htgdq

Alignment-free sequence comparison with spaced k-mers

Marcus Boden, Martin Schöneich, Sebastian Horwege, Sebastian Lindner, Chris Leimeister, Burkhard Morgenstern, Marc Herbstritt
2013 German Conference on Bioinformatics  
In addition, distances calculated with spaced k-mers appear to be statistically more stable than distances based on contiguous k-mers. ACM Subject Classification J.3  ...  Alignment-free methods are increasingly used for genome analysis and phylogeny reconstruction since they circumvent various difficulties of traditional approaches that rely on multiple sequence alignments  ...  to estimate the global degree of similarity between sequences by comparing their spaced k-mer composition.  ... 
doi:10.4230/oasics.gcb.2013.24 dblp:conf/gcb/BodenSHLLM13 fatcat:6zaftkj34fdttizimipr45srlm

Benchmarking of alignment-free sequence comparison methods

Andrzej Zielezinski, Hani Z. Girgis, Guillaume Bernard, Chris-Andre Leimeister, Kujin Tang, Thomas Dencker, Anna Katharina Lau, Sophie Röhling, Jae Jin Choi, Michael S. Waterman, Matteo Comin, Sung-Hou Kim (+7 others)
2019 Genome Biology  
Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications.  ...  Acknowledgements We thank Svenja Dörrer for providing benchmarking data sets of unassembled sequencing reads. We also thank the tools' developers B. Haubold, F. Klötzl, P. Kolekar, S. Mirarab, S.  ...  Availability of data and materials All data sets and results discussed in the paper are freely available from our website (http://afproject.org) through the download page (http://afproject.org/download  ... 
doi:10.1186/s13059-019-1755-7 pmid:31345254 pmcid:PMC6659240 fatcat:gvkqadjhfngihklqouyqhxc65y

Anchor points for genome alignment based on Filtered Spaced Word Matches [article]

Chris-Andre Leimeister, Thomas Dencker, Burkhard Morgenstern
2017 arXiv   pre-print
Herein, we propose to use Filtered Spaced Word Matches to calculate anchor points for genome alignment.  ...  Alignment of large genomic sequences is a fundamental task in computational genome analysis.  ...  In a previous paper, we used spaced-word matches to estimate phylogenetic distances between genomic sequences [34] .  ... 
arXiv:1703.08792v1 fatcat:dzmqjby2bva2lkuzqfkqilo4ja