
MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees

Suzanne J Matthews, Tiffani L Williams
2010 BMC Bioinformatics  
The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees.  ...  MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters.  ...  Acknowledgements We would like to thank Matthew Gitzendanner, Paul Lewis, and David Soltis for providing us with the Bayesian tree collections used in this paper.  ... 
doi:10.1186/1471-2105-11-s1-s15 pmid:20122186 pmcid:PMC3009486 fatcat:azp2oflnezdljldlvjf7473hqe

Nephele: genotyping via complete composition vectors and MapReduce

Marc E Colosimo, Matthew W Peterson, Scott Mardis, Lynette Hirschman
2011 Source Code for Biology and Medicine  
Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes.  ...  Conclusions: We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome scale sequence coverage.  ...  In fact, there has been much recent research into new methods to visualize phylogenetic trees with large numbers of leaves [32, 33].  ... 
doi:10.1186/1751-0473-6-13 pmid:21851626 pmcid:PMC3182884 fatcat:ecl2icluhbhttposjv3c4goot4

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing [article]

Shixiang Wan, Quan Zou
2017 arXiv   pre-print
After comparing with most available state-of-the-art methods, our experimental results indicate the following: 1) HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large  ...  Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic  ...  Acknowledgments The work was supported by the Natural Science Foundation of China (No. 61370010).  ... 
arXiv:1704.00878v1 fatcat:ojszon3mzfetjiauzafvrb52qy

Applications and Algorithms for Inference of Huge Phylogenetic Trees: a Review

Muhammad Sardaraz, Muhammad Tahir, Tahir Aziz Ikram, Hassan Bajwa
2012 American Journal of Bioinformatics Research  
Phylogenetics enables us to use various techniques to extract evolutionary relationships from sequence analysis.  ...  the inference of phylogenetic trees.  ...  The package has a collection of sophisticated models used for sequence evolution.  ... 
doi:10.5923/j.bioinformatics.20120201.04 fatcat:cwskdp7b6ze6xdlq5qekgypjwm

Open Reading Frame Phylogenetic Analysis on the Cloud

Che-Lun Hung, Chun-Yuan Lin
2013 International Journal of Genomics  
The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice.  ...  These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity.  ...  Acknowledgment Part of this work was supported by the National Science Council under Grants NSC-99-2632-E-126-001-MY3 and NSC-100-2221-E-126-007-MY3.  ... 
doi:10.1155/2013/614923 pmid:23671843 pmcid:PMC3647537 fatcat:2jxf7dpnlvdjferxw3rsees7tm

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing

Shixiang Wan, Quan Zou
2017 Algorithms for Molecular Biology  
HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences.  ...  Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic  ...  Comparison with state-of-the-art tools We select a series of state-of-the-art tools to compare with HAlign-II and evaluate its performance on addressing ultra-large datasets.  ... 
doi:10.1186/s13015-017-0116-x pmid:29026435 pmcid:PMC5622559 fatcat:bbmuyxddnfemxjgsilb3uap54u

pmTM-align: scalable pairwise and multiple structure alignment with Apache Spark and OpenMP

Weiya Chen, Chun Yao, Yingzhong Guo, Yan Wang, Zhidong Xue
2020 BMC Bioinformatics  
The Spark-based structure alignments achieved near ideal scalability with large datasets, and the OpenMP-based construction of the phylogenetic tree accelerated the incremental alignment of multiple structures  ...  stages to handle pairwise structure alignments with Spark and the phylogenetic tree-based multiple structure alignment task on a single computer with OpenMP.  ...  So here a binary tree is used instead to store the generated phylogenetic tree, and the index of each node is kept to indicate whether it's a leaf node or not (non-leaf nodes have an index larger than  ... 
doi:10.1186/s12859-020-03757-2 pmid:32993484 pmcid:PMC7526426 fatcat:go2km4ddebblhmkpw5fyq65q6y

Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient

Arlin Stoltzfus, Hilmar Lapp, Naim Matasci, Helena Deus, Brian Sidlauskas, Christian M Zmasek, Gaurav Vaidya, Enrico Pontelli, Karen Cranston, Rutger Vos, Campbell O Webb, Luke J Harmon (+16 others)
2013 BMC Bioinformatics  
A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa.  ...  Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL).  ...  In a small sample of just 40 phylogeny-relevant research articles, the authors found that 6 of the studies re-used large trees, 4 of them using the software called Phylomatic [8] to perform pruning and  ... 
doi:10.1186/1471-2105-14-158 pmid:23668630 pmcid:PMC3669619 fatcat:5gj4e7u4vjbkrcbrappwoqq3aq

Integration of Clustering and Multidimensional Scaling to Determine Phylogenetic Trees as Spherical Phylograms Visualized in 3 Dimensions

Yang Ruan, Geoffrey L. House, Saliya Ekanayake, Ursel Schutte, James D. Bever, Haixu Tang, Geoffrey Fox
2014 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
In the experiments, we used the sum of branch lengths to quantify the general fit between the clustering and the phylogenetic tree in SP and Mantel tests to determine how well the same grouping of sequences  ...  Phylogenetic analysis is commonly used to analyze genetic sequence data from fungal communities, while ordination and clustering techniques commonly are used to analyze sequence data from bacterial communities  ...  Our thanks to Judy Qiu from School of Informatics and Computing for providing Twister, and system administrators from University Information Technology Services for providing the support for BigRed2.  ... 
doi:10.1109/ccgrid.2014.126 dblp:conf/ccgrid/RuanHESBTF14 fatcat:nqf3p6qqlvbwdbpsal6ae3vm5i

Large scale microbiome profiling in the cloud

2019 Bioinformatics  
However, large reference genome collections are capable of providing a more complete and accurate profile of the bacterial population in a metagenomics dataset.  ...  Current profiling tools are designed to work against a small representative collection of genomes, and do not scale very well to larger reference genome collections.  ...  Acknowledgments The authors would like to thank Eric S.  ... 
doi:10.1093/bioinformatics/btz356 pmid:31510682 pmcid:PMC6612844 fatcat:6nrieqwbprfexagctux26h3rau

Parallelizing XML data-streaming workflows via MapReduce

Daniel Zinn, Shawn Bowers, Sven Köhler, Bertram Ludäscher
2010 Journal of Computer and System Sciences  
Our evaluation uses the Hadoop MapReduce system as an implementation platform.  ...  These efficiency gains, together with the benefits of MapReduce (e.g., fault tolerance) make our approach ideal for executing large-scale, compute-intensive XML-based scientific workflows.  ...  Our XML Processing Pipelines are an abstract version of the COMAD idea. They also thank Jianwu Wang and the anonymous reviewers for valuable comments on an earlier draft.  ... 
doi:10.1016/j.jcss.2009.11.006 fatcat:husipfq77ngdvplxht75gdxfpy

Reconstructing evolutionary trees in parallel for massive sequences

Quan Zou, Shixiang Wan, Xiangxiang Zeng, Zhanshan Sam Ma
2017 BMC Systems Biology  
Building evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary trees for ultra-large sequences is hard.  ...  Users could use HPTree for reconstructing evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud).  ...  Funding Publication costs were funded by the Natural Science Foundation of China (No. 61771331).  ... 
doi:10.1186/s12918-017-0476-3 pmid:29297337 pmcid:PMC5751538 fatcat:czh7v2xwdnbsxotug5hgrxgpfq

Scalable Data Analysis of Mitochondrial DNA in the Era of High-Throughput Data Generation

Hansi Weissensteiner
2019 Figshare  
One central aspect of this thesis is the classification of mtDNA data to phylogenetic clusters, used for mtDNA quality control and for detection of contamination patterns.  ...  To process the huge amount of data, all algorithms were parallelized within this work.  ...  Acknowledgements This present work would not have been possible without the support of Dr. Sebastian Schönherr and Lukas Forer PhD, thanks guys!  ... 
doi:10.6084/m9.figshare.8937899 fatcat:7g5oihrq2vekhb2d3apehiv4sm

PPCAS: Implementation of a Probabilistic Pairwise Model for Consistency-Based Multiple Alignment in Apache Spark [chapter]

Jordi Lladós, Fernando Guirado, Fernando Cores
2017 Lecture Notes in Computer Science  
PPCAS is based on the MapReduce processing paradigm in order to enable large datasets to be processed with the aim of improving the performance and scalability of the original algorithm.  ...  Large-scale data processing techniques, currently known as Big-Data, are used to manage the huge amount of data that are generated by sequencers.  ...  In [25], the authors developed a DNA MSA tool based on trie trees to accelerate the centre star MSA strategy. It was implemented using the MapReduce distributed framework.  ... 
doi:10.1007/978-3-319-65482-9_45 fatcat:rl4j6gwtbrhmtoe4jk2b325tzm

BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

Dieter De Witte, Jan Van de Velde, Dries Decap, Michiel Van Bel, Pieter Audenaert, Piet Demeester, Bart Dhoedt, Klaas Vandepoele, Jan Fostier
2015 Bioinformatics  
In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted.  ...  The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery.  ...  Acknowledgements We acknowledge the support of Ghent University (Multidisciplinary Research Partnership 'Bioinformatics: From Nucleotides to Networks') and Dries Vaneechoutte, Kenneth Hoste, Ewan Higgs  ... 
doi:10.1093/bioinformatics/btv466 pmid:26254488 pmcid:PMC4653392 fatcat:oxltl52hejbhbpugptiqlgo7k4