HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing [article]

Shixiang Wan, Quan Zou
2017 arXiv   pre-print
Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic  ...  Distributed and parallel computing represents a crucial technique for accelerating ultra-large sequence analyses.  ...  Ultra-large biological sequence analysis can be efficiently addressed by assembling distributed and parallel computing systems with numerous cheap devices [14] [15] [16].  ...
arXiv:1704.00878v1 fatcat:ojszon3mzfetjiauzafvrb52qy

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing

Shixiang Wan, Quan Zou
2017 Algorithms for Molecular Biology  
Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic  ...  Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types.  ...  Ultra-large biological sequence analysis can be efficiently addressed by assembling distributed and parallel computing systems with numerous cheap devices [15] [16] [17].  ...
doi:10.1186/s13015-017-0116-x pmid:29026435 pmcid:PMC5622559 fatcat:bbmuyxddnfemxjgsilb3uap54u

pmTM-align: scalable pairwise and multiple structure alignment with Apache Spark and OpenMP

Weiya Chen, Chun Yao, Yingzhong Guo, Yan Wang, Zhidong Xue
2020 BMC Bioinformatics  
With the dramatic increase of protein structure data in the Protein Data Bank, computation time quickly becomes the bottleneck for large scale structure comparisons.  ...  Conclusions: pmTM-align enables scalable pairwise and multiple structure alignment computing and offers more timely responses for medium to large-sized input data than existing alignment tools such as mTM-align  ...  For the PSA stage, we decide to deploy the task on a distributed system and choose Spark as the computing framework for its efficiency and scalability against other frameworks when dealing with large dataset  ...
doi:10.1186/s12859-020-03757-2 pmid:32993484 pmcid:PMC7526426 fatcat:go2km4ddebblhmkpw5fyq65q6y

Analyzing large scale genomic data on the cloud with Sparkhit

Liren Huang, Jan Krüger, Alexander Sczyrba, Inanc Birol
2017 Bioinformatics  
Existing tools use different distributed computational platforms to scale-out bioinformatics workloads. However, the scalability of these tools is not efficient.  ...  Motivation: The increasing amount of next-generation sequencing data poses a fundamental challenge on large scale genomic analytics.  ...  Funding This project has been supported by the German-Canadian DFG international research training group 'Computational Methods for the Analysis of the Diversity and Dynamics of Genomes' (DiDy) GRK 1906  ... 
doi:10.1093/bioinformatics/btx808 pmid:29253074 pmcid:PMC5925781 fatcat:avbzcngwmzdjxib67thpdirc2u

Cloudflow - enabling faster biomedical pipelines with MapReduce and Spark

Lukas Forer, Enis Afgan, Hansi Weissensteiner, Davor Davidovic, Guenther Specht, Florian Kronenberg, Sebastian Schoenherr
2016 Scalable Computing : Practice and Experience  
The described performance evaluation demonstrates that Spark can bring an additional boost for analysing next generation sequencing (NGS) data to the field of genetics.  ...  For many years Apache Hadoop has been used as a synonym for processing data in the MapReduce fashion.  ...  This work was, in part, supported by the "Scalable Big Data Bioinformatics Analysis in the Cloud" grant from the Croatian Ministry of Science, Education, and Sport and the Austrian Federal Ministry of  ... 
doi:10.12694/scpe.v17i2.1159 fatcat:eb2tpfewj5c55o7nahkzonbrku

The Role of Distributed Computing in Big Data Science: Case Studies in Forensics and Bioinformatics [article]

Gianluca Roscigno, Università degli Studi di Salerno
2017
To facilitate large-scale distributed computing, many programming paradigms and frameworks have been proposed, such as MapReduce and Apache Hadoop, which transparently address some issues of distributed  ...  The era of Big Data is leading the generation of large amounts of data, which require storage and analysis capabilities that can be only addressed by distributed computing systems.  ...  They have found strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.  ...
doi:10.14273/unisa-725 fatcat:e3ysdvu32vhehef2fccltvqrxm

Towards Analyzing Computational Costs of Spark for SARS-CoV-2 Sequences Comparisons on a Commercial Cloud

Alan L. Nunes, Alba Cristina Magalhaes Alves de Melo, Cristina Boeres, Daniel de Oliveira, Lúcia Maria de Assumpção Drummond
2021 Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2021)   unpublished
In this paper, we developed a Spark application, named Diff Sequences Spark, which compares 540 SARS-CoV-2 sequences from South America in Amazon EC2 Cloud, generating as output the positions where the  ...  Regarding the markets, Diff Sequences Spark reduced the average execution times and monetary costs when using spot VMs compared to their respective on-demand VMs, even in scenarios with several spot revocations  ...  biological sequence pairwise alignment on distributed environment.  ... 
doi:10.5753/wscad.2021.18523 fatcat:mmowe74wjnd25be2sxnifmxhs4