HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing
[article]
2017
arXiv
pre-print
Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic ...
Distributed and parallel computing represents a crucial technique for accelerating ultra-large sequence analyses. ...
Ultra-large biological sequence analysis can be efficiently addressed by assembling distributed and parallel computing systems with numerous cheap devices [14] [15] [16]. ...
arXiv:1704.00878v1
fatcat:ojszon3mzfetjiauzafvrb52qy
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing
2017
Algorithms for Molecular Biology
Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic ...
Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. ...
Ultra-large biological sequence analysis can be efficiently addressed by assembling distributed and parallel computing systems with numerous cheap devices [15] [16] [17]. ...
doi:10.1186/s13015-017-0116-x
pmid:29026435
pmcid:PMC5622559
fatcat:bbmuyxddnfemxjgsilb3uap54u
pmTM-align: scalable pairwise and multiple structure alignment with Apache Spark and OpenMP
2020
BMC Bioinformatics
With the dramatic increase of protein structure data in the Protein Data Bank, computation time quickly becomes the bottleneck for large scale structure comparisons. ...
Conclusions pmTM-align enables scalable pairwise and multiple structure alignment computing and offers more timely responses for medium to large-sized input data than existing alignment tools such as mTM-align ...
For the PSA stage, we decide to deploy the task on a distributed system and choose Spark as the computing framework for its efficiency and scalability against other frameworks when dealing with large dataset ...
doi:10.1186/s12859-020-03757-2
pmid:32993484
pmcid:PMC7526426
fatcat:go2km4ddebblhmkpw5fyq65q6y
Analyzing large scale genomic data on the cloud with Sparkhit
2017
Bioinformatics
Existing tools use different distributed computational platforms to scale-out bioinformatics workloads. However, the scalability of these tools is not efficient. ...
Motivation: The increasing amount of next-generation sequencing data poses a fundamental challenge on large scale genomic analytics. ...
Funding This project has been supported by the German-Canadian DFG international research training group 'Computational Methods for the Analysis of the Diversity and Dynamics of Genomes' (DiDy) GRK 1906 ...
doi:10.1093/bioinformatics/btx808
pmid:29253074
pmcid:PMC5925781
fatcat:avbzcngwmzdjxib67thpdirc2u
Cloudflow - enabling faster biomedical pipelines with MapReduce and Spark
2016
Scalable Computing: Practice and Experience
The described performance evaluation demonstrates that Spark can bring an additional boost for analysing next generation sequencing (NGS) data to the field of genetics. ...
For many years Apache Hadoop has been used as a synonym for processing data in the MapReduce fashion. ...
This work was, in part, supported by the "Scalable Big Data Bioinformatics Analysis in the Cloud" grant from the Croatian Ministry of Science, Education, and Sport and the Austrian Federal Ministry of ...
doi:10.12694/scpe.v17i2.1159
fatcat:eb2tpfewj5c55o7nahkzonbrku
The Role of Distributed Computing in Big Data Science: Case Studies in Forensics and Bioinformatics
[article]
2017
To facilitate large-scale distributed computing, many programming paradigms and frameworks have been proposed, such as MapReduce and Apache Hadoop, which transparently address some issues of distributed ...
The era of Big Data is leading the generation of large amounts of data, which require storage and analysis capabilities that can be only addressed by distributed computing systems. ...
They have found strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics. ...
doi:10.14273/unisa-725
fatcat:e3ysdvu32vhehef2fccltvqrxm
Towards Analyzing Computational Costs of Spark for SARS-CoV-2 Sequences Comparisons on a Commercial Cloud
2021
Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2021)
unpublished
In this paper, we developed a Spark application, named Diff Sequences Spark, which compares 540 SARS-CoV-2 sequences from South America in Amazon EC2 Cloud, generating as output the positions where the ...
Regarding the markets, Diff Sequences Spark reduced the average execution times and monetary costs when using spot VMs compared to their respective on-demand VMs, even in scenarios with several spot revocations ...
biological sequence pairwise alignment on distributed environment. ...
doi:10.5753/wscad.2021.18523
fatcat:mmowe74wjnd25be2sxnifmxhs4