A Review of Scalable Bioinformatics Pipelines

Bjørn Fjukstad, Lars Ailo Bongo
2017 Data Science and Engineering  
The pipelines used to implement analyses must therefore scale with respect to the resources on a single compute node, the number of nodes on a cluster, and also to cost-performance.  ...  Scalability is increasingly important for bioinformatics analysis services, since these must handle larger datasets, more jobs, and more users.  ...  Toil: TCGA RNA-Seq Reference Pipeline Toil is a workflow software to run scientific workflows on a large scale in cloud or high-performance computing (HPC) environments [22].  ... 
doi:10.1007/s41019-017-0047-z fatcat:7wyzccy7ffhjdd46pfmljrzioy

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision

Marek S. Wiewiórka, Antonio Messina, Alicja Pacholewska, Sergio Maffioletti, Piotr Gawrysiak, Michał J. Okoniewski
2014 Computer applications in the biosciences : CABIOS  
SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way.  ...  The Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure.  ...  ACKNOWLEDGEMENTS The authors thank Christian Panse, Riccardo Murri, Tyanko Aleksiev and Martin Ryan for the valuable discussions.  ... 
doi:10.1093/bioinformatics/btu343 pmid:24845651 fatcat:csouajhitbhk3ijkblvzipgu54

An Open-Source Azure Solution for Scalable Genomics Workflows

Fan Yang-Turner, Lawrence Gripper, Jeremy Swann, Trien Do, Dona Foster, Denis Volk, Anita Ramanan, Marcus Robinson, Tim Peto, Derrick Crook
2018 2018 IEEE World Congress on Services (SERVICES)  
The solution demonstrates a generic and customizable approach to run genomic data analytics workflows on a cloud environment.  ...  We present an open-source Azure solution for running scalable genomics workflows.  ...  Running a TB WGS Pipeline We used a TB WGS pipeline previously developed using Nextflow with Singularity and ran in a High Performance Computing (HPC) environment.  ... 
doi:10.1109/services.2018.00033 dblp:conf/services/Yang-TurnerGSDF18 fatcat:vkka73yalrb3rd5ag3nrm4come

Cloud Computing Enabled Big Multi-Omics Data Analytics

Saraswati Koppad, Annappa B, Georgios V Gkoutos, Animesh Acharjee
2021 Bioinformatics and Biology Insights  
Recent innovations in computational technologies and approaches, especially in cloud computing, offer a promising, low-cost, and highly flexible solution in the bioinformatics domain.  ...  Cloud computing is rapidly proving increasingly useful in molecular modeling, omics data analytics (eg, RNA sequencing, metabolomics, or proteomics data sets), and for the integration, analysis, and interpretation  ...  Acknowledgements We would like to acknowledge the reviewers and the editor for very useful constructive feedback.  ... 
doi:10.1177/11779322211035921 fatcat:7bk7zvxvb5hurhyyu5knuvgqeq

Computational Strategies for Scalable Genomics Analysis

Lizhen Shi, Zhong Wang
2019 Genes  
genomics analysis.  ...  The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for  ...  Conclusions and Future Perspectives We were only able to cover a few scaling strategies for genomics analysis in Section 2, summarized in Table 1 in the context of ease of development, robustness, scalability  ... 
doi:10.3390/genes10121017 pmid:31817630 fatcat:5clqzsy65jcw3mpojnqjznxrzm

Closha: bioinformatics workflow system for the analysis of massive sequencing data

GunHwan Ko, Pan-Gyu Kim, Jongcheol Yoon, Gukhee Han, Seong-Jin Park, Wangho Song, Byungwook Lee
2018 BMC Bioinformatics  
Conclusions: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis.  ...  Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner.  ...  Acknowledgments The authors would like to thank the anonymous reviewers and Closha users for their time and their valuable comments.  ... 
doi:10.1186/s12859-018-2019-3 pmid:29504905 pmcid:PMC5836837 fatcat:6he72saexbbaralajcwfsxgyvu

Cloud computing applications for biomedical science: A perspective

Vivek Navale, Philip E. Bourne, Francis Ouellette
2018 PLoS Computational Biology  
Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services.  ...  For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches.  ...  Disclaimer The opinions expressed in the paper are those of the authors and do not necessarily reflect the opinions of the National Institutes of Health.  ... 
doi:10.1371/journal.pcbi.1006144 pmid:29902176 pmcid:PMC6002019 fatcat:wfvdmkpytnbfbih27knmldrcyy

STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

Konrad J. Karczewski, Guy Haskin Fernald, Alicia R. Martin, Michael Snyder, Nicholas P. Tatonetti, Joel T. Dudley, I. King Jordan
2014 PLoS ONE  
Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience  ...  The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics.  ...  Acknowledgments We would like to acknowledge the individuals who helped in the design of the system at the BioCurious hackathon in July 2012, in particular David Dehghan for his insights on cloud computing  ... 
doi:10.1371/journal.pone.0084860 pmid:24454756 pmcid:PMC3893165 fatcat:4ugorgfg65bdxk32tm7sw45gvq

Scalability and Validation of Big Data Bioinformatics Software

Andrian Yang, Michael Troup, Joshua W.K. Ho
2017 Computational and Structural Biotechnology Journal  
We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment.  ...  Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs.  ...  Acknowledgements This work was supported in part by funds from the New South Wales Ministry of Health, a National Health and Medical Research Council/National Heart Foundation Career Development Fellowship  ... 
doi:10.1016/j.csbj.2017.07.002 pmid:28794828 pmcid:PMC5537105 fatcat:nnkrlwg35fd3hkpbg2jtosdicq

NanoSPC: a scalable, portable, cloud compatible viral nanopore metagenomic data processing pipeline

Yifei Xu, Fan Yang-Turner, Denis Volk, Derrick Crook
2020 Nucleic Acids Research  
Here we introduce NanoSPC, a scalable, portable and cloud compatible pipeline for analyzing Nanopore sequencing data.  ...  Metagenomic sequencing combined with Oxford Nanopore Technology has the potential to become a point-of-care test for infectious disease in public health and clinical settings, providing rapid diagnosis  ... 
doi:10.1093/nar/gkaa413 pmid:32442274 fatcat:6dtk4z4yqjdtjnocm3xnjozsia

A Scalable Pipeline for Transcriptome Profiling Tasks with On-Demand Computing Clouds

Shayan Shams, Nayong Kim, Xiandong Meng, Ming Tai Ha, Shantenu Jha, Zhong Wang, Joohyun Kim
2016 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
method for large-scale RNA-seq data analysis, particularly maximizing merits of Infrastructure as a Service (IaaS) clouds.  ...  Major development mechanisms, designed in order to achieve the required scalability, in particular, targeting cloud environments with on-demand computing, are presented.  ...  ACKNOWLEDGMENT We are thankful for Amazon EC2 computing time with the AWS research grant program. We thank Colin Dewey and his group member for helping us for the use of DETONATE.  ... 
doi:10.1109/ipdpsw.2016.129 dblp:conf/ipps/ShamsKMHJWK16 fatcat:dh6z5bslu5btpcvk6knddcfklu

A (fire)cloud-based DNA methylation data preprocessing and quality control platform

Divy Kangeyan, Andrew Dunford, Sowmya Iyer, Chip Stewart, Megan Hanna, Gad Getz, Martin J. Aryee
2019 BMC Bioinformatics  
Results: We present a set of preprocessing pipelines for bisulfite sequencing DNA methylation data that include a new R/Bioconductor package, scmeth, for a series of efficient QC analyses of large datasets  ...  These pipelines are designed to allow users to 1) ensure reproducibility of analyses, 2) achieve scalability to large whole genome datasets with 100 GB+ of raw data per sample and to single-cell datasets  ...  Acknowledgements We would like to thank Chet Birger, Gordon Saksena and Tiffany Miller for assistance with the Firecloud implementation of workflows.  ... 
doi:10.1186/s12859-019-2750-4 fatcat:thscp52dmfdbnlv6mb36xcmz2a

The Cancer Genomics Cloud: Collaborative, Reproducible, and Democratized—A New Paradigm in Large-Scale Computational Research

Jessica W. Lau, Erik Lehnert, Anurag Sethi, Raunaq Malhotra, Gaurav Kaushik, Zeynep Onder, Nick Groves-Kirkby, Aleksandar Mihajlovic, Jack DiGiovanna, Mladen Srdic, Dragan Bajcic, Jelena Radenkovic (+8 others)
2017 Cancer Research  
By colocalizing these resources in the cloud, the CGC enables scalable, reproducible analyses. Researchers worldwide can use the CGC to investigate key questions in cancer genomics.  ...  Data of interest can be immediately analyzed in the cloud using more than 200 preinstalled, curated bioinformatics tools and workflows.  ...  Grant Support The Cancer Genomics Cloud is powered by Seven Bridges and has been funded in whole or in part with federal funds from the NCI, NIH, Department of Health and Human Services, under contract  ... 
doi:10.1158/0008-5472.can-17-0387 pmid:29092927 fatcat:fdpfepbfunaz7o2k2x3gn2wzky

Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes

Suyash S. Shringarpure, Andrew Carroll, Francisco M. De La Vega, Carlos D. Bustamante, Lars Kaderali
2015 PLoS ONE  
Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller.  ...  Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies  ...  Acknowledgments The authors would like to acknowledge Len Trigg and Brian Hilbush (Real Time Genomics, Inc.) for helpful suggestions and Katie Kanagawa for comments on the manuscript.  ... 
doi:10.1371/journal.pone.0129277 pmid:26110529 pmcid:PMC4482534 fatcat:jc6obwhltfefndom2wa5mbt4tm

OUP accepted manuscript

2017 Briefings in Bioinformatics  
Rezaul Karim is a PhD researcher at Semantics in eHealth and Life Sciences with expertise in the development of computational resources for the analysis and visualization of ribosome profiling (RiboSeq  ...  Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto  ...  and João Bosco Jares for helping them in drawing the Figure 2.  ... 
doi:10.1093/bib/bbx039 pmid:28419324 pmcid:PMC6169675 fatcat:xsdfepwqmnb7pgkmnep2ifn2wi