Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud

Andrian Yang, Michael Troup, Peijie Lin, Joshua W. K. Ho
2016 Bioinformatics  
Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. However, current RNA-seq analysis tools are not designed to process scRNA-seq data efficiently because of their limited scalability. Here we introduce Falco, a cloud-based framework for the parallelised processing of large-scale transcriptomic data. The pipeline uses the state-of-the-art big data technologies Apache Hadoop and Apache Spark to perform massively parallel alignment, quality control, and feature quantification of single-cell transcriptomic data in the Amazon Web Services (AWS) cloud-computing environment. We evaluated the performance of Falco on two public scRNA-seq datasets and demonstrated its scalability. The results show that Falco runs at least 2.6x faster than a highly optimised single-node analysis, and that its runtime decreases as the number of computing nodes increases. Falco also allows users to utilise low-cost AWS spot instances, providing a 65% reduction in the cost of analysis. Availability: Falco is available under an open-source license at https://github.com
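The split-then-aggregate pattern that makes parallel feature quantification possible can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Falco's implementation. It assumes reads have already been aligned and assigned to genes, represents each read as a hypothetical `(cell_barcode, gene_id)` pair, and uses a local thread pool as a stand-in for the worker nodes that Hadoop or Spark would provide. The function names `count_chunk`, `merge_counts` and `quantify` are illustrative only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional, Tuple

# Hypothetical record: (cell_barcode, gene_id); gene_id is None for reads
# that were aligned but did not overlap any annotated gene feature.
AlignedRead = Tuple[str, Optional[str]]


def count_chunk(reads: Iterable[AlignedRead]) -> Counter:
    """Map step: count reads per (cell, gene) within one chunk of reads."""
    counts: Counter = Counter()
    for cell, gene in reads:
        if gene is not None:  # skip reads not assigned to a gene
            counts[(cell, gene)] += 1
    return counts


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: sum the partial per-chunk counts into one table."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing
    return total


def quantify(chunks: List[List[AlignedRead]], workers: int = 4) -> Counter:
    """Count each chunk independently, then merge the partial results.

    The thread pool only stands in for a cluster; on Hadoop or Spark each
    chunk would be processed on a separate worker node.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge_counts(pool.map(count_chunk, chunks))


if __name__ == "__main__":
    chunks = [
        [("AAC", "GAPDH"), ("AAC", "ACTB"), ("AAC", None)],
        [("AAC", "GAPDH"), ("TTG", "ACTB")],
    ]
    for (cell, gene), n in sorted(quantify(chunks).items()):
        print(cell, gene, n)
```

Because each chunk is counted without reference to any other, adding workers shortens the map phase, and only the comparatively small per-chunk count tables need to be combined at the end.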
doi:10.1093/bioinformatics/btw732 pmid:28025200 fatcat:sdazrnj3dnc33hqyosil44hsza