A parallel computational framework for ultra-large-scale sequence clustering analysis
2018
Bioinformatics
We implemented the proposed method on Apache Spark V2.0.2 using the Scala programming language V2.11.8. Apache Spark is a fast and general engine for large-scale data processing that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It can run on Hadoop, on Mesos, in standalone mode, or in the cloud, and can access diverse data sources including HDFS, Cassandra, HBase and S3. Most existing parallel de novo OTU picking methods utilized message …
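To make the parallelization idea concrete, the following is a minimal Scala sketch of partition-level greedy sequence clustering on a Spark RDD. The input path, the naive identity measure, the per-partition greedy heuristic and the 0.97 threshold are illustrative assumptions, not details taken from the paper, whose abstract is only excerpted above.

```scala
// Minimal sketch, assuming a plain-text input with one sequence per line.
// Path, threshold and clustering heuristic are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object SequenceClusteringSketch {

  // Naive pairwise identity: fraction of matching positions over the shorter
  // sequence. Real OTU pickers use alignment-based distances instead.
  def identity(a: String, b: String): Double = {
    val n = math.min(a.length, b.length)
    if (n == 0) 0.0
    else (0 until n).count(i => a(i) == b(i)).toDouble / n
  }

  // Greedy clustering within one partition: a sequence joins the first
  // centroid it matches at or above the threshold, else it becomes a new one.
  def greedyCluster(seqs: Iterator[String], threshold: Double): Iterator[(String, Int)] = {
    val centroids = scala.collection.mutable.ArrayBuffer.empty[String]
    seqs.map { s =>
      val hit = centroids.indexWhere(c => identity(c, s) >= threshold)
      if (hit >= 0) (s, hit)
      else { centroids += s; (s, centroids.length - 1) }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SequenceClusteringSketch")
      .getOrCreate()

    // Sequences can be read from HDFS, S3 or the local file system.
    val sequences = spark.sparkContext.textFile("hdfs:///data/sequences.txt")

    // Cluster each partition independently; Spark supplies the data
    // parallelism and fault tolerance across the cluster.
    val clustered = sequences.mapPartitions(it => greedyCluster(it, 0.97))

    clustered.take(10).foreach { case (seq, cid) =>
      println(s"cluster $cid <- ${seq.take(30)}")
    }

    spark.stop()
  }
}
```

A per-partition pass like this scales out with the number of partitions; merging centroids across partitions (which the paper's framework would have to address) is omitted here for brevity.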
doi:10.1093/bioinformatics/bty617
pmid:30010718
fatcat:xtc22y4jrreavjvzwovu244nmy