A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-019-2903-5">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Springer Science and Business Media LLC">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/n5zrklrhlzhtdorf4rk4rmeo3i" style="color: black;">BMC Bioinformatics</a>
elPrep is an established multi-threaded framework for preparing SAM and BAM files in sequencing pipelines. To achieve good performance, its software architecture makes only a single pass through a SAM/BAM file for multiple preparation steps, and keeps sequencing data as much as possible in main memory. Similar to other SAM/BAM tools, management of heap memory is a complex task in elPrep, and it became a serious productivity bottleneck in its original implementation language during recent development of elPrep. We therefore investigated three alternative programming languages: Go and Java using a concurrent, parallel garbage collector on the one hand, and C++17 using reference counting on the other hand for handling large amounts of heap objects. We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use. Results: The Go implementation performs best, yielding the best balance between runtime performance and memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use of the Java runs is significantly higher. The C++17 benchmarks run significantly slower than both Go and Java, while using somewhat more memory than the Go runs. Our analysis shows that concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting in our case. Conclusions: Based on our benchmark results, we selected Go as our new implementation language for elPrep, and recommend considering Go as a good candidate for developing other bioinformatics tools for processing SAM/BAM data as well.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1186/s12859-019-2903-5">doi:10.1186/s12859-019-2903-5</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wojcekiaz5debcuo36mcczsb64">fatcat:wojcekiaz5debcuo36mcczsb64</a> </span>