The parallelism motifs of genomic data analysis

Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi (+2 others)
<span title="2020-01-20">2020</span> <i title="The Royal Society"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ercgg4vn2fenngurcnadfzdfri" style="color: black;">Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences</a> </i>
Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1098/rsta.2019.0394">doi:10.1098/rsta.2019.0394</a> <a target="_blank" rel="external noopener" href="https://www.ncbi.nlm.nih.gov/pubmed/31955674">pmid:31955674</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/kzujmq5u2refvhoovtb2ap5vha">fatcat:kzujmq5u2refvhoovtb2ap5vha</a> </span>