Scalability and Validation of Big Data Bioinformatics Software

Andrian Yang, Michael Troup, Joshua W.K. Ho
<span title="">2017</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/cmflnsn2l5gylcviv447te3mku" style="color: black;">Computational and Structural Biotechnology Journal</a> </i> &nbsp;
This review examines two important aspects that are central to modern big data bioinformatics analysissoftware scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always
more &raquo; ... an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how stateof-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.csbj.2017.07.002">doi:10.1016/j.csbj.2017.07.002</a> <a target="_blank" rel="external noopener" href="https://www.ncbi.nlm.nih.gov/pubmed/28794828">pmid:28794828</a> <a target="_blank" rel="external noopener" href="https://pubmed.ncbi.nlm.nih.gov/PMC5537105/">pmcid:PMC5537105</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/nnkrlwg35fd3hkpbg2jtosdicq">fatcat:nnkrlwg35fd3hkpbg2jtosdicq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200502073853/http://europepmc.org/backend/ptpmcrender.fcgi?accid=PMC5537105&amp;blobtype=pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/87/4d/874d890029ba976b1fde3a6470bec2016edc4982.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.csbj.2017.07.002"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> elsevier.com </button> </a> <a target="_blank" rel="external noopener" href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5537105" title="pubmed link"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> pubmed.gov </button> </a>