Compression and fast retrieval of SNP data

F. Sambo, B. Di Camillo, G. Toffolo, C. Cobelli
Bioinformatics, Oxford University Press (OUP), 2014-07-26
Motivation: The increasing interest in rare genetic variants and epistatic genetic effects on complex phenotypic traits is currently pushing genome-wide association study design towards datasets of increasing size, both in the number of studied subjects and in the number of genotyped single nucleotide polymorphisms (SNPs). This, in turn, is leading to a compelling need for new methods for compression and fast retrieval of SNP data.

Results: We present a novel algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies. Our algorithm is based on two main ideas: (i) compress linkage disequilibrium blocks in terms of differences with a reference SNP and (ii) compress reference SNPs exploiting information on their call rate and minor allele frequency. Tested on two SNP datasets and compared with several state-of-the-art software tools, our compression algorithm is shown to be competitive in terms of compression rate and to outperform all tools in terms of time to load compressed data.

Availability and implementation: Our compression and decompression algorithms are implemented in a C++ library, are released under the GNU General Public License and are freely downloadable from
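Idea (i) in the abstract, storing the SNPs of a linkage disequilibrium block as differences from a reference SNP, can be illustrated with a minimal sketch. This is not the authors' file format or library API: the genotype coding (0, 1, 2 for minor-allele count, 3 for a missing call), the type aliases and the function names `encode_against_reference` and `decode_against_reference` are all assumptions made for illustration, and the sketch does not cover idea (ii) or any bit-level packing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed genotype coding for this sketch:
// 0, 1, 2 = minor-allele count; 3 = missing call.
using Genotypes = std::vector<std::uint8_t>;

// One (subject index, genotype) entry per subject whose genotype
// differs from the reference SNP. Hypothetical representation.
using Diff = std::vector<std::pair<std::size_t, std::uint8_t>>;

// Encode `snp` as the positions and values where it differs from `reference`.
Diff encode_against_reference(const Genotypes& reference, const Genotypes& snp) {
    assert(reference.size() == snp.size());
    Diff diff;
    for (std::size_t i = 0; i < snp.size(); ++i) {
        if (snp[i] != reference[i]) {
            diff.emplace_back(i, snp[i]);
        }
    }
    return diff;
}

// Rebuild a SNP by copying the reference and patching the differing subjects.
Genotypes decode_against_reference(const Genotypes& reference, const Diff& diff) {
    Genotypes snp = reference;
    for (const auto& entry : diff) {
        assert(entry.first < snp.size());
        snp[entry.first] = entry.second;
    }
    return snp;
}
```

When a SNP is in strong linkage disequilibrium with its reference, few subjects differ, so the difference list is much shorter than the full genotype vector. Decoding is a copy plus a handful of writes, which is consistent with the abstract's emphasis on fast loading.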
doi:10.1093/bioinformatics/btu495, pmid:25064564, pmcid:PMC4609015, fatcat:x2rzzvaf3ze25hidwqml5oqn7a