Algorithms for efficiently collapsing reads with Unique Molecular Identifiers

Daniel Liu
<span title="2019-12-16">2019</span> <i title="PeerJ"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/eyfkjqp7sva5bbnwatk5zazi7q" style="color: black;">PeerJ</a> </i> &nbsp;
Unique Molecular Identifiers (UMIs) are used in many experiments to find and remove PCR duplicates. Many tools deduplicate reads by finding reads with the same alignment coordinates and UMIs. However, these tools often either cannot handle substitution errors or require expensive pairwise UMI comparisons that do not scale efficiently to larger datasets. Results: We reformulate the problem of deduplicating UMIs in a manner that enables optimizations to be made and more efficient data structures to be used. We implement our data structures and optimizations in a tool called UMICollapse, which is able to deduplicate over one million unique UMIs of length 9 at a single alignment position in around 26 s, using only a single thread and much less than 10 GB of memory. Conclusions: We present a new formulation of the UMI deduplication problem and show that it can be solved faster with more sophisticated data structures.
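To make the cost of the "expensive pairwise UMI comparisons" mentioned in the abstract concrete, here is a minimal sketch of a naive baseline for one alignment position. It is not UMICollapse's algorithm or data structure; the function names (`hamming`, `naive_collapse`), the greedy most-frequent-first grouping rule, and the `max_mismatches` parameter are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def naive_collapse(umis, max_mismatches=1):
    """Hypothetical baseline, not the paper's method.

    Visits unique UMIs from most to least frequent. Each unassigned UMI
    becomes a group representative and absorbs every other unassigned UMI
    within `max_mismatches` substitutions. Returns {representative: reads}.
    """
    counts = Counter(umis)
    # most_common() orders by count, descending; ties keep first-seen order.
    ordered = [umi for umi, _ in counts.most_common()]

    assigned = {}  # UMI -> representative UMI
    for rep in ordered:
        if rep in assigned:
            continue
        assigned[rep] = rep
        # Inner scan over all unique UMIs: this is the quadratic step.
        for other in ordered:
            if other not in assigned and hamming(rep, other) <= max_mismatches:
                assigned[other] = rep

    groups = {}
    for umi, rep in assigned.items():
        groups[rep] = groups.get(rep, 0) + counts[umi]
    return groups


# Reads that all share one alignment position (made-up example data).
reads = ["AAAA"] * 5 + ["AAAT"] * 2 + ["GGGG"] * 3 + ["AATT"]
collapsed = naive_collapse(reads, max_mismatches=1)
```

Because every representative is compared against every remaining unique UMI, the work grows quadratically with the number of unique UMIs at a position, which is the scaling problem the abstract describes.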
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.7717/peerj.8275">doi:10.7717/peerj.8275</a> <a target="_blank" rel="external noopener" href="https://www.ncbi.nlm.nih.gov/pubmed/31871845">pmid:31871845</a> <a target="_blank" rel="external noopener" href="https://pubmed.ncbi.nlm.nih.gov/PMC6921982/">pmcid:PMC6921982</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mra73difbndctmk5roisphi3hm">fatcat:mra73difbndctmk5roisphi3hm</a> </span>