A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit <a rel="external noopener" href="https://arxiv.org/pdf/1705.07001v2.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
A High-Performance Algorithm for Identifying Frequent Items in Data Streams
[article]
<span title="2017-05-22">2017</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
Estimating frequencies of items over data streams is a common building block in streaming data measurement and analysis. Misra and Gries introduced their seminal algorithm for the problem in 1982, and the problem has since been revisited many times due to its practicality and applicability. We describe a highly optimized version of Misra and Gries' algorithm that is suitable for deployment in industrial settings. Our code is made public via an open source library called DataSketches that is used by several companies and production systems. Our algorithm improves on two theoretical and practical aspects of prior work. First, it handles weighted updates in amortized constant time, a common requirement in practice. Second, it uses a simple and fast method for merging summaries that asymptotically improves on prior work even for unweighted streams. We describe experiments confirming that our algorithms are more efficient than prior proposals.
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1705.07001v2">arXiv:1705.07001v2</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ntjupbz7qzdqhalimamnnjmv34">fatcat:ntjupbz7qzdqhalimamnnjmv34</a>
</span>
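To make the abstract's two claims concrete (weighted updates and merging of Misra-Gries summaries), here is a minimal illustrative sketch in Python. It is not the DataSketches implementation and not necessarily the paper's exact algorithm. The class name <code>MisraGriesSketch</code> and all method names are hypothetical. The purge rule, which subtracts the median counter value whenever more than <code>k</code> counters are held, is an assumption chosen because it removes at least half the counters per purge. The merge, which replays one sketch's counters into another as weighted updates, is likewise an assumed approach.

```python
class MisraGriesSketch:
    """Illustrative weighted Misra-Gries summary holding at most k counters.

    Hypothetical class; purge rule and merge strategy are assumptions,
    not the DataSketches API.
    """

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.counters = {}  # item -> positive under-estimate of its weight
        self.offset = 0     # sum of all amounts subtracted by purges (error bound)
        self.total = 0      # total stream weight seen

    def update(self, item, weight=1) -> None:
        """Add `weight` occurrences of `item`; weights must be positive."""
        if weight <= 0:
            raise ValueError("weight must be positive")
        self.total += weight
        self._add(item, weight)

    def _add(self, item, weight) -> None:
        # A weighted update is a single dictionary add; only a new key can
        # push the table past k counters and trigger a purge.
        self.counters[item] = self.counters.get(item, 0) + weight
        if len(self.counters) > self.k:
            self._purge()

    def _purge(self) -> None:
        # Assumed rule: subtract the median counter value from every counter
        # and drop counters that reach zero or below. At least half of the
        # counters disappear, so roughly k/2 new keys must arrive before the
        # next purge. Sorting makes this O(k log k); linear-time selection
        # would make the amortized cost per update constant.
        values = sorted(self.counters.values())
        median = values[len(values) // 2]
        self.offset += median
        self.counters = {
            item: c - median for item, c in self.counters.items() if c > median
        }

    def estimate(self, item):
        """Never exceeds the item's true weight."""
        return self.counters.get(item, 0)

    def lower_bound(self, item):
        return self.estimate(item)

    def upper_bound(self, item):
        # True weight is at most estimate + total amount ever subtracted.
        return self.estimate(item) + self.offset

    def merge(self, other: "MisraGriesSketch") -> None:
        """Fold `other` into this sketch by replaying its counters as weighted updates."""
        entries = list(other.counters.items())  # snapshot, safe even if other is self
        other_offset, other_total = other.offset, other.total
        for item, count in entries:
            self._add(item, count)
        # The error already accumulated in `other` carries over, as does its weight.
        self.offset += other_offset
        self.total += other_total
```

The design choice worth noting is that merging reuses the same weighted-update path, so no separate combine-and-select step is needed. The guarantee <code>lower_bound(x) &lt;= true weight &lt;= upper_bound(x)</code> survives a merge because the two sketches' subtracted amounts simply add.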
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191021211633/https://arxiv.org/pdf/1705.07001v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
</button>
</a>
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1705.07001v2" title="arxiv.org access">
<button class="ui compact blue labeled icon button serp-button">
<i class="file alternate outline icon"></i>
arxiv.org
</button>
</a>