
Optimal streaming and tracking distinct elements with high probability [article]

Jarosław Błasiok
2019 arXiv   pre-print
After a long line of research an optimal solution for this problem with constant probability of success, using O(1/ε^2 + log n) bits of space, was given by Kane, Nelson and Woodruff in 2010.  ...  The distinct elements problem is one of the fundamental problems in streaming algorithms --- given a stream of integers in the range {1,...  ...  The author is especially grateful to Jelani Nelson for many inspiring and helpful discussions and comments.  ... 
arXiv:1804.01642v2 fatcat:i26goecddrhx3g7h56lwn65aju

Optimal streaming and tracking distinct elements with high probability [chapter]

Jarosław Błasiok
2018 Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms  
After a long line of research an optimal solution for this problem with constant probability of success, using O(1/ε^2 + lg n) bits of space, was given by Kane, Nelson and Woodruff in 2010.  ...  We consider also the strong tracking (or continuous monitoring) variant of the distinct elements problem, where we want an algorithm which provides an approximation of the number of distinct elements seen  ...  The author is especially grateful to Jelani Nelson for many inspiring and helpful discussions and comments.  ... 
doi:10.1137/1.9781611975031.156 dblp:conf/soda/Blasiok18 fatcat:iiak7lreingindwtupl4ueygvi

Incremental maintenance of maximal cliques in a dynamic graph

Apurba Das, Michael Svendsen, Srikanta Tirthapura
2019 The VLDB journal  
Our algorithms have a provably low communication cost, and a low rate of false positives and false negatives, with a high probability.  ...  We present the first communication-efficient distributed algorithms for tracking persistent items in a data stream whose elements are partitioned across many different sites.  ...  Acknowledgments This work was funded in part by the National Science Foundation through grants 0834743 and 0831903 and through a fellowship from IBM.  ... 
doi:10.1007/s00778-019-00540-5 fatcat:lnk5zuge4bcuveklilfxmhpymy

Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows

Vladimir Braverman, Elena Grigorescu, Harry Lang, David P. Woodruff, Samson Zhou, Michael Wagner
2018 International Workshop on Approximation Algorithms for Combinatorial Optimization  
distinct elements and ℓ_p-heavy hitters that are nearly optimal in both n and ϵ.  ...  We study the distinct elements and ℓ_p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set.  ...  Kane, Nelson, and Woodruff [53] give an optimal algorithm, using O(1/ϵ^2 + log n) bits of space, for providing a (1 + ϵ)-approximation to the number of distinct elements in a data stream, with constant probability  ... 
doi:10.4230/lipics.approx-random.2018.7 dblp:conf/approx/BravermanGLWZ18 fatcat:hksvwcry7nbcjezefncnxqvzqa

Tracking the l_2 Norm with Constant Update Time

Chi-Ning Chou, Zhixian Lei, Preetum Nakkiran, Michael Wagner
2019 International Workshop on Approximation Algorithms for Combinatorial Optimization  
The previous work [Braverman-Chestnut-Ivkin-Nelson-Wang-Woodruff, PODS 2017] gave a streaming algorithm with (the optimal) space using O(ϵ^-2 log(1/δ)) words and O(ϵ^-2 log(1/δ)) update time to obtain an  ...  ϵ-accurate estimate with probability at least 1 − δ.  ...  ACM, 2011. 19 Daniel M Kane, Jelani Nelson, and David P Woodruff. An optimal algorithm for the distinct elements problem.  ... 
doi:10.4230/lipics.approx-random.2019.2 dblp:conf/approx/ChouLN19 fatcat:nmjyjfwzpvbn5kbanbiovjz4he

Streaming Algorithms for Robust, Real-Time Detection of DDoS Attacks

Sumit Ganguly, Minos Garofalakis, Rajeev Rastogi, Krishan Sabnani
2007 27th International Conference on Distributed Computing Systems (ICDCS '07)  
The key element of our solution is a new, hash-based synopsis data structure for network-data streams that allows us to efficiently track, in guaranteed small space and time, destination IP addresses in  ...  Our work is the first to address the problem of efficiently tracking the top distinct-source frequencies over a general stream of updates (insertions and deletions) to the set of underlying network flows  ...  (Our analysis shows that this target sample size is always reached with high probability.)  ... 
doi:10.1109/icdcs.2007.142 dblp:conf/icdcs/GangulyGRS07 fatcat:bfiipr52szddjdfyjzjdiabi4q

Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows [article]

Vladimir Braverman, Elena Grigorescu, Harry Lang, David P. Woodruff, Samson Zhou
2018 arXiv   pre-print
distinct elements and ℓ_p-heavy hitters that are nearly optimal in both n and ϵ.  ...  We study the distinct elements and ℓ_p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set.  ...  Fact 7: Φ_m(t) = t(1 − (1 − 1/t)^m). Blasiok provides an optimal algorithm for a constant factor approximation to the number of distinct elements with strong tracking.  ... 
arXiv:1805.00212v2 fatcat:e2f5uobknzhk5jz4tqynoteutm

A Framework for Adversarially Robust Streaming Algorithms [article]

Omri Ben-Eliezer and Rajesh Jayaram and David P. Woodruff and Eylon Yogev
2021 arXiv   pre-print
In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally F_p-estimation, F_p-heavy hitters,  ...  and can react in an online manner.  ...  Acknowledgments The authors wish to thank Arnold Filtser for invaluable feedback, and the anonymous reviewers for many helpful suggestions.  ... 
arXiv:2003.14265v3 fatcat:4aiym236pff4ferums5uknwmim

Continuous sampling from distributed streams

Graham Cormode, S. Muthukrishnan, Ke Yi, Qin Zhang
2012 Journal of the ACM  
These apply to the case when we want a sample from the full streams, and to the sliding window cases of only the W most recent elements, or arrivals within the last w time units.  ...  In this paper, we present communication-efficient protocols for continuously maintaining a sample (both with and without replacement) from k distributed streams.  ...  These bounds hold with high probability.  ... 
doi:10.1145/2160158.2160163 fatcat:vsbbi5jgovccbnchyjpygubgxu

Distinct Random Sampling from a Distributed Stream

Srikanta Tirthapura
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
At any point, when a query is received at the coordinator, it responds with a random sample from the set of all distinct elements observed at the different sites so far.  ...  We consider continuous maintenance of a random sample of distinct elements from a massive data stream, whose input elements are observed at multiple distributed sites that communicate via a central coordinator  ...  In contrast, a distinct random sample is equally likely to contain low frequency as well as high frequency elements.  ... 
doi:10.1109/ipdps.2015.97 dblp:conf/ipps/Tirthapura15 fatcat:bws5fvwzovfurpes2pcs62czjq

Tracking set-expression cardinalities over continuous update streams

Sumit Ganguly, Minos Garofalakis, Rajeev Rastogi
2004 The VLDB journal  
There is growing interest in algorithms for processing and querying continuous data streams (i.e., data that is seen only once in a fixed order) with limited memory resources.  ...  ask "what is the number of distinct IP source addresses seen in passing packets from both router R_1 and R_2 but not router R_3?".  ...  distinct elements in the union of streams A_1 and A_2.  ... 
doi:10.1007/s00778-004-0135-3 fatcat:oh7z4prl3vcr7pkxbcouuji3a4

Streaming and sublinear approximation of entropy and information distances

Sudipto Guha, Andrew McGregor, Suresh Venkatasubramanian
2006 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm - SODA '06  
An integral part of the algorithm is an interesting use of an F0 (the number of distinct elements in a set) estimation algorithm; we also provide other results along the space/time/approximation tradeoff  ...  In this paper we design streaming and sublinear time property testing algorithms for entropy and various information theoretic distances.  ...  have one element with mass ≈ 1, and (c) we will not know Using Proposition 4.1, we can show that either estimates are sufficient for (b) and (c), or that the stream has few distinct elements, which we  ... 
doi:10.1145/1109557.1109637 fatcat:rxplmcvhijg5fluzcbcthblcdy

Tracking the ℓ_2 Norm with Constant Update Time [article]

Chi-Ning Chou, Zhixian Lei, Preetum Nakkiran
2019 arXiv   pre-print
ϵ-accurate estimate with probability at least 1-δ.  ...  The previous work [Braverman-Chestnut-Ivkin-Nelson-Wang-Woodruff, PODS 2017] gave a streaming algorithm with (the optimal) space using O(ϵ^-2log(1/δ)) words and O(ϵ^-2log(1/δ)) update time to obtain an  ...  We also thank Mitali Bafna and Jarosław Błasiok for useful discussion and thank Boaz Barak for many helpful comments on an earlier draft of this article.  ... 
arXiv:1807.06479v3 fatcat:yblwe7kl4vdd3acudfsx673ipu

The Story of HyperLogLog: How Flajolet Processed Streams with Coin Flips [article]

Jérémie O. Lumbroso
2018 arXiv   pre-print
context of the conference in honor of "Philippe Flajolet and Analytic Combinatorics."  ...  The narrative was pieced together through conversations with Philippe Flajolet during my Ph.D. thesis under his supervision, as well as several conversations with collaborators after his death.  ...  That is to say elements are sampled independently of their frequency in the stream: an element appearing a thousand times, and another appearing only once would be sampled with equal probability.  ... 
arXiv:1805.00612v2 fatcat:hz4cjp4ejrh6tjyeqsdx2tpfqy

What's Different: Distributed, Continuous Monitoring of Duplicate-Resilient Aggregates on Data Streams

G. Cormode, S. Muthukrishnan, Wei Zhuang
2006 22nd International Conference on Data Engineering (ICDE'06)  
We present accuracy guaranteed, highly communication-efficient algorithms for these aggregates that work within the time and space constraints of high speed streams.  ...  Emerging applications in sensor systems and network-wide IP traffic analysis present many technical challenges. They need distributed monitoring and continuous tracking of events.  ...  Say there are k monitored sites, and a central site that has the number of distinct elements seen by all the sites thus far.  ... 
doi:10.1109/icde.2006.173 dblp:conf/icde/CormodeMZ06 fatcat:7u55gamv7bb7nhohpodcu7zwnm