Optimal Tracking of Distributed Heavy Hitters and Quantiles [article]

Ke Yi, Qin Zhang
2008 arXiv   pre-print
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model.  ...  The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U = {1, ..., u}.  ...  Thus, our all-quantile tracking algorithm is also optimal up to a Θ(polylog(1/ε)) factor. Tracking the Heavy Hitters The upper bound The algorithm. Let m be the current size of A.  ...

Optimal Tracking of Distributed Heavy Hitters and Quantiles

Ke Yi, Qin Zhang
2011 Algorithmica
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model.  ...  The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U = {1, ..., u}.  ...  If randomization is allowed, simple random sampling can be used to achieve a cost of O((k + 1/ε²) · polylog(n, k, 1/ε)) for tracking both the heavy hitters and the quantiles.  ...

Optimal tracking of distributed heavy hitters and quantiles

Ke Yi, Qin Zhang
2009 Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '09
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model.  ...  The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U = {1, ..., u}.  ...  If randomization is allowed, simple random sampling can be used to achieve a cost of O((k + 1/ε²) · polylog(n, k, 1/ε)) for tracking both the heavy hitters and the quantiles.  ...

Rana Shahout, Roy Friedman, Ran Ben Basat
2022 Proceedings of the 15th ACM International Conference on Systems and Storage
Instead, we consider tracking the quantiles for the heavy hitters (most frequent items), which are often considered particularly important, without knowing them beforehand.  ...  In this work, we consider the problem of approximating the per-item quantiles. Elements in our stream are (ID, latency) tuples, and we wish to track the latency quantiles for each ID.  ...  To support fast and efficient tail latency tracking, several quantile sketches have been developed. Existing sketches of this type track the tail latency of an entire stream.  ...

Tracking distributed aggregates over time-based sliding windows

Graham Cormode, Ke Yi
2011 Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing - PODC '11
Specifically, we obtain optimal O((k/ε) log(εn/k)) communication per window of n updates for tracking counts and heavy hitters with accuracy ε across k sites; and near-optimal communication of O((k/ε) log  ...  The area of distributed monitoring requires tracking the value of a function of distributed data as new observations are made.  ...  We instantiate this to tracking counts (Section 3), heavy hitters (Section 4) and quantiles (Section 5) to obtain optimal or near optimal communication bounds, with low space and time costs.  ...

Tracking Distributed Aggregates over Time-Based Sliding Windows [chapter]

Graham Cormode, Ke Yi
2012 Lecture Notes in Computer Science
Specifically, we obtain optimal O((k/ε) log(εn/k)) communication per window of n updates for tracking counts and heavy hitters with accuracy ε across k sites; and near-optimal communication of O((k/ε) log  ...  The area of distributed monitoring requires tracking the value of a function of distributed data as new observations are made.  ...  We instantiate this to tracking counts (Section 3), heavy hitters (Section 4) and quantiles (Section 5) to obtain optimal or near optimal communication bounds, with low space and time costs.  ...

Holistic aggregates in a networked world

Graham Cormode, Minos Garofalakis, S. Muthukrishnan, Rajeev Rastogi
2005 Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05
We present the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote  ...  While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution summaries  ...  In the full version of this paper, we give more direct schemes to track heavy hitters, based on the same structure as our solution for quantiles: if every remote site tracks the heavy hitters from its  ...

Storyboard: Optimizing Precomputed Summaries for Aggregation [article]

Edward Gan, Peter Bailis, Moses Charikar
2020 arXiv   pre-print
We introduce Storyboard, a query system that optimizes item frequency and quantile summaries for accuracy when aggregating over multiple segments.  ...  An emerging class of data systems partition their data and precompute approximate summaries (i.e., sketches and samples) for each segment to reduce query costs.  ...  This research was supported in part by affiliate members and other supporters of the Stanford DAWN project-Ant Financial, Facebook, Google, Infosys, NEC, and VMware-as well as Toyota Research Institute  ...

Forward Decay: A Practical Time Decay Model for Streaming Systems

Graham Cormode, Vladislav Shkapenyuk, Divesh Srivastava, Bojian Xu
2009 Proceedings / International Conference on Data Engineering
We provide efficient algorithms to compute a variety of aggregates and draw samples under forward decay, and show that these are easy to implement scalably.  ...  Temporal data analysis in data warehouses and data streaming systems often uses time decay to reduce the importance of older tuples, without eliminating their influence, on the results of the analysis.  ...  Heavy Hitters and Quantiles For holistic aggregates such as Heavy Hitters and Quantiles, it is more complicated to find the answer to queries.  ...

Holistic UDAFs at streaming speeds

Graham Cormode, Theodore Johnson, Flip Korn, S. Muthukrishnan, Oliver Spatscheck, Divesh Srivastava
2004 Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04
We evaluate performance using generated and actual IP packet data, focusing on approximating quantiles and heavy hitters.  ...  Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams.  ...  HOLISTIC UDAF PERFORMANCE We evaluated different UDAF implementations of quantiles and heavy hitters with respect to performance, space usage and accuracy.  ...

Improving sketch reconstruction accuracy using linear least squares method

Gene Moo Lee, Huiya Liu, Young Yoon, Yin Zhang
2005 Proceedings of the 5th ACM SIGCOMM conference on Internet measurement - IMC '05
In the networking context, sketch has been applied to identifying heavy hitters and changes, which is critical for traffic monitoring, accounting, and network anomaly detection.  ...  Given a sketch and a set of keys, we estimate the values associated with these keys by constructing a linear system and finding the optimal solution for the system using linear least squares method.  ...  Related Work Common applications of sketches include detecting heavy hitters, finding quantiles, answering range/point queries and estimating flow size distribution [11]. Kumar et al.  ...

Continuous matrix approximation on distributed data

Mina Ghashami, Jeff M. Phillips, Feifei Li
2014 Proceedings of the VLDB Endowment
This paper considers the problem of "tracking approximations to a matrix" in the distributed streaming model.  ...  In this model, there are m distributed sites each observing a distinct stream of data (where each element is a row of a distributed matrix) and has a communication channel with a coordinator, and the goal  ...  Distributed Weighted Heavy Hitters We denote four protocols for tracking distributed weighted heavy hitters as P1, P2, P3 and P4 respectively.  ...

Continuous Matrix Approximation on Distributed Data [article]

Mina Ghashami, Jeff M. Phillips, Feifei Li
2014 arXiv   pre-print
This paper considers the problem of "tracking approximations to a matrix" in the distributed streaming model.  ...  In this model, there are m distributed sites each observing a distinct stream of data (where each element is a row of a distributed matrix) and has a communication channel with a coordinator, and the goal  ...  Distributed Weighted Heavy Hitters We denote four protocols for tracking distributed weighted heavy hitters as P1, P2, P3 and P4 respectively.  ...

Identifying correlated heavy-hitters in a two-dimensional data stream

Bibudh Lahiri, Arko Provo Mukherjee, Srikanta Tirthapura
2015 Data mining and knowledge discovery
, or what is the frequency distribution of items that appear along with the heavy-hitters.  ...  Acknowledgements The authors were supported in part by the National Science Foundation through grants NSF CNS-0834743 and CNS-0831903.  ...

Adaptive Spatial Partitioning for Multidimensional Data Streams

John Hershberger, Nisheeth Shrivastava, Subhash Suri, Csaba D. Toth
2006 Algorithmica
For instance, we can track ε-hotspots, which are congruent boxes containing at least an ε fraction of the stream, and maintain hierarchical heavy hitters in d dimensions.  ...  Our sketch can also be viewed as a multidimensional generalization of the ε-approximate quantile summary. The space complexity of our where d is assumed to be a constant.  ...  of its heavy hitter descendants, is a heavy hitter.  ...