187 Hits in 2.8 sec

Optimal Tracking of Distributed Heavy Hitters and Quantiles [article]

Ke Yi, Qin Zhang
2008 arXiv   pre-print
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model.  ...  The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U={1,...,u}.  ...  Thus, our all-quantile tracking algorithm is also optimal up to a Θ(polylog(1/ε)) factor. Tracking the Heavy Hitters The upper bound The algorithm. Let m be the current size of A.  ... 
arXiv:0812.0209v1 fatcat:vcneh3yuw5gtpjdoxaetecbxn4

Optimal Tracking of Distributed Heavy Hitters and Quantiles

Ke Yi, Qin Zhang
2011 Algorithmica  
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model.  ...  The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U = {1, ..., u}.  ...  If randomization is allowed, simple random sampling can be used to achieve a cost of O((k + 1/ε²) · polylog(n, k, 1/ε)) for tracking both the heavy hitters and the quantiles.  ... 
doi:10.1007/s00453-011-9584-4 fatcat:fegemjo2gjgixksvy7ltvycdfm

Optimal tracking of distributed heavy hitters and quantiles

Ke Yi, Qin Zhang
2009 Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '09  
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model.  ...  The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U = {1, ..., u}.  ...  If randomization is allowed, simple random sampling can be used to achieve a cost of O((k + 1/ε²) · polylog(n, k, 1/ε)) for tracking both the heavy hitters and the quantiles.  ... 
doi:10.1145/1559795.1559820 dblp:conf/pods/YiZ09 fatcat:72jziqvwjffihka55xfh35q2mi

SQUAD

Rana Shahout, Roy Friedman, Ran Ben Basat
2022 Proceedings of the 15th ACM International Conference on Systems and Storage  
Instead, we consider tracking the quantiles for the heavy hitters (most frequent items), which are often considered particularly important, without knowing them beforehand.  ...  In this work, we consider the problem of approximating the per-item quantiles. Elements in our stream are (ID, latency) tuples, and we wish to track the latency quantiles for each ID.  ...  To support fast and efficient tail latency tracking, several quantile sketches have been developed. Existing sketches of this type track the tail latency of an entire stream.  ... 
doi:10.1145/3534056.3535009 fatcat:4yyf75alxvg6pihkzik3ejaxd4

Tracking distributed aggregates over time-based sliding windows

Graham Cormode, Ke Yi
2011 Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing - PODC '11  
Specifically, we obtain optimal O((k/ε) log(εn/k)) communication per window of n updates for tracking counts and heavy hitters with accuracy ε across k sites; and near-optimal communication of O((k/ε) log  ...  The area of distributed monitoring requires tracking the value of a function of distributed data as new observations are made.  ...  We instantiate this to tracking counts (Section 3), heavy hitters (Section 4) and quantiles (Section 5) to obtain optimal or near optimal communication bounds, with low space and time costs.  ... 
doi:10.1145/1993806.1993839 dblp:conf/podc/CormodeY11 fatcat:q3y7fx4hdbdsdm74zkc75pwkd4

Tracking Distributed Aggregates over Time-Based Sliding Windows [chapter]

Graham Cormode, Ke Yi
2012 Lecture Notes in Computer Science  
Specifically, we obtain optimal O((k/ε) log(εn/k)) communication per window of n updates for tracking counts and heavy hitters with accuracy ε across k sites; and near-optimal communication of O((k/ε) log  ...  The area of distributed monitoring requires tracking the value of a function of distributed data as new observations are made.  ...  We instantiate this to tracking counts (Section 3), heavy hitters (Section 4) and quantiles (Section 5) to obtain optimal or near optimal communication bounds, with low space and time costs.  ... 
doi:10.1007/978-3-642-31235-9_28 fatcat:qm4quv37o5f2zk55yhj7hvi4sm

Holistic aggregates in a networked world

Graham Cormode, Minos Garofalakis, S. Muthukrishnan, Rajeev Rastogi
2005 Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05  
We present the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote  ...  While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution summaries  ...  In the full version of this paper, we give more direct schemes to track heavy hitters, based on the same structure as our solution for quantiles: if every remote site tracks the heavy hitters from its  ... 
doi:10.1145/1066157.1066161 dblp:conf/sigmod/CormodeGMR05 fatcat:4znyqjnwzjemzoxnxsvo36sdzy

Storyboard: Optimizing Precomputed Summaries for Aggregation [article]

Edward Gan, Peter Bailis, Moses Charikar
2020 arXiv   pre-print
We introduce Storyboard, a query system that optimizes item frequency and quantile summaries for accuracy when aggregating over multiple segments.  ...  An emerging class of data systems partition their data and precompute approximate summaries (i.e., sketches and samples) for each segment to reduce query costs.  ...  This research was supported in part by affiliate members and other supporters of the Stanford DAWN project-Ant Financial, Facebook, Google, Infosys, NEC, and VMware-as well as Toyota Research Institute  ... 
arXiv:2002.03063v1 fatcat:qfj25arlnbdmxdfdqq6hmf7eeu

Forward Decay: A Practical Time Decay Model for Streaming Systems

Graham Cormode, Vladislav Shkapenyuk, Divesh Srivastava, Bojian Xu
2009 Proceedings / International Conference on Data Engineering  
We provide efficient algorithms to compute a variety of aggregates and draw samples under forward decay, and show that these are easy to implement scalably.  ...  Temporal data analysis in data warehouses and data streaming systems often uses time decay to reduce the importance of older tuples, without eliminating their influence, on the results of the analysis.  ...  Heavy Hitters and Quantiles For holistic aggregates such as Heavy Hitters and Quantiles, it is more complicated to find the answer to queries.  ... 
doi:10.1109/icde.2009.65 dblp:conf/icde/CormodeSSX09 fatcat:qc4xkd4nt5hahli7eksp3hgdgm

Holistic UDAFs at streaming speeds

Graham Cormode, Theodore Johnson, Flip Korn, S. Muthukrishnan, Oliver Spatscheck, Divesh Srivastava
2004 Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04  
We evaluate performance using generated and actual IP packet data, focusing on approximating quantiles and heavy hitters.  ...  Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams.  ...  HOLISTIC UDAF PERFORMANCE We evaluated different UDAF implementations of quantiles and heavy hitters with respect to performance, space usage and accuracy.  ... 
doi:10.1145/1007568.1007575 dblp:conf/sigmod/JohnsonCKMSS04 fatcat:v7nx5yqiurc3znqliz4mnnhnoe

Improving sketch reconstruction accuracy using linear least squares method

Gene Moo Lee, Huiya Liu, Young Yoon, Yin Zhang
2005 Proceedings of the 5th ACM SIGCOMM conference on Internet measurement - IMC '05  
In the networking context, sketches have been applied to identifying heavy hitters and changes, which is critical for traffic monitoring, accounting, and network anomaly detection.  ...  Given a sketch and a set of keys, we estimate the values associated with these keys by constructing a linear system and finding the optimal solution for the system using the linear least squares method.  ...  Related Work Common applications of sketches include detecting heavy hitters, finding quantiles, answering range/point queries and estimating flow size distribution [11]. Kumar et al.  ... 
doi:10.1145/1330107.1330138 fatcat:eczo22iurzd2xnxyzy7btmmvme

Continuous matrix approximation on distributed data

Mina Ghashami, Jeff M. Phillips, Feifei Li
2014 Proceedings of the VLDB Endowment  
This paper considers the problem of "tracking approximations to a matrix" in the distributed streaming model.  ...  In this model, there are m distributed sites each observing a distinct stream of data (where each element is a row of a distributed matrix) and has a communication channel with a coordinator, and the goal  ...  Distributed Weighted Heavy Hitters We denote four protocols for tracking distributed weighted heavy hitters as P1, P2, P3 and P4 respectively.  ... 
doi:10.14778/2732951.2732954 fatcat:ig2e4vhc35fwpc7l6zp3yic2qi

Continuous Matrix Approximation on Distributed Data [article]

Mina Ghashami, Jeff M. Phillips, Feifei Li
2014 arXiv   pre-print
This paper considers the problem of "tracking approximations to a matrix" in the distributed streaming model.  ...  In this model, there are m distributed sites each observing a distinct stream of data (where each element is a row of a distributed matrix) and has a communication channel with a coordinator, and the goal  ...  Distributed Weighted Heavy Hitters We denote four protocols for tracking distributed weighted heavy hitters as P1, P2, P3 and P4 respectively.  ... 
arXiv:1404.7571v1 fatcat:u3uqbbbnxzh2xaniwdznm2qmhi

Identifying correlated heavy-hitters in a two-dimensional data stream

Bibudh Lahiri, Arko Provo Mukherjee, Srikanta Tirthapura
2015 Data mining and knowledge discovery  
, or what is the frequency distribution of items that appear along with the heavy-hitters.  ...  Acknowledgements The authors were supported in part by the National Science Foundation through grants NSF CNS-0834743 and CNS-0831903.  ... 
doi:10.1007/s10618-015-0438-6 fatcat:fnm7puga65h7hdcibjnufmgwom

Adaptive Spatial Partitioning for Multidimensional Data Streams

John Hershberger, Nisheeth Shrivastava, Subhash Suri, Csaba D. Toth
2006 Algorithmica  
For instance, we can track ε-hotspots, which are congruent boxes containing at least an ε fraction of the stream, and maintain hierarchical heavy hitters in d dimensions.  ...  Our sketch can also be viewed as a multidimensional generalization of the ε-approximate quantile summary. The space complexity of our where d is assumed to be a constant.  ...  of its heavy hitter descendants, is a heavy hitter.  ... 
doi:10.1007/s00453-006-0070-3 fatcat:bvtwjm4fabeabe2vzncbzcs4se
Showing results 1 to 15 out of 187 results