Comparing data streams using hamming norms (how to zero in)

G. Cormode, M. Datar, P. Indyk, S. Muthukrishnan
2003 IEEE Transactions on Knowledge and Data Engineering  
When applied to a single stream, the Hamming norm gives the number of distinct items that are present in that data stream, which is a statistic of great interest in databases.  ...  When applied to a pair of streams, the Hamming norm gives an important measure of (dis)similarity: the number of unequal item counts in the two streams.  ...  Maintaining Distinct Elements Estimates There have been two main styles of approach to counting distinct elements: these are sampling based, and synopsis based.  ... 
doi:10.1109/tkde.2003.1198388 fatcat:7woadktsxrc33ifdiobaq4esfu
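
The two uses of the Hamming norm described in the snippet above can be illustrated with a small exact computation. This is only a sketch of the definition, not the paper's sketch-based estimator: it stores every count in memory, and the function names and the `(item, delta)` update format are assumptions made for illustration.

```python
from collections import Counter


def hamming_norm(stream):
    """Number of items whose net count is non-zero.

    `stream` is an iterable of (item, delta) updates; the delta may be
    negative to model deletions (assumed update format).
    """
    counts = Counter()
    for item, delta in stream:
        counts[item] += delta
    return sum(1 for c in counts.values() if c != 0)


def hamming_distance(stream_a, stream_b):
    """Number of items whose net counts differ between the two streams."""
    diff = Counter()
    for item, delta in stream_a:
        diff[item] += delta
    for item, delta in stream_b:
        diff[item] -= delta
    return sum(1 for c in diff.values() if c != 0)


if __name__ == "__main__":
    a = [("x", 1), ("y", 2), ("x", -1)]   # "x" is inserted, then deleted
    b = [("y", 1), ("z", 1)]
    print(hamming_norm(a))         # distinct items still present in a
    print(hamming_distance(a, b))  # items with unequal counts in a and b
```

On an insert-only stream the first function is simply the distinct-element count; with deletions, an item whose count returns to zero stops contributing.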

Comparing Data Streams Using Hamming Norms (How to Zero In) [chapter]

Graham Cormode, Mayur Datar, Piotr Indyk, S. Muthukrishnan
2002 VLDB '02: Proceedings of the 28th International Conference on Very Large Databases  
When applied to a single stream, the Hamming norm gives the number of distinct items that are present in that data stream, which is a statistic of great interest in databases.  ...  When applied to a pair of streams, the Hamming norm gives an important measure of (dis)similarity: the number of unequal item counts in the two streams.  ...  Maintaining Distinct Elements Estimates There have been two main styles of approach to counting distinct elements: these are sampling based, and synopsis based.  ... 
doi:10.1016/b978-155860869-6/50037-8 dblp:conf/vldb/CormodeDIM02 fatcat:badkccwlgjcopar6o32ivns54u

Probabilistic lossy counting

Xenofontas Dimitropoulos, Paul Hurley, Andreas Kind
2008 Computer communication review  
In this work we introduce probabilistic lossy counting (PLC), which enhances lossy counting in computing network traffic heavy hitters.  ...  In addition, PLC reduces the rate of false positives of lossy counting and achieves a low estimation error, although slightly higher than that of lossy counting.  ...  Each unique five-tuple flow is represented by a distinct data-stream element.  ... 
doi:10.1145/1341431.1341433 fatcat:i5o2rxzgrzcollnaxxvu72sfca

Feasible Sampling of Non-strict Turnstile Data Streams [article]

Neta Barkay, Ely Porat, Bar Shalem
2012 arXiv   pre-print
Our algorithms are for both Strict Turnstile data streams and the most general Non-strict Turnstile data streams, where each element may have a negative total count.  ...  We present the first feasible method for sampling a dynamic data stream with deletions, where the sample consists of pairs (k,C_k) of a value k and its exact total count C_k.  ...  In the most general data stream model, the data is a series of elements (x i , c i ) where x i is an element's value and c i is a count.  ... 
arXiv:1209.5566v1 fatcat:dp7ct3hbnvgn5fuhngdphlhp4y

Pan-private algorithms via statistics on sketches

Darakhshan Mir, S. Muthukrishnan, Aleksandar Nikolov, Rebecca N. Wright
2011 Proceedings of the 30th symposium on Principles of database systems of data - PODS '11  
We study pan-private algorithms for basic analyses, like estimating distinct count, moments, and heavy hitter count, with fully dynamic data.  ...  Consider fully dynamic data, where we track data as it gets inserted and deleted. There are well developed notions of private data analyses with dynamic data, for example, using differential privacy.  ...  Our Contributions We design the first known pan-private algorithms for distinct count, cropped first moment and heavy hitters count for fully dynamic data.  ... 
doi:10.1145/1989284.1989290 dblp:conf/pods/MirMNW11 fatcat:gmyhlyger5c3plxjlifzipjw2q

Data Streams as Random Permutations: the Distinct Element Problem

Ahmed Helmi, Jérémie Lumbroso, Conrado Martínez, Alfredo Viola
2012 Discrete Mathematics & Theoretical Computer Science  
We illustrate this by introducing RECORDINALITY, an algorithm which estimates the number of distinct elements in a stream by counting the number of $k$-records occurring in it.  ...  International audience In this paper, we show that data streams can sometimes usefully be studied as random permutations.  ...  In RECORDINALITY, we scan the data stream and use a data structure to maintain the k largest values, while keeping track of the number of times a new element is inserted in the data structure.  ... 
doi:10.46298/dmtcs.3002 fatcat:ocd2fpckl5atddgxrajhr5ta34

Statistical σ-Partition Clustering over Data Streams [chapter]

Nam Hun Park, Won Suk Lee
2003 Lecture Notes in Computer Science  
This paper proposes a grid-based clustering method that dynamically partitions the range of a grid-cell based on its distribution statistics of data elements in a data stream.  ...  When the support of data elements in a cell becomes high enough, the cell is dynamically divided into two mutually exclusive smaller cells called intermediate cells by assuming the distribution of data  ... 
doi:10.1007/978-3-540-39804-2_35 fatcat:wtxltlhvoja53j3b36wi5eyisa

Incremental maintenance of maximal cliques in a dynamic graph

Apurba Das, Michael Svendsen, Srikanta Tirthapura
2019 The VLDB journal  
We present the first communication-efficient distributed algorithms for tracking persistent items in a data stream whose elements are partitioned across many different sites.  ...  While it is important to track persistent items in an online manner, it is challenging to zero-in on such items in a massive distributed stream.  ...  Acknowledgments This work was funded in part by the National Science Foundation through grants 0834743 and 0831903 and through a fellowship from IBM.  ... 
doi:10.1007/s00778-019-00540-5 fatcat:lnk5zuge4bcuveklilfxmhpymy

Framework for High Utility Pattern Mining using Dynamically Generated Minimum Support Threshold

Shankar B. Naik, Jyoti D. Pawar
2018 International Journal of Engineering & Technology  
based on data in the data stream is presented.  ...  In this paper we have proposed a framework which uses high utility itemset mining to store data stream elements in a compressed form and then detect events from the sliding window.  ...  200K Transaction item count (average) 10 Distinct item count 200  ... 
doi:10.14419/ijet.v7i4.19.28276 fatcat:zvvcsrycrnf47essvqwyb4t7wm

Detection And Compensation For Disruptive Non-Linear Traffic-Flow Dynamics In Communication Networks

R.A. Carrasco, D.P.A. Greenwood
1996 Zenodo  
Publication in the conference proceedings of EUSIPCO, Trieste, Italy, 1996  ...  distinct peaks within the distribution to establish an approximation of the fundamental frequency periodicity present within the traffic stream.  ...  In this manner a moving-window distribution is produced within the vector, directly relating to the dynamic properties of the traffic stream.  ... 
doi:10.5281/zenodo.35965 fatcat:fxhcm7l5one77ho7kbxkk5lk5y

Real time analytics

Arun Kejariwal, Sanjeev Kulkarni, Karthik Ramasamy
2015 Proceedings of the VLDB Endowment  
In this regard, Forrester remarked the following in Q3 2014 [8]: "The high velocity, white-water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors  ...  ., Volume, Variety and Veracity, on Big Data streaming analytics. The tutorial is intended for both researchers and practitioners in the industry.  ...  distribution of frequencies of different elements Databases Finding Frequent Elements Identify items in a multiset with frequency more than a threshold θ Trending Hashtags Counting Inversions  ... 
doi:10.14778/2824032.2824132 fatcat:srkqipurr5hfvka5jv2mrutnuu

Summarizing Multidimensional Data Streams: A Hierarchy-Graph-Based Approach [chapter]

Yoann Pitarch, Anne Laurent, Pascal Poncelet
2010 Lecture Notes in Computer Science  
In such a dynamic context, storing the whole data stream history is unfeasible and providing a high-quality summary is required for decision makers.  ...  In this paper, we propose a summarization method for multidimensional data streams based on a graph structure and taking advantage of the data hierarchies.  ...  Moreover, thanks to dynamic lists, we overcome the major drawback of the TTW: taking into account the data distribution.  ... 
doi:10.1007/978-3-642-13672-6_33 fatcat:oc5x7gojknbcdajywiuajsllqe

An Evaluation of Streaming Algorithms for Distinct Counting Over a Sliding Window

Sneha Aman Singh, Srikanta Tirthapura
2015 Frontiers in ICT  
Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggregation task in database query processing, query optimization, and network monitoring.  ...  Linear Counting Linear Counting, due to Whang et al. (1990), uses a bit vector B of size n = Dmax/ρ, where Dmax is an upper bound on the maximum number of distinct elements in the data stream, and ρ  ... 
doi:10.3389/fict.2015.00023 fatcat:ps5kjk4awfgppphvay5mju63om
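
The Linear Counting idea mentioned in the snippet above can be sketched as follows. This is a minimal illustration, not the evaluated implementation and not a sliding-window variant: the bit-vector size is passed directly as the hypothetical parameter `num_bits` (the snippet's sizing rule is truncated, so it is not reproduced here), BLAKE2b is an assumed choice of hash function, and the estimate uses the standard maximum-likelihood formula -n ln(z/n), where z is the number of bits still zero.

```python
import hashlib
import math


def linear_count(stream, num_bits=8192):
    """Estimate the number of distinct items in `stream`.

    Each item is hashed to one position of a bit vector of length
    `num_bits` (hypothetical parameter name); the estimate is derived
    from the fraction of positions that remain zero.
    """
    bits = bytearray(num_bits)  # one byte per bit, for simplicity
    for item in stream:
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        bits[int.from_bytes(digest, "big") % num_bits] = 1

    zeros = num_bits - sum(bits)
    if zeros == 0:
        raise ValueError("bit vector saturated; use a larger num_bits")
    return -num_bits * math.log(zeros / num_bits)


if __name__ == "__main__":
    # 5,000 updates drawn from 1,000 distinct values.
    stream = [i % 1000 for i in range(5000)]
    print(round(linear_count(stream)))
```

Duplicates set the same bit again and so do not change the estimate. Accuracy degrades as the vector fills up, which is why the size is tied to an upper bound on the distinct count.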

Double-Hashing Algorithm for Frequency Estimation in Data Streams [article]

Nikita Seleznev, Senthil Kumar, C. Bayan Bruss
2022 arXiv   pre-print
In addition, we propose a procedure on how to dynamically adjust the proposed double-hashing algorithm when frequencies of the elements in a stream are changing over time.  ...  Commonly used streaming algorithms, such as count-min sketch, have many advantages, but do not take into account properties of a data stream for performance optimization.  ...  We generated synthetic data with a Zipfian distribution for 140,000 unique stream elements.  ... 
arXiv:2204.00650v1 fatcat:mhal3uqykvel3jguqv3tmxlozm

Optimal Algorithm for Profiling Dynamic Arrays with Finite Values [article]

Dingcheng Yang, Wenjian Yu, Junhui Deng, Shenghua Liu
2018 arXiv   pre-print
It is equivalent to find the mode and top-frequent elements in a dynamic array corresponding to the log stream.  ...  However, most existing work either restrain the dynamic array within a sliding window, or do not take advantages of only one element can be added or removed in a log stream.  ...  Let m be the maximum number of distinct objects in a log stream or dynamic array A. Without loss of generality, we assume id x i ∈ [1, m], i.e. integers between 1 and m.  ... 
arXiv:1812.05306v1 fatcat:jj3gwzvoefdxlfy3k2yfesvunm