Reservoir Sampling over Memory-Limited Stream Joins

Mohammed Al-Kateb, Byung Suk Lee, X. Sean Wang
2007 International Conference on Scientific and Statistical Database Management  
In this paper, we address the problem of reservoir sampling over memory-limited stream joins.  ...  In stream join processing with limited memory, uniform random sampling is useful for approximate query evaluation.  ...  In [31], the problems of sampling from data streams, uniform random sampling over joins, and limited memory-limited stream joins are combined.  ...
doi:10.1109/ssdbm.2007.40 dblp:conf/ssdbm/Al-KatebLW07a fatcat:mcui33v5zza25iuk4euwfrxqw4

A stratified approach to progressive approximate joins

Wee Hyong Tok, Stéphane Bressan, Mong-Li Lee
2008 Proceedings of the 11th international conference on Extending database technology Advances in database technology - EDBT '08  
In this paper, we are interested in the progressive and approximate processing of queries to data streams when processing is limited to main memory.  ...  In particular, we study one of the main building blocks of such processing: the progressive approximate join. We devise and present several novel progressive approximate join algorithms.  ...  Though [1] also studied the use of reservoir sampling over memory-limited join, the focus of the work was on how to balance between the memory allocated for join buffers and the reservoir.  ... 
doi:10.1145/1353343.1353414 dblp:conf/edbt/TokBL08 fatcat:elqheohsijhkflyikxfl24vovi

A stratified approach to progressive approximate joins

Wee Hyong Tok, Stéphane Bressan, Mong-Li Lee
2008 Proceedings of the 11th international conference on Extending database technology Advances in database technology - EDBT '08  
In this paper, we are interested in the progressive and approximate processing of queries to data streams when processing is limited to main memory.  ...  In particular, we study one of the main building blocks of such processing: the progressive approximate join. We devise and present several novel progressive approximate join algorithms.  ...  Though [1] also studied the use of reservoir sampling over memory-limited join, the focus of the work was on how to balance between the memory allocated for join buffers and the reservoir.  ... 
doi:10.1145/1352431.1352502 fatcat:3g4t6j3fvrdifins74tgg2ffg4

Adaptive-Size Reservoir Sampling over Data Streams

Mohammed Al-Kateb, Byung Suk Lee, X. Sean Wang
2007 International Conference on Scientific and Statistical Database Management  
Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir.  ...  This paper studies adaptive-size reservoir sampling over data streams considering two main factors: reservoir size and sample uniformity.  ...  In [4], we present a progressive reservoir join-sampling algorithm for sampling over memory-limited stream joins.  ...
doi:10.1109/ssdbm.2007.29 dblp:conf/ssdbm/Al-KatebLW07 fatcat:pwn47wdxqvh6ti5gf5w5uyxw7i

Online maintenance of very large random samples

Christopher Jermaine, Abhijit Pol, Subramanian Arumugam
2004 Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04  
Given a main memory buffer B large enough to hold |B| records, can we develop efficient algorithms for dynamically maintaining a massive random sample containing exactly N records from a data stream, where  ...  from streaming data.  ...  For example, the work on ripple joins [16] provides an excellent example of how variance can be magnified by sampling over the relational join operator.  ... 
doi:10.1145/1007568.1007603 dblp:conf/sigmod/JermainePA04 fatcat:b2moow5pmfh73e3estmv6bs6fe

Weighted Random Sampling over Joins [article]

Michael Shekelyan, Graham Cormode, Peter Triantafillou, Ali Shanghooshabad, Qingzhi Ma
2022 arXiv   pre-print
that are urgently needed in practice, namely reduced memory footprint, streaming operation, support for selections, outer joins, semi joins and anti joins and unequal-probability sampling.  ...  For such challenging (acyclic) joins, a random sample over the join result is a practical alternative to working with the oversized join result.  ...  Stream (Proposed). The stream-approach implements the proposed approach from Section 3 and prioritises a stream-like access over the data and limited number of scans for acyclic join queries.  ... 
arXiv:2201.02670v1 fatcat:3zpqshdyujfbdgln4xflwvlhvm

Analyzing Continuous Data Streams Using Improved Stratified Sampling and Ensemble Classification

Gayathiri Kathiresan, Krishna Mohanta, Khanaa Asari
2018 International Journal of Intelligent Engineering and Systems  
Hence, to effectively mine the data streams from heterogeneous sources, this work proposes Adaptive Reservoir sampling Of stream In a Time window (AdROIT) which partitions the streams in a window on time  ...  The experimental results show that the AdROIT provides better classification and mining results over heterogeneous data streams.  ...  Due to no variation in the statistic of sub-streams over time, initially, both the AdROIT and Chain sampling occupy same memory, i.e. nearly 30 KB.  ... 
doi:10.22266/ijies2018.1031.20 fatcat:nuujuah5qfgfpkizl5zxoggseu

Random Sampling for Continuous Streams with Arbitrary Updates

Yufei Tao, Xiang Lian, Dimitris Papadias, Marios Hadjieleftheriou
2007 IEEE Transactions on Knowledge and Data Engineering  
Motivated by this, we develop several fully dynamic algorithms for obtaining random samples from individual relations, and from the join result of two tables.  ...  success of random sampling in conventional databases.  ...  Otherwise (the coin heads), a subset of the records in t ⋈ T_2 (all the join results produced by t) is randomly extracted and included into RS (the sample set over the join results).  ...
doi:10.1109/tkde.2007.250588 fatcat:jymjz5zjevfq3liyezmoh357o4

Maintaining very large random samples using the geometric file

Abhijit Pol, Christopher Jermaine, Subramanian Arumugam
2007 The VLDB journal  
from streaming data.  ...  The algorithms are designed for streaming data, or for any environment where a large sample must be maintained online in a single pass through a data set.  ...  For example, the work on ripple joins [22] provides an excellent example of how variance can be magnified by sampling over the relational join operator.  ... 
doi:10.1007/s00778-007-0048-z fatcat:5miahbtsprglzctkpd7qcagf7u

Towards "Intelligent Compression" in Streams: A Biased Reservoir Sampling based Bloom Filter Approach [article]

Sourav Dutta, Souvik Bhattacherjee, Ankur Narang
2011 arXiv   pre-print
sampling method with Bloom filters for deduplication in streaming scenarios.  ...  Sampling based Bloom Filter (RSBF) data structure, based on the combined concepts of reservoir sampling and Bloom filters for approximate detection of duplicates in data streams. Using detailed theoretical  ...  proposes a new approach on memory-less temporal bias function based reservoir sampling for continually evolving data streams.  ...
arXiv:1111.0753v1 fatcat:fimfouilhjafjnhtbkittufeyu

Towards "intelligent compression" in streams

Sourav Dutta, Souvik Bhattacherjee, Ankur Narang
2012 Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12  
To the best of our knowledge, this is the first attempt to integrate reservoir sampling method with Bloom filters for deduplication in streaming scenarios.  ...  In this paper, we present a novel reservoir sampling based Bloom filter (RSBF) technique, based on the combined concepts of reservoir sampling and Bloom filters for approximate detection of duplicates  ...  reservoir sampling for continually evolving data streams.  ... 
doi:10.1145/2247596.2247624 dblp:conf/edbt/DuttaBN12 fatcat:hpdyqkqbejdcvjkkndr6a7w7se

Advanced Bloom Filter Based Algorithms for Efficient Approximate Data De-Duplication in Streams [article]

Suman K. Bera, Sourav Dutta, Ankur Narang, Souvik Bhattacherjee
2012 arXiv   pre-print
We propose the Reservoir Sampling based Bloom Filter (RSBF) combining the working principle of reservoir sampling and Bloom Filters.  ...  De-duplication or Intelligent Compression in streaming scenarios for approximate identification and elimination of duplicates from such unbounded data stream is a greater challenge given the real-time  ...  In Reservoir sampling one maintains a reservoir of size n from the data stream.  ... 
arXiv:1212.3964v1 fatcat:3bkz3tu3f5gsdjtvfh6gdlyaja
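
For orientation, the fixed-size reservoir idea mentioned in the snippet above can be sketched as follows. This is a minimal illustration of the classic textbook scheme (often called Algorithm R), not the RSBF method or any other algorithm from the listed papers; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], n: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of up to n items from a one-pass stream.

    Hypothetical helper for illustration only (classic Algorithm R).
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < n:
            # The first n items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability n / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 10, random.Random(0))
short = reservoir_sample(range(3), 10)
```

Here `sample` holds 10 distinct values drawn from the 1000-element stream, and `short` holds all 3 items because the stream is shorter than the reservoir.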

Toward Predictive Failure Management for Distributed Stream Processing Systems

Xiaohui Gu, Spiros Papadimitriou, Philip S. Yu, Shu-Ping Chang
2008 2008 The 28th International Conference on Distributed Computing Systems  
size of historical training data using reservoir sampling.  ...  To achieve low-overhead online learning, we propose adaptive data stream sampling schemes to adaptively adjust measurement sampling rates based on the states of monitored components, and maintain a limited  ...  limited size of historical training data using reservoir sampling.  ... 
doi:10.1109/icdcs.2008.34 dblp:conf/icdcs/GuPYC08 fatcat:mwysgwanb5acfil7hujyxlme3q

Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams [article]

Matthias De Lange, Tinne Tuytelaars
2021 arXiv   pre-print
In contrast to the major body of work in continual learning, data streams are processed in an online fashion, without additional task-information, and an efficient memory scheme provides robustness to  ...  imbalanced data streams.  ...  Concretely, the sampled batch B_n equals the horizon D from data stream S and joins batch B_M of equal size from memory M_r, constituting B as B_n ∪ B_M.  ...
arXiv:2009.00919v4 fatcat:xcdrovmq7rgilf3hlin7j5tnqu

Models and issues in data stream systems

Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom
2002 Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '02  
Furthermore, it is recognized that both approximation [30] and adaptivity [8] are key ingredients in executing queries and performing other processing (e.g., data analysis and mining) over rapid data streams  ...  In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing  ...  Acknowledgements We thank all the members of the Stanford STREAM research group for their contributions and feedback.  ... 
doi:10.1145/543613.543615 dblp:conf/pods/BabcockBDMW02 fatcat:avqbzp74v5buvkzqoayslqwr64