Reservoir Sampling over Memory-Limited Stream Joins
2007
International Conference on Scientific and Statistical Database Management
In this paper, we address the problem of reservoir sampling over memory-limited stream joins. ...
In stream join processing with limited memory, uniform random sampling is useful for approximate query evaluation. ...
In [31], the problems of sampling from data streams, uniform random sampling over joins, and memory-limited stream joins are combined. ...
doi:10.1109/ssdbm.2007.40
dblp:conf/ssdbm/Al-KatebLW07a
fatcat:mcui33v5zza25iuk4euwfrxqw4
A stratified approach to progressive approximate joins
2008
Proceedings of the 11th international conference on Extending database technology Advances in database technology - EDBT '08
In this paper, we are interested in the progressive and approximate processing of queries to data streams when processing is limited to main memory. ...
In particular, we study one of the main building blocks of such processing: the progressive approximate join. We devise and present several novel progressive approximate join algorithms. ...
Though [1] also studied the use of reservoir sampling over memory-limited join, the focus of the work was on how to balance between the memory allocated for join buffers and the reservoir. ...
doi:10.1145/1353343.1353414
dblp:conf/edbt/TokBL08
fatcat:elqheohsijhkflyikxfl24vovi
A stratified approach to progressive approximate joins
2008
Proceedings of the 11th international conference on Extending database technology Advances in database technology - EDBT '08
In this paper, we are interested in the progressive and approximate processing of queries to data streams when processing is limited to main memory. ...
In particular, we study one of the main building blocks of such processing: the progressive approximate join. We devise and present several novel progressive approximate join algorithms. ...
Though [1] also studied the use of reservoir sampling over memory-limited join, the focus of the work was on how to balance between the memory allocated for join buffers and the reservoir. ...
doi:10.1145/1352431.1352502
fatcat:3g4t6j3fvrdifins74tgg2ffg4
Adaptive-Size Reservoir Sampling over Data Streams
2007
International Conference on Scientific and Statistical Database Management
Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir. ...
This paper studies adaptive-size reservoir sampling over data streams, considering two main factors: reservoir size and sample uniformity. ...
In [4], we present a progressive reservoir join-sampling algorithm for sampling over memory-limited stream joins. ...
doi:10.1109/ssdbm.2007.29
dblp:conf/ssdbm/Al-KatebLW07
fatcat:pwn47wdxqvh6ti5gf5w5uyxw7i
Online maintenance of very large random samples
2004
Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04
Given a main memory buffer B large enough to hold |B| records, can we develop efficient algorithms for dynamically maintaining a massive random sample containing exactly N records from a data stream, where ...
from streaming data. ...
For example, the work on ripple joins [16] provides an excellent example of how variance can be magnified by sampling over the relational join operator. ...
doi:10.1145/1007568.1007603
dblp:conf/sigmod/JermainePA04
fatcat:b2moow5pmfh73e3estmv6bs6fe
Weighted Random Sampling over Joins
[article]
2022
arXiv
pre-print
that are urgently needed in practice, namely reduced memory footprint, streaming operation, support for selections, outer joins, semi joins and anti joins and unequal-probability sampling. ...
For such challenging (acyclic) joins, a random sample over the join result is a practical alternative to working with the oversized join result. ...
Stream (Proposed). The stream-approach implements the proposed approach from Section 3 and prioritises a stream-like access over the data and limited number of scans for acyclic join queries. ...
arXiv:2201.02670v1
fatcat:3zpqshdyujfbdgln4xflwvlhvm
Analyzing Continuous Data Streams Using Improved Stratified Sampling and Ensemble Classification
2018
International Journal of Intelligent Engineering and Systems
Hence, to effectively mine the data streams from heterogeneous sources, this work proposes Adaptive Reservoir sampling Of stream In a Time window (AdROIT) which partitions the streams in a window on time ...
The experimental results show that the AdROIT provides better classification and mining results over heterogeneous data streams. ...
Due to no variation in the statistic of sub-streams over time, initially, both the AdROIT and Chain sampling occupy same memory, i.e. nearly 30 KB. ...
doi:10.22266/ijies2018.1031.20
fatcat:nuujuah5qfgfpkizl5zxoggseu
Random Sampling for Continuous Streams with Arbitrary Updates
2007
IEEE Transactions on Knowledge and Data Engineering
Motivated by this, we develop several fully dynamic algorithms for obtaining random samples from individual relations, and from the join result of two tables. ...
success of random sampling in conventional databases. ...
Otherwise (the coin heads), a subset of the records in t ⋈ T_2 (all the join results produced by t) is randomly extracted and included into RS (the sample set over the join results). ...
doi:10.1109/tkde.2007.250588
fatcat:jymjz5zjevfq3liyezmoh357o4
Maintaining very large random samples using the geometric file
2007
The VLDB journal
from streaming data. ...
The algorithms are designed for streaming data, or for any environment where a large sample must be maintained online in a single pass through a data set. ...
For example, the work on ripple joins [22] provides an excellent example of how variance can be magnified by sampling over the relational join operator. ...
doi:10.1007/s00778-007-0048-z
fatcat:5miahbtsprglzctkpd7qcagf7u
Towards "Intelligent Compression" in Streams: A Biased Reservoir Sampling based Bloom Filter Approach
[article]
2011
arXiv
pre-print
sampling method with Bloom filters for deduplication in streaming scenarios. ...
Sampling based Bloom Filter (RSBF) data structure, based on the combined concepts of reservoir sampling and Bloom filters for approximate detection of duplicates in data streams. Using detailed theoretical ...
proposes a new approach on memory-less temporal bias function based reservoir sampling for continually evolving data streams. ...
arXiv:1111.0753v1
fatcat:fimfouilhjafjnhtbkittufeyu
Towards "intelligent compression" in streams
2012
Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12
To the best of our knowledge, this is the first attempt to integrate reservoir sampling method with Bloom filters for deduplication in streaming scenarios. ...
In this paper, we present a novel reservoir sampling based Bloom filter (RSBF) technique, based on the combined concepts of reservoir sampling and Bloom filters for approximate detection of duplicates ...
reservoir sampling for continually evolving data streams. ...
doi:10.1145/2247596.2247624
dblp:conf/edbt/DuttaBN12
fatcat:hpdyqkqbejdcvjkkndr6a7w7se
Advanced Bloom Filter Based Algorithms for Efficient Approximate Data De-Duplication in Streams
[article]
2012
arXiv
pre-print
We propose the Reservoir Sampling based Bloom Filter (RSBF) combining the working principle of reservoir sampling and Bloom Filters. ...
De-duplication or Intelligent Compression in streaming scenarios for approximate identification and elimination of duplicates from such unbounded data stream is a greater challenge given the real-time ...
In Reservoir sampling one maintains a reservoir of size n from the data stream. ...
arXiv:1212.3964v1
fatcat:3bkz3tu3f5gsdjtvfh6gdlyaja
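The snippet above describes maintaining a reservoir of size n from a data stream. As a rough illustration only, here is a minimal sketch of the classic fixed-size reservoir sampling scheme (often called Algorithm R), not the RSBF method from that paper. The function name `reservoir_sample` and its parameters are hypothetical, chosen for this example.

```python
import random


def reservoir_sample(stream, n, rng=None):
    """Return a uniform random sample of at most n items from an iterable.

    Illustrative sketch of classic reservoir sampling (Algorithm R);
    the name and signature are assumptions, not taken from any listed paper.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < n:
            # Fill phase: the first n items go straight into the reservoir.
            reservoir.append(item)
        else:
            # Replacement phase: item i (0-based) is kept with probability
            # n / (i + 1), overwriting a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 10 items from a stream of 1000 integers in a single pass.
    print(reservoir_sample(range(1000), 10, random.Random(0)))
```

The stream is read once and only n items are held in memory, which is why the technique suits unbounded streams. A stream shorter than n is returned in full.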
Toward Predictive Failure Management for Distributed Stream Processing Systems
2008
2008 The 28th International Conference on Distributed Computing Systems
size of historical training data using reservoir sampling. ...
To achieve low-overhead online learning, we propose adaptive data stream sampling schemes to adaptively adjust measurement sampling rates based on the states of monitored components, and maintain a limited ...
limited size of historical training data using reservoir sampling. ...
doi:10.1109/icdcs.2008.34
dblp:conf/icdcs/GuPYC08
fatcat:mwysgwanb5acfil7hujyxlme3q
Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams
[article]
2021
arXiv
pre-print
In contrast to the major body of work in continual learning, data streams are processed in an online fashion, without additional task-information, and an efficient memory scheme provides robustness to ...
imbalanced data streams. ...
Concretely, the sampled batch B_n equals the horizon D from data stream S and joins batch B_M of equal size from memory M_r, constituting B as B_n ∪ B_M. ...
arXiv:2009.00919v4
fatcat:xcdrovmq7rgilf3hlin7j5tnqu
Models and issues in data stream systems
2002
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '02
Furthermore, it is recognized that both approximation [30] and adaptivity [8] are key ingredients in executing queries and performing other processing (e.g., data analysis and mining) over rapid data streams ...
In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing ...
Acknowledgements We thank all the members of the Stanford STREAM research group for their contributions and feedback. ...
doi:10.1145/543613.543615
dblp:conf/pods/BabcockBDMW02
fatcat:avqbzp74v5buvkzqoayslqwr64