Random Sampling for Group-By Queries
[article]
2019
arXiv
pre-print
We present CVOPT, a query- and data-driven sampling framework for a set of group-by queries. ...
We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after filtering through ...
Given a budget of sampling M records from a table for a group-by query with r groups, how can one draw a random sample such that the accuracy is maximized? ...
arXiv:1909.02629v3
fatcat:ypwxnvhe7rgo3n4q7uwvippqbm
Random Sampling for Group-By Queries
2020
2020 IEEE 36th International Conference on Data Engineering (ICDE)
We present CVOPT, a query- and data-driven sampling framework for a set of group-by queries. ...
We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after filtering through ...
Given a budget of sampling M records from a table for a group-by query with r groups, how can one draw a random sample such that the accuracy is maximized? ...
doi:10.1109/icde48307.2020.00053
dblp:conf/icde/NguyenSPXST20
fatcat:l7o7xddxrfe65fenvxpp2ojr7q
Dynamic sample selection for approximate query processing
2003
Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03
In this paper, we describe an approximate query processing technique that dynamically constructs an appropriately biased sample for each query by combining samples selected from a family of non-uniform ...
For many aggregation queries, appropriately constructed biased (non-uniform) samples can provide more accurate approximations than a uniform sample. ...
Motivation for Small Group Sampling One of the shortcomings of uniform random sampling for answering group-by queries is that uniform samples give weight to each group in proportion to the number of tuples ...
doi:10.1145/872757.872822
dblp:conf/sigmod/BabcockCD03
fatcat:rru6ogpegfcddftviimzwbtwaq
Dynamic sample selection for approximate query processing
2003
Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03
In this paper, we describe an approximate query processing technique that dynamically constructs an appropriately biased sample for each query by combining samples selected from a family of non-uniform ...
For many aggregation queries, appropriately constructed biased (non-uniform) samples can provide more accurate approximations than a uniform sample. ...
Motivation for Small Group Sampling One of the shortcomings of uniform random sampling for answering group-by queries is that uniform samples give weight to each group in proportion to the number of tuples ...
doi:10.1145/872819.872822
fatcat:rfn6rlyeovcqjcbsilmdbu3aea
Congressional samples for approximate answering of group-by queries
2000
Proceedings of the 2000 ACM SIGMOD international conference on Management of data - SIGMOD '00
As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very ...
In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries. ...
We also thank him for discussions related to this work. ...
doi:10.1145/342009.335450
dblp:conf/sigmod/AcharyaGP00
fatcat:hzwhlehdyrbubi2t2kbha2vhja
Congressional samples for approximate answering of group-by queries
2000
SIGMOD record
As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very ...
In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries. ...
We also thank him for discussions related to this work. ...
doi:10.1145/335191.335450
fatcat:ty2gghwcgzesdh6qd6p6oe3zum
An Efficient Block Sampling Strategy for Online Aggregation in the Cloud
[chapter]
2015
Lecture Notes in Computer Science
In this paper, we propose an efficient block sampling which can exactly reflect the importance of different blocks for answering group-by queries. ...
As a result, answers of online aggregation based on uniform random sampling can result in poor accuracy for groups with very few tuples. ...
[26] proposed a novel sampling scheme for constructing memory-bounded group-aware sample synopses. All above sampling approaches for group-by queries are row-level sampling. ...
doi:10.1007/978-3-319-21042-1_29
fatcat:k7niv3vpqrf4npk7vfxnhrdjiy
Model-based Approximate Query Processing
[article]
2018
arXiv
pre-print
can support ad-hoc exploration queries but yield low quality if executed over rare subpopulations. (2) Classical AQP approaches that rely on offline sampling can use some form of biased sampling to mitigate ...
Furthermore, we think that our techniques of using generative models presented in this paper can not only be used for AQP in databases but also has applications for other database problems including Query ...
The relevance sampling approach is already a major improvement for the approximation of aggregation results compared to random sampling but it ignores the grouping of the SQL query. ...
arXiv:1811.06224v1
fatcat:j6sqkptykbdkpetbdzljbihczu
Secure statistical databases with random sample queries
1980
ACM Transactions on Database Systems
The Random Sample Queries control deals directly with the basic principle of compromise by making it impossible for a questioner to precisely control the formation of query sets. ...
Queries for frequencies and averages are computed using random samples drawn from the query sets. ...
RANDOM SAMPLE QUERIES Our proposal for random sampling differs in two important ways from the traditional statistical sampling methods used by the Census Bureau: 1. ...
doi:10.1145/320613.320616
fatcat:gkjbuns4xzdnfa4zvib2dv2ufq
Random sampling from databases: a survey
1995
Statistics and computing
• External use: provide a sample of the result or approximate answer for evaluation purposes
- estimate result of aggregate queries
- retrieve a sample of records from a database query for subsequent processing ...
sampling
- alternative name: adaptive sampling
• Group sequential sampling: decide whether to continue after groups of sample elements are obtained
Random Sampling from Databases
Classification of Sampling ...
doi:10.1007/bf00140664
fatcat:qh25azf44vgcvkiwxb7bp3d2oy
Strategy of combining random subspace and diversified active learning in CBIR
2008
2008 15th IEEE International Conference on Image Processing
Using random sampling strategy, we construct a set of random subspaces for learning multiple intrinsic descriptions of image content, with each of which stable component classifier can be trained. ...
To enhance the generalization capability of relevance model, the diversified active learning is carried out by collecting more informative samples, i.e. those samples spreading around decision boundary ...
Figure 2. Random grouping for diversified sample labeling
a random partition of image dataset D. ...
doi:10.1109/icip.2008.4712214
dblp:conf/icip/WangZZ08
fatcat:ucxnfn4karfylpeum35vpejbqa
Cost-based Optimization of Complex Scientific Queries
2007
International Conference on Scientific and Statistical Database Management
We improved the optimization by a profiled grouping strategy where the scientific query is first automatically fragmented into subqueries based on application knowledge. ...
We developed a cost model for aggregation operators and other functions used in such queries and show that it substantially improves performance. ...
Then we dynamically generate a profiled group cost model by measuring real executions over data samples. The profiled group cost model is used for join ordering of the groups. ...
doi:10.1109/ssdbm.2007.8
dblp:conf/ssdbm/FomkinR07
fatcat:544k6u6rjjhdlpw5m7n7jj23di
Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line
2016
Nucleic Acids Research
Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. ...
We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. ...
In a two-class context, using reference and query data sets with a common gene list, an overall measure was calculated for each query sample with respect to each reference group. ...
doi:10.1093/nar/gkw578
pmid:27353327
pmcid:PMC5041471
fatcat:vdxzwn2sabhipdmuhdiud2arr4
PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation
[article]
2013
arXiv
pre-print
This allows for the interactive data exploration of the largest datasets. ...
When executed by the framework over a massive 8TB TPC-H instance, the estimator provides accurate confidence bounds early in the execution even when the cardinality of the final result is seven orders ...
We plan to incorporate other estimation methods than sampling in the framework, for example Bayesian statistics and bootstrapping. ...
arXiv:1206.0051v2
fatcat:qo6i47tfjrar5j6u7c6aohnwny
You can stop early with COLA
2012
Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12
We also develop a two-phase stratified sampling method to support multi-table aggregations to improve the approximate query answers and speed up the convergence of confidence intervals. ...
We formulate a statistical foundation that supports block-level sampling for single-table online aggregations and effective estimation of approximate results and confidence intervals of statistical significance ...
In the field of distributed DBMS, the work in [20] compares the accuracy and efficiency of different sampling methods for query size estimation in the parallel DBMS, by using stratified random sampling ...
doi:10.1145/2396761.2398423
dblp:conf/cikm/ShiMWG12
fatcat:guycck6q6vhblptkpr3otweysu