Random Sampling for Group-By Queries [article]

Trong Duc Nguyen, Ming-Hung Shih, Sai Sree Parvathaneni, Bojian Xu, Divesh Srivastava, Srikanta Tirthapura
2019 arXiv   pre-print
We present CVOPT, a query- and data-driven sampling framework for a set of group-by queries.  ...  We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after filtering through  ...  Given a budget of sampling M records from a table for a group-by query with r groups, how can one draw a random sample such that the accuracy is maximized?  ... 
arXiv:1909.02629v3 fatcat:ypwxnvhe7rgo3n4q7uwvippqbm

Random Sampling for Group-By Queries

Trong Duc Nguyen, Ming-Hung Shih, Sai Sree Parvathaneni, Bojian Xu, Divesh Srivastava, Srikanta Tirthapura
2020 2020 IEEE 36th International Conference on Data Engineering (ICDE)  
We present CVOPT, a query- and data-driven sampling framework for a set of group-by queries.  ...  We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after filtering through  ...  Given a budget of sampling M records from a table for a group-by query with r groups, how can one draw a random sample such that the accuracy is maximized?  ... 
doi:10.1109/icde48307.2020.00053 dblp:conf/icde/NguyenSPXST20 fatcat:l7o7xddxrfe65fenvxpp2ojr7q

Dynamic sample selection for approximate query processing

Brian Babcock, Surajit Chaudhuri, Gautam Das
2003 Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03  
In this paper, we describe an approximate query processing technique that dynamically constructs an appropriately biased sample for each query by combining samples selected from a family of non-uniform  ...  For many aggregation queries, appropriately constructed biased (non-uniform) samples can provide more accurate approximations than a uniform sample.  ...  Motivation for Small Group Sampling One of the shortcomings of uniform random sampling for answering group-by queries is that uniform samples give weight to each group in proportion to the number of tuples  ... 
doi:10.1145/872757.872822 dblp:conf/sigmod/BabcockCD03 fatcat:rru6ogpegfcddftviimzwbtwaq

Dynamic sample selection for approximate query processing

Brian Babcock, Surajit Chaudhuri, Gautam Das
2003 Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03  
In this paper, we describe an approximate query processing technique that dynamically constructs an appropriately biased sample for each query by combining samples selected from a family of non-uniform  ...  For many aggregation queries, appropriately constructed biased (non-uniform) samples can provide more accurate approximations than a uniform sample.  ...  Motivation for Small Group Sampling One of the shortcomings of uniform random sampling for answering group-by queries is that uniform samples give weight to each group in proportion to the number of tuples  ... 
doi:10.1145/872819.872822 fatcat:rfn6rlyeovcqjcbsilmdbu3aea

Congressional samples for approximate answering of group-by queries

Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala
2000 Proceedings of the 2000 ACM SIGMOD international conference on Management of data - SIGMOD '00  
As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very  ...  In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries.  ...  We also thank him for discussions related to this work.  ... 
doi:10.1145/342009.335450 dblp:conf/sigmod/AcharyaGP00 fatcat:hzwhlehdyrbubi2t2kbha2vhja

Congressional samples for approximate answering of group-by queries

Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala
2000 SIGMOD record  
As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very  ...  In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries.  ...  We also thank him for discussions related to this work.  ... 
doi:10.1145/335191.335450 fatcat:ty2gghwcgzesdh6qd6p6oe3zum

An Efficient Block Sampling Strategy for Online Aggregation in the Cloud [chapter]

Xiang Ci, Xiaofeng Meng
2015 Lecture Notes in Computer Science  
In this paper, we propose an efficient block sampling which can exactly reflect the importance of different blocks for answering group-by queries.  ...  As a result, answers of online aggregation based on uniform random sampling can result in poor accuracy for groups with very few tuples.  ...  [26] proposed a novel sampling scheme for constructing memory-bounded group-aware sample synopses. All above sampling approaches for group-by queries are row-level sampling.  ... 
doi:10.1007/978-3-319-21042-1_29 fatcat:k7niv3vpqrf4npk7vfxnhrdjiy

Model-based Approximate Query Processing [article]

Moritz Kulessa, Alejandro Molina, Carsten Binnig, Benjamin Hilprecht, Kristian Kersting
2018 arXiv   pre-print
can support ad-hoc exploration queries but yield low quality if executed over rare subpopulations. (2) Classical AQP approaches that rely on offline sampling can use some form of biased sampling to mitigate  ...  Furthermore, we think that our techniques of using generative models presented in this paper can not only be used for AQP in databases but also has applications for other database problems including Query  ...  The relevance sampling approach is already a major improvement for the approximation of aggregation results compared to random sampling but it ignores the grouping of the SQL query.  ... 
arXiv:1811.06224v1 fatcat:j6sqkptykbdkpetbdzljbihczu

Secure statistical databases with random sample queries

Dorothy E. Denning
1980 ACM Transactions on Database Systems  
The Random Sample Queries control deals directly with the basic principle of compromise by making it impossible for a questioner to precisely control the formation of query sets.  ...  Queries for frequencies and averages are computed using random samples drawn from the query sets.  ...  RANDOM SAMPLE QUERIES Our proposal for random sampling differs in two important ways from the traditional statistical sampling methods used by the Census Bureau: 1.  ... 
doi:10.1145/320613.320616 fatcat:gkjbuns4xzdnfa4zvib2dv2ufq

Random sampling from databases: a survey

Frank Olken, Doron Rotem
1995 Statistics and computing  
External use: provide a sample of the result or an approximate answer for evaluation purposes; estimate the result of aggregate queries; retrieve a sample of records from a database query for subsequent processing  ...  sampling (alternative name: adaptive sampling). Group sequential sampling: decide whether to continue after groups of sample elements are obtained.  ... 
doi:10.1007/bf00140664 fatcat:qh25azf44vgcvkiwxb7bp3d2oy

Strategy of combining random subspace and diversified active learning in CBIR

Fang Wang, Zhenfeng Zhu, Yao Zhao
2008 2008 15th IEEE International Conference on Image Processing  
Using random sampling strategy, we construct a set of random subspaces for learning multiple intrinsic descriptions of image content, with each of which stable component classifier can be trained.  ...  To enhance the generalization capability of relevance model, the diversified active learning is carried out by collecting more informative samples, i.e. those samples spreading around decision boundary  ...  Figure 2. Random grouping for diversified sample labeling a random partition of image dataset D.  ... 
doi:10.1109/icip.2008.4712214 dblp:conf/icip/WangZZ08 fatcat:ucxnfn4karfylpeum35vpejbqa

Cost-based Optimization of Complex Scientific Queries

Ruslan Fomkin, Tore Risch
2007 International Conference on Scientific and Statistical Database Management  
We improved the optimization by a profiled grouping strategy where the scientific query is first automatically fragmented into subqueries based on application knowledge.  ...  We developed a cost model for aggregation operators and other functions used in such queries and show that it substantially improves performance.  ...  Then we dynamically generate a profiled group cost model by measuring real executions over data samples. The profiled group cost model is used for join ordering of the groups.  ... 
doi:10.1109/ssdbm.2007.8 dblp:conf/ssdbm/FomkinR07 fatcat:544k6u6rjjhdlpw5m7n7jj23di

Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line

Jaine K. Blayney, Timothy Davison, Nuala McCabe, Steven Walker, Karen Keating, Thomas Delaney, Caroline Greenan, Alistair R. Williams, W. Glenn McCluggage, Amanda Capes-Davis, D. Paul Harkin, Charlie Gourley (+1 others)
2016 Nucleic Acids Research  
Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues.  ...  We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods.  ...  In a two-class context, using reference and query data sets with a common gene list, an overall measure was calculated for each query sample with respect to each reference group.  ... 
doi:10.1093/nar/gkw578 pmid:27353327 pmcid:PMC5041471 fatcat:vdxzwn2sabhipdmuhdiud2arr4

PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation [article]

Chengjie Qin, Florin Rusu
2013 arXiv   pre-print
This allows for the interactive data exploration of the largest datasets.  ...  When executed by the framework over a massive 8TB TPC-H instance, the estimator provides accurate confidence bounds early in the execution even when the cardinality of the final result is seven orders  ...  We plan to incorporate other estimation methods than sampling in the framework, for example Bayesian statistics and bootstrapping.  ... 
arXiv:1206.0051v2 fatcat:qo6i47tfjrar5j6u7c6aohnwny

You can stop early with COLA

Yingjie Shi, Xiaofeng Meng, Fusheng Wang, Yantao Gan
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
We also develop a two-phase stratified sampling method to support multi-table aggregations to improve the approximate query answers and speed up the convergence of confidence intervals.  ...  We formulate a statistical foundation that supports block-level sampling for single-table online aggregations and effective estimation of approximate results and confidence intervals of statistical significance  ...  In the field of distributed DBMS, the work in [20] compares the accuracy and efficiency of different sampling methods for query size estimation in the parallel DBMS, by using stratified random sampling  ... 
doi:10.1145/2396761.2398423 dblp:conf/cikm/ShiMWG12 fatcat:guycck6q6vhblptkpr3otweysu