Approximate Query Answering with Frequent Sets and Maximum Entropy

H. Mannila, P. Smyth
Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073)  
The basic idea is to build a probabilistic model for the data set and answer the queries on the basis of that model. The method consists of the following steps:  ...  We describe an approach to finding approximate answers to Boolean queries on 0/1 data.  ...  Then, we evaluate the query on the maximum entropy distribution and give the answer as the approximate answer.  ... 
doi:10.1109/icde.2000.839426 dblp:conf/icde/MannilaS00 fatcat:wvafl2cbfzcujlebqv5muutae4

Beyond independence: probabilistic models for query approximation on binary transaction data

D. Pavlov, H. Mannila, P. Smyth
2003 IEEE Transactions on Knowledge and Data Engineering  
We investigate the problem of generating fast approximate answers to queries posed to large sparse binary data sets.  ...  Index Terms Binary transaction data, query approximation, probabilistic model, itemsets, ADTree, maximum entropy.  ...  Acknowledgements The research described in this paper was supported in part by NSF awards IRI-9703120 and IIS-0083489 and by research gifts from IBM Research and Microsoft Research.  ... 
doi:10.1109/tkde.2003.1245281 fatcat:vxb5h6cuoze47orum3zkchgceq

Top-down statistical estimation on a database

Neil C. Rowe
1983 Proceedings of the 1983 ACM SIGMOD international conference on Management of data - SIGMOD '83  
So we have a new and powerful inference rule for answering queries, the "Closed World Rule": if a query is in the guaranteed query set and the answer is not in the database abstract, then the answer found  ...  For instance, if we know the maximum, minimum, mean, and standard deviation of a set, the maximum entropy distribution is of the form $a e^{b(x-c)^2}$, where the constants a, b, and c can be determined uniquely  ...  hedges to capture fuzziness of query answers. When sufficiently standardized, the database abstract and rules can be put into hardware. Our set-size rules can estimate selectivities for general query  ... 
doi:10.1145/582192.582217 dblp:conf/sigmod/Rowe83 fatcat:zzezcoy5cjem3ho4mzhvicevxe

Top-down statistical estimation on a database

Neil C. Rowe
1983 Proceedings of the 1983 ACM SIGMOD international conference on Management of data - SIGMOD '83  
So we have a new and powerful inference rule for answering queries, the "Closed World Rule": if a query is in the guaranteed query set and the answer is not in the database abstract, then the answer found  ...  For instance, if we know the maximum, minimum, mean, and standard deviation of a set, the maximum entropy distribution is of the form $a e^{b(x-c)^2}$, where the constants a, b, and c can be determined uniquely  ...  hedges to capture fuzziness of query answers. When sufficiently standardized, the database abstract and rules can be put into hardware. Our set-size rules can estimate selectivities for general query  ... 
doi:10.1145/582216.582217 fatcat:elz5ksk7fvbe3ci2uktasconhu

Top-down statistical estimation on a database

Neil C. Rowe
1983 SIGMOD record  
So we have a new and powerful inference rule for answering queries, the "Closed World Rule": if a query is in the guaranteed query set and the answer is not in the database abstract, then the answer found  ...  For instance, if we know the maximum, minimum, mean, and standard deviation of a set, the maximum entropy distribution is of the form $a e^{b(x-c)^2}$, where the constants a, b, and c can be determined uniquely  ...  hedges to capture fuzziness of query answers. When sufficiently standardized, the database abstract and rules can be put into hardware. Our set-size rules can estimate selectivities for general query  ... 
doi:10.1145/971695.582217 fatcat:ymfnsavv5nbxvgbdychpfpsz7a

Association Rule Mining using Maximum Entropy [article]

Rasmus Pagh, Morten Stöckel
2015 arXiv   pre-print
Recommendations based on behavioral data may be faced with ambiguous statistical evidence. We consider the case of association rules, relevant e.g. for query and product recommendations.  ...  entropy estimates, and 2) Maximum entropy estimates based on a small number of samples are provably tightly concentrated around the true maximum entropy frequency that arises if we let the number of samples  ...  us to query the maximum entropy distribution p* for a z-sized set of variables.  ... 
arXiv:1501.02143v1 fatcat:zfbnapgaffdwhahhmd2jgwqnlq

Approximate Query Answering by Model Averaging [chapter]

Dmitry Pavlov, Padhraic Smyth
2003 Proceedings of the 2003 SIAM International Conference on Data Mining  
Learning the combining weights is a straightforward and scalable optimization problem that can be easily automated, providing a practical framework for approximate query answering with massive data sets  ...  In earlier work we have introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data sets.  ...  mixture models, maximum entropy models, models based on inclusion-exclusion, and Bayesian networks.  ... 
doi:10.1137/1.9781611972733.13 dblp:conf/sdm/PavlovS03 fatcat:6bdcdgmnzzbvhn7wdguo3cgury

Spatio-temporal Range Searching over Compressed Kinetic Sensor Data [chapter]

Sorelle A. Friedler, David M. Mount
2010 Lecture Notes in Computer Science  
We show that with space roughly equal to entropy, queries can be answered in time that is roughly logarithmic in entropy.  ...  As sensor networks increase in size and number, efficient techniques are required to process the very large data sets that they generate.  ...  There exists a data structure for answering ε-approximate spherical range searching queries over a set C of n clumps in $\mathbb{R}^d$ with preprocessing time $O(n \log n)$, query time $O((1/\varepsilon^{d-1}) + \log n)$, and space  ... 
doi:10.1007/978-3-642-15775-2_33 fatcat:5dzuyghcivbixcadjhmxv6ixdi

Probabilistic Models for Query Approximation with Large Sparse Binary Datasets [article]

Dmitry Y. Pavlov, Heikki Mannila, Padhraic Smyth
2013 arXiv   pre-print
In particular, we study a Markov random field (MRF) approach based on frequent sets and maximum entropy, and compare it to the independence model and the Chow-Liu tree model.  ...  Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents containing  ...  ., 1999] introduced the idea of using an MRF model based on frequent itemsets and a maximum entropy (maxent) approach for query approximation.  ... 
arXiv:1301.3884v1 fatcat:v6kun4fi6nebxnfyawnve2wx7e

Effective, design-independent XML keyword search

Arash Termehchy, Marianne Winslett
2009 Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09  
Our empirical evaluation with 65 user-supplied queries over two real-world XML data sets shows that CR has better precision and recall and provides better ranking than all previous approaches.  ...  These approaches often retrieve irrelevant answers, overlook relevant answers, and cannot rank answers appropriately.  ...  Exact preprocessing for Xmark data sets with M QL = 5 took too long to run to completion. Approximate preprocessing for Xmark with EV = 3 and M QL = 5 took 128 hours.  ... 
doi:10.1145/1645953.1645970 dblp:conf/cikm/TermehchyW09 fatcat:222kk2v5bbesppumyt7cmxt7qu

Towards classification of DNS erroneous queries

Yuta Kazato, Kensuke Fukuda, Toshiharu Sugawara
2013 Proceedings of the 9th Asian Internet Engineering Conference on - AINTEC '13  
By analyzing erroneous queries leading to NX Domain errors with the proposed heuristic rules to identify the main causes of such errors, we successfully classify them into nine groups that cover approximately  ...  First, we show that ServFail and Refused errors are generated by queries from a small number of local resolvers and authoritative nameservers that do not relate to ordinary users.  ...  These rules cover approximately 90% of the unique domain names of NX Domain errors and provide three plausible main causes.  ... 
doi:10.1145/2534142.2534146 dblp:conf/aintec/KazatoFS13 fatcat:wfkptp4offh4fhgwausevcgppa

Winning with DNS Failures: Strategies for Faster Botnet Detection [chapter]

Sandeep Yadav, A. L. Narasimha Reddy
2012 Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering  
Botnets such as Conficker and Torpig utilize high entropy domains for fluxing and evasion. Bots may query a large number of domains, some of which may fail.  ...  We apply our technique to a Tier-1 ISP dataset obtained from South Asia, and a campus DNS trace, and thus validate our methods by detecting Conficker botnet IPs and other anomalies with a false positive  ...  taken from set with cardinality $|D_{cncip}|$), and averaged over all such pairs.  ... 
doi:10.1007/978-3-642-31909-9_26 fatcat:upg7jzxtxjhf3ars4chpvqtovu

Evaluation of probabilistic queries over imprecise data in constantly-evolving environments

Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar
2007 Information Systems  
More generally, query answers can be augmented with probabilistic guarantees of the validity of the answers. In this paper, we study probabilistic query evaluation based on uncertain data.  ...  A classification of queries is made based upon the nature of the result set.  ...  In [12], Vrbsky et al. studied approximate answers for set-valued queries (where a query answer contains a set of objects) and single-valued queries (where a query answer contains a single value).  ... 
doi:10.1016/j.is.2005.06.002 fatcat:3ssyxl6wozhmrmwew3ikm32c6y

An Information-Theoretic Privacy Criterion for Query Forgery in Information Retrieval [article]

David Rebollo-Monedero, Javier Parra-Arnau, Jordi Forné
2011 arXiv   pre-print
familiar with information theory and the method of types.  ...  Our criterion measured privacy risk as a divergence between the user's and the population's query distribution, and contemplated the entropy of the user's distribution as a particular case.  ...  answers, and are burdened with a significant computational overhead.  ... 
arXiv:1111.4045v1 fatcat:36wvmgefujbtpjjzdb3kcgg42u

Improving Event Duration Prediction via Time-aware Pre-training [article]

Zonglin Yang, Xinya Du, Alexander Rush, Claire Cardie
2020 arXiv   pre-print
We also demonstrate our models are capable of duration prediction in the unsupervised setting, outperforming the baselines.  ...  Our best model, E-pred, substantially outperforms previous work, and captures duration information more accurately than R-pred.  ...  Acknowledgments We thank the anonymous reviewers for suggestions and Ben Zhou for running experiments of TACOLM on the McTACO-duration dataset.  ... 
arXiv:2011.02610v1 fatcat:5onpgdhxcvhmpbrzqqybf32ula