A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets
[article]
2010
arXiv
pre-print
In this work, we address significance in the context of frequent itemset mining. ...
Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from ...
In this paper, we develop a rigorous and efficient novel approach for identifying frequent itemsets featuring both a global and a pointwise guarantee on their statistical significance. ...
arXiv:1002.1104v1
fatcat:gyapkwbrfbebpl4t5wu4p7gwwq
An efficient rigorous approach for identifying statistically significant frequent itemsets
2009
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '09
Based on this approximation, we develop an efficient parametric multi-hypothesis test for identifying the desired threshold s*. ...
In this work, we address significance in the context of frequent itemset mining. ...
In this paper, we develop a rigorous and efficient novel approach for identifying statistically significant frequent itemsets. ...
doi:10.1145/1559795.1559814
dblp:conf/pods/KirschMPPUV09
fatcat:6bh22euxzzb27jrdktffzq55nu
An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets
2012
Journal of the ACM
Based on this approximation, we develop an efficient parametric multi-hypothesis test for identifying the desired threshold s*. ...
In this work, we address significance in the context of frequent itemset mining. ...
In this paper, we develop a rigorous and efficient novel approach for identifying statistically significant frequent itemsets. ...
doi:10.1145/2220357.2220359
fatcat:5fisvpihbrfrtk24b3ueso5mju
ESCORT (Enterprise Services Cross-sell Optimization Using Rigorous Tests of Association)
2017
Advances in Economics and Business
With advent of statistic many organizations in retail industry started using analytical methods to identify cross sell opportunities. ...
We strongly believe that this approach could be very useful for a lot of B2B organizations in the services industry with multiple offerings and limited budget to pursue all possible cross sell opportunities ...
The bidirectional approach requires more space to store the candidate itemset, but it can help to rapidly identify the frequent itemset border, given the configuration as shown below (Figure 3). ...
doi:10.13189/aeb.2017.050501
fatcat:lcx2ptfplbbrpanh5pmsrn3mui
An effective scheme for top-k frequent itemset mining under differential privacy conditions
2020
Science China Information Sciences
The protection of user privacy while obtaining statistical information is important. Differential privacy (DP) is a strong and rigorous standard for privacy protection. In this study, we focused on effectively discovering top-k frequent itemsets under DP conditions. ...
doi:10.1007/s11432-018-9849-y
fatcat:byhvt7vr3ff23nst2zrwbewvde
Fast Approach for High Temporal Utility Item Mining
2016
International Journal of Database Theory and Application
Moreover, FHUI-Growth, a fast approach for high utility itemset mining algorithm is developed for mining high utility itemsets. ...
A novel High Utility Itemset tree (HUI-tree) structure, which is an extended prefix-tree structure for the storage of compressed utility information about itemsets, is proposed to address this issue. ...
This method is an efficient way of storing frequent itemsets using minimal memory without losing utility information, especially for a dense dataset. ...
doi:10.14257/ijdta.2016.9.8.22
fatcat:qkabu5fkbbbjlkcnpqy3j2zqre
Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing
[article]
2015
arXiv
pre-print
The key to this efficiency is that unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence ...
We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes. ...
The results for significant itemset mining are summarized in Figure 2. ...
arXiv:1502.04315v1
fatcat:tch52dvsc5azraf7owjkxa5xxy
identified a new zero-day vulnerability and its variant in its top results, as well as revealed many new anti-virus signatures. ...
Malware writers are constantly looking for new vulnerabilities to exploit in popular software applications. ...
These properties give us a statistical significance test for itemsets in the data. ...
doi:10.1145/2046684.2046690
dblp:conf/ccs/KaranthLNVLS11
fatcat:mlcae5tcanen5nuayx33rkcsqi
Genetic programming and frequent itemset mining to identify feature selection patterns of iEEG and fMRI epilepsy data
2015
Engineering applications of artificial intelligence
We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) ...
with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches ...
in Rochester, Minnesota for providing the separate raw iEEG signals and raw fMRI signals for the signal processing analysis in this work by Drs. ...
doi:10.1016/j.engappai.2014.12.008
pmid:25580059
pmcid:PMC4285716
fatcat:ilwjladvfrfadmw3zg2sqhaa3i
Mining Sequential Patterns with VC-Dimension and Rademacher Complexity
2020
Algorithms
We also present the first algorithms to mine approximations of the true frequent sequential patterns with rigorous guarantees on the quality of the output. ...
We present the first sampling-based algorithm to mine, with high confidence, a rigorous approximation of the frequent sequential patterns from massive datasets. ...
Algorithm 7: Mining the True Frequent Sequential Patterns. ...
doi:10.3390/a13050123
fatcat:yfajokt5oveerg2lj5h5foopre
Efficient predicated bug signature mining via hierarchical instrumentation
2014
Proceedings of the 2014 International Symposium on Software Testing and Analysis - ISSTA 2014
An essential and yet expensive process in debugging is bug isolation. As one major family of automatic bug isolation, statistical bug isolation approaches have been well studied in the past decade. ...
We employ HI technique to predicated bug signature mining (called MPS) recently developed and propose an approach called HIMPS. ...
Our appreciation also goes to Ben Liblit at University of Wisconsin-Madison for making their predicate-based instrumentor available. This work is supported by an NUS Research Grant R-252-000-484-112. ...
doi:10.1145/2610384.2610400
dblp:conf/issta/ZuoKS14
fatcat:gwj7iwus6rhejbzj37ny3mdai4
A Universal Toolkit for Cryptographically Secure Privacy-Preserving Data Mining
[chapter]
2012
Lecture Notes in Computer Science
To validate the practical feasibility of our approach, we implemented and benchmarked four algorithms for frequent itemset mining. ...
Furthermore, there are no established tools for applying secure multi-party computation in real-world applications. ...
Therefore, a data mining expert does not have to be a cryptography expert to use SHAREMIND and SECREC for creating privacy-preserving data mining applications. ...
doi:10.1007/978-3-642-30428-6_9
fatcat:xnatn2zo4ffythvc3gwmw4aype
A framework for measuring changes in data characteristics
1999
Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '99
Our framework covers a wide variety of models including frequent itemsets, decision tree classifiers, and clusters, and captures standard measures of deviation such as the misclassification rate and the ...
significant differences in their characteristics), and discuss several practical applications. ...
The measure of a region identified by an itemset is the support of the itemset. ...
doi:10.1145/303976.303989
dblp:conf/pods/GantiGR99
fatcat:oxma54o4tbfxlfamsgrri4jvza
Scalable knowledge harvesting with high precision and high recall
2011
Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11
We compute pattern-occurrence statistics for two benefits: they serve to prune the hypotheses space and to derive informative weights of clauses for the reasoner. ...
We propose a new notion of ngram-itemsets for richer patterns, and use MaxSat-based constraint reasoning on both the quality of patterns and the validity of fact candidates. ...
Acknowledgements We are grateful to the European Union and to Google for supporting parts of this research, through the EU project Living Knowledge and a Google Research Award, respectively. ...
doi:10.1145/1935826.1935869
dblp:conf/wsdm/NakasholeTW11
fatcat:gvqjuvvyfzelnk52mhrflp4j7m
Discovering frequent patterns in sensitive data
2010
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '10
We present two efficient algorithms for discovering the K most frequent patterns in a data set of sensitive records. ...
This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees ...
We thank Daniel Kifer for helpful comments. ...
doi:10.1145/1835804.1835869
dblp:conf/kdd/BhaskarLST10
fatcat:7ug6rqk3dnerzkcmacnj73xw6e