An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets [article]

Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin
2010 arXiv   pre-print
In this work, we address significance in the context of frequent itemset mining.  ...  Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from  ...  In this paper, we develop a rigorous and efficient novel approach for identifying frequent itemsets featuring both a global and a pointwise guarantee on their statistical significance.  ... 
arXiv:1002.1104v1 fatcat:gyapkwbrfbebpl4t5wu4p7gwwq

An efficient rigorous approach for identifying statistically significant frequent itemsets

Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin
2009 Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '09  
Based on this approximation, we develop an efficient parametric multi-hypothesis test for identifying the desired threshold s*.  ...  In this work, we address significance in the context of frequent itemset mining.  ...  In this paper, we develop a rigorous and efficient novel approach for identifying statistically significant frequent itemsets.  ... 
doi:10.1145/1559795.1559814 dblp:conf/pods/KirschMPPUV09 fatcat:6bh22euxzzb27jrdktffzq55nu

An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin
2012 Journal of the ACM  
Based on this approximation, we develop an efficient parametric multi-hypothesis test for identifying the desired threshold s*.  ...  In this work, we address significance in the context of frequent itemset mining.  ...  In this paper, we develop a rigorous and efficient novel approach for identifying statistically significant frequent itemsets.  ... 
doi:10.1145/2220357.2220359 fatcat:5fisvpihbrfrtk24b3ueso5mju

ESCORT (Enterprise Services Cross-sell Optimization Using Rigorous Tests of Association)

Nishant Saxena
2017 Advances in Economics and Business  
With the advent of statistics, many organizations in the retail industry started using analytical methods to identify cross-sell opportunities.  ...  We strongly believe that this approach could be very useful for a lot of B2B organizations in the services industry with multiple offerings and limited budget to pursue all possible cross-sell opportunities  ...  The bidirectional approach requires more space to store the candidate itemset, but it can help to rapidly identify the frequent itemset border, given the configuration as shown below (Figure 3).  ... 
doi:10.13189/aeb.2017.050501 fatcat:lcx2ptfplbbrpanh5pmsrn3mui

An effective scheme for top-k frequent itemset mining under differential privacy conditions

Wenjuan Liang, Hong Chen, Jing Zhang, Dan Zhao, Cuiping Li
2020 Science China Information Sciences  
The protection of user privacy while obtaining statistical information is important. Differential privacy (DP) is a strong and rigorous standard for privacy protection. In this study, we focused on effectively discovering top-k frequent itemsets under DP conditions.  ... 
doi:10.1007/s11432-018-9849-y fatcat:byhvt7vr3ff23nst2zrwbewvde

Fast Approach for High Temporal Utility Item Mining

Pan Yi, Liu Huafu, Zhang Bo
2016 International Journal of Database Theory and Application  
Moreover, FHUI-Growth, a fast algorithm for mining high utility itemsets, is developed.  ...  A novel High Utility Itemset tree (HUI-tree) structure, which is an extended prefix-tree structure for the storage of compressed utility information about itemsets, is proposed to address this issue.  ...  This method is an efficient way of storing frequent itemsets using minimal memory without losing utility information, especially for a dense dataset.  ... 
doi:10.14257/ijdta.2016.9.8.22 fatcat:qkabu5fkbbbjlkcnpqy3j2zqre

Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing [article]

Felipe Llinares López, Mahito Sugiyama, Laetitia Papaxanthos, Karsten M. Borgwardt
2015 arXiv   pre-print
The key to this efficiency is that unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence  ...  We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes.  ...  The results for significant itemset mining are summarized in Figure 2.  ... 
arXiv:1502.04315v1 fatcat:tch52dvsc5azraf7owjkxa5xxy

ZDVUE

Sandeep Karanth, Srivatsan Laxman, Prasad Naldurg, Ramarathnam Venkatesan, J. Lambert, Jinwook Shin
2011 Proceedings of the 4th ACM workshop on Security and artificial intelligence - AISec '11  
identified a new zero-day vulnerability and its variant in its top results, as well as revealed many new anti-virus signatures.  ...  Malware writers are constantly looking for new vulnerabilities to exploit in popular software applications.  ...  These properties give us a statistical significance test for itemsets in the data.  ... 
doi:10.1145/2046684.2046690 dblp:conf/ccs/KaranthLNVLS11 fatcat:mlcae5tcanen5nuayx33rkcsqi

Genetic programming and frequent itemset mining to identify feature selection patterns of iEEG and fMRI epilepsy data

Otis Smart, Lauren Burrell
2015 Engineering applications of artificial intelligence  
We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM)  ...  with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches  ...  in Rochester, Minnesota for providing the separate raw iEEG signals and raw fMRI signals for the signal processing analysis in this work by Drs.  ... 
doi:10.1016/j.engappai.2014.12.008 pmid:25580059 pmcid:PMC4285716 fatcat:ilwjladvfrfadmw3zg2sqhaa3i

Mining Sequential Patterns with VC-Dimension and Rademacher Complexity

Diego Santoro, Andrea Tonon, Fabio Vandin
2020 Algorithms  
We also present the first algorithms to mine approximations of the true frequent sequential patterns with rigorous guarantees on the quality of the output.  ...  We present the first sampling-based algorithm to mine, with high confidence, a rigorous approximation of the frequent sequential patterns from massive datasets.  ...  Algorithm 7: Mining the True Frequent Sequential Patterns.  ... 
doi:10.3390/a13050123 fatcat:yfajokt5oveerg2lj5h5foopre

Efficient predicated bug signature mining via hierarchical instrumentation

Zhiqiang Zuo, Siau-Cheng Khoo, Chengnian Sun
2014 Proceedings of the 2014 International Symposium on Software Testing and Analysis - ISSTA 2014  
An essential and yet expensive process in debugging is bug isolation. As one major family of automatic bug isolation, statistical bug isolation approaches have been well studied in the past decade.  ...  We apply the HI technique to the recently developed predicated bug signature mining (called MPS) and propose an approach called HIMPS.  ...  Our appreciation also goes to Ben Liblit at University of Wisconsin-Madison for making their predicate-based instrumentor available. This work is supported by an NUS Research Grant R-252-000-484-112.  ... 
doi:10.1145/2610384.2610400 dblp:conf/issta/ZuoKS14 fatcat:gwj7iwus6rhejbzj37ny3mdai4

A Universal Toolkit for Cryptographically Secure Privacy-Preserving Data Mining [chapter]

Dan Bogdanov, Roman Jagomägis, Sven Laur
2012 Lecture Notes in Computer Science  
To validate the practical feasibility of our approach, we implemented and benchmarked four algorithms for frequent itemset mining.  ...  Furthermore, there are no established tools for applying secure multi-party computation in real-world applications.  ...  Therefore, a data mining expert does not have to be a cryptography expert to use SHAREMIND and SECREC for creating privacy-preserving data mining applications.  ... 
doi:10.1007/978-3-642-30428-6_9 fatcat:xnatn2zo4ffythvc3gwmw4aype

A framework for measuring changes in data characteristics

Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
1999 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '99  
Our framework covers a wide variety of models including frequent itemsets, decision tree classifiers, and clusters, and captures standard measures of deviation such as the misclassification rate and the  ...  significant differences in their characteristics), and discuss several practical applications.  ...  The measure of a region identified by an itemset is the support of the itemset.  ... 
doi:10.1145/303976.303989 dblp:conf/pods/GantiGR99 fatcat:oxma54o4tbfxlfamsgrri4jvza

Scalable knowledge harvesting with high precision and high recall

Ndapandula Nakashole, Martin Theobald, Gerhard Weikum
2011 Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11  
We compute pattern-occurrence statistics for two benefits: they serve to prune the hypotheses space and to derive informative weights of clauses for the reasoner.  ...  We propose a new notion of ngram-itemsets for richer patterns, and use MaxSat-based constraint reasoning on both the quality of patterns and the validity of fact candidates.  ...  Acknowledgements We are grateful to the European Union and to Google for supporting parts of this research, through the EU project Living Knowledge and a Google Research Award, respectively.  ... 
doi:10.1145/1935826.1935869 dblp:conf/wsdm/NakasholeTW11 fatcat:gvqjuvvyfzelnk52mhrflp4j7m

Discovering frequent patterns in sensitive data

Raghav Bhaskar, Srivatsan Laxman, Adam Smith, Abhradeep Thakurta
2010 Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '10  
We present two efficient algorithms for discovering the K most frequent patterns in a data set of sensitive records.  ...  This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees  ...  We thank Daniel Kifer for helpful comments.  ... 
doi:10.1145/1835804.1835869 dblp:conf/kdd/BhaskarLST10 fatcat:7ug6rqk3dnerzkcmacnj73xw6e