Sampling-based sequential subgroup mining

Martin Scholz
2005 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05  
To this end a novel generic sampling strategy is proposed. It allows to turn pattern mining into an iterative process.  ...  Subgroup discovery is a learning task that aims at finding interesting rules from classified examples.  ...  This sampling technique reads examples sequentially, continuously updating upper bounds for the sample errors, based on the data read so far.  ... 
doi:10.1145/1081870.1081902 dblp:conf/kdd/Scholz05 fatcat:ug2etov4pzbsrmfdqdqlnufkda

Mining Subgroups with Exceptional Transition Behavior

Florian Lemmerich, Martin Becker, Philipp Singer, Denis Helic, Andreas Hotho, Markus Strohmaier
2016 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16  
We present a new method for detecting interpretable subgroups with exceptional transition behavior in sequential data.  ...  The measure compares the distance between the Markov transition matrix of a subgroup and the respective matrix of the entire data with the distance of random dataset samples.  ...  No models featuring sequential data have been explored for exceptional model mining so far.  ... 
doi:10.1145/2939672.2939752 dblp:conf/kdd/Lemmerich0SHHS16 fatcat:lf7v3hg3ojdjjc4n46jwrtwuma

SeqScout: Using a Bandit Model to Discover Interesting Subgroups in Labeled Sequences

Romain Mathonat, Diana Nurbakova, Jean-Francois Boulicaut, Mehdi Kaytoue
2019 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)  
This is a new sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model.  ...  Though many subgroup discovery algorithms have been proposed for transactional data, discovering subgroups within labeled sequential data and thus searching for descriptions as sequential patterns has  ...  Enumeration-based methods Many enumeration techniques enable to mine patterns from Boolean, sequential and graph data [13] . They can be adapted for the case of discriminative pattern mining.  ... 
doi:10.1109/dsaa.2019.00022 dblp:conf/dsaa/MathonatNBK19 fatcat:ojq4xpuajbhlziyggnuxq4m6zq

Mining microarray data to predict the histological grade of a breast cancer

Mickael Fabregue, Sandra Bringay, Pascal Poncelet, Maguelonne Teisseire, Béatrice Orsetti
2011 Journal of Biomedical Informatics  
Materials and Methods: The method is based on sequential patterns used as features for class prediction. We applied it to classify breast cancer tumors according to their histological grade.  ...  Conclusions: We demonstrated the interest of sequential patterns for class prediction of microarrays and we now have the material to use them for prognostic and predictive applications.  ...  Molecular biomarkers are generated from analyses of DNA microarrays and are based on a particular data mining technique: Sequential pattern discovery.  ... 
doi:10.1016/j.jbi.2011.03.002 pmid:21397039 fatcat:mx3kbx7eqjey3ofdzwvsosauhm

Genetic association with rheumatoid arthritis—Genetic Analysis Workshop 15: summary of contributions from Group 2

Marsha A. Wilcox, Zhong Li, Will Tapper
2007 Genetic Epidemiology  
There were 5,429 SNPs across 22 autosomes in the family-based sample and 113,237 SNPs in the ASP sample.  ...  R620W, but highly significant for the two sequential methods Chromosome 18: 2 sequential methods-weak regional evidence; no association with single SNP method binary subgroup membership in each group  ... 
doi:10.1002/gepi.20276 pmid:18046771 fatcat:xaoaxu6asvfkzbdm7olupwnjza

A Scalable Constant-Memory Sampling Algorithm for Pattern Discovery in Large Databases [chapter]

Tobias Scheffer, Stefan Wrobel
2002 Lecture Notes in Computer Science  
In recent years, significant progress has been achieved in scaling algorithms for this task to very large databases through the use of sequential sampling techniques.  ...  However, except for sampling-based greedy algorithms which cannot give absolute quality guarantees, the scalability of existing approaches to this problem is only with respect to the data, not with respect  ...  The LCM-GSS algorithm overcomes this limitation of sequential sampling and thereby enables mining very large databases with large hypothesis spaces.  ... 
doi:10.1007/3-540-45681-3_33 fatcat:4mbvu2za3jec7l4wm4cqtypmma

A Summarized Report on Data Mining and It's Essential Algorithms

K Murali Gopal, Ranjit Patnaik
2015 International Journal of Engineering Research and  
Data mining is a process of deriving knowledge from such a huge data. In this article a summarized report on the data mining and its essential algorithms are categorized.  ...  On the basis of training data the subgroup attributes are decided and the new item based on its attributes assigned to one of such subgroups.  ...  Clustering is an approach of partitioning a group of elements into more than one subgroup / cluster where elements of each subgroup are similar in characteristics based upon their inter-cluster or intra-cluster  ... 
doi:10.17577/ijertv4is120196 fatcat:cuomryf645dbfay3bw2jsjzo3i

Idiopathic Constipation Can Be Subdivided In Clinical Subtypes: Data Mining By Cluster Analysis On A Population Based Study

Mauro Giacomini, Stefania Bertone, Carlo Mansi, Pietro Dulbecco, Vincenzo Savarino
2008 Zenodo  
Aim: to estimate the prevalence of constipation in a population-based sample and determine whether clinical subgroups can be identified.  ...  Data mining by cluster analysis was used to determine constipation subgroups. Results: 1,500 complete interviews were obtained from 2,083 contacted households (72%).  ...  This method is very rigorous in judging whether the sample can be really sub grouped [18] . D. Statistical Analysis Sample size population was determined by using sequential analysis.  ... 
doi:10.5281/zenodo.1059778 fatcat:bkrcoq4ecnds3lbrucj5u7mpey

Review on Mining High Utility Patterns Decreasing Candidates

Satyajeet Bankar
2016 International Journal Of Engineering And Computer Science  
Utility mining is a latest development of data mining technology.  ...  Among utility mining issues, utility mining with the item set share structure is a hard one as no anti-monotonicity property holds with the interestingness measure. Prior works on this problem all use a  ...  of subgraph mining approach for subgroup discovery subgraph.  ... 
doi:10.18535/ijecs/v5i11.27 fatcat:6hix6xkzubhfvk7wbuap2c23ym

Effluent Characterization, Water Quality Monitoring and Sediment Monitoring in the Metal Mining EEM Program

Roy Parker, Charles Dumaresq
2002 Water quality research journal of Canada  
Mines will also be required to collect sediment samples for determination of particle size distribution and total organic carbon.  ...  Samples will be collected four times a year, and will be analyzed for a range of parameters.  ...  Acknowledgments We wish to thank the members of the Water and Sediment Subgroup for their hard work and dedication throughout the consultation process: Vernon Betts (Homestake Canada), Margaret Fairbairn  ... 
doi:10.2166/wqrj.2002.014 fatcat:k2wmcluy4vgbhiknhhlgtjzpme

Computer Programs to Compute Lord's Item Bias Statistic for a Three-Parameter ICC

Ronald G. Downey, Margaret S. Stockdale
1987 Educational and Psychological Measurement  
If corrections are required for any of the four sequential files, programs six through nine are used to provide corrections as determined by the user.  ...  The statistic is a multivariate chi squared, based upon the Hotelling T² statistic.  ... 
doi:10.1177/001316448704700313 fatcat:6tycftqfe5f5fc4k2pf6f2ap4q

Database Optimization to Recommend Software Developers using Canonical Order Tree [article]

T.M. Amir-Ul-Haque Bhuiyan, Mehedi Hasan Talukdar, Ziaur Rahman, Dr. Mohammad Motiur Rahman
2020 arXiv   pre-print
Recently frequent and sequential pattern mining algorithms have been widely used in the field of software engineering to mine various source code or specification patterns.  ...  In this paper we have proposed a technique based on the Canonical Order Tree that can find out frequent patterns from the incremental database with speedy and efficient way.  ...  In summary, those Apriori based incremental mining algorithms can not be easily adoptable to FP-tree based [13] , [14] incremental mining.  ... 
arXiv:2006.12737v1 fatcat:jgcpxd26vbf47m7rw222zyb5ki

Robust subgroup discovery [article]

Hugo Manuel Proença, Peter Grünwald, Thomas Bäck, Matthijs van Leeuwen
2022 arXiv   pre-print
Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective  ...  In fact, the greedy gain is shown to be equivalent to a Bayesian one-sample proportion, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis  ...  This approach is similar to a sequential approach for mining subgroups.  ... 
arXiv:2103.13686v3 fatcat:oby74jai7nc7hipe3wktvmtwzi

Characterizing user engagement with health app data: a data mining approach

Katrina J. Serrano, Kisha I. Coa, Mandi Yu, Dana L. Wolff-Hughes, Audie A. Atienza
2017 Translational Behavioral Medicine  
Data mining validation methods were conducted with two separate subsamples. On average, users engaged with the app for 29 days.  ...  Six unique subgroups were identified, and engagement for each subgroup varied, ranging from 3.5 to 172 days.  ...  These rapid and sequential exploratory and confirmatory analyses can provide preliminary, empirically-based insights into who uses health apps, thus providing more scientific rigor than anecdotal findings  ... 
doi:10.1007/s13142-017-0508-y pmid:28616846 pmcid:PMC5526821 fatcat:h3wf5o7rp5d7bgo5j3yv36dpdm

Knowledge-Based Sampling for Subgroup Discovery [chapter]

Martin Scholz
2005 Lecture Notes in Computer Science  
To address local pattern mining in this scenario, an extension of subgroup discovery by the knowledge-based sampling approach to iterative model refinement is presented.  ...  Subgroup discovery aims at finding interesting subsets of a classified example set that deviates from the overall distribution.  ...  This sampling technique reads examples sequentially, continuously updating upper bounds for the sample errors, based on the data read so far.  ... 
doi:10.1007/11504245_11 fatcat:wycvcudzung6fof5fxpwvamzqi