Discovering data quality rules

Fei Chiang, Renée J. Miller
2008 Proceedings of the VLDB Endowment  
Data quality rules are known to be contextual, so we focus on the discovery of context-dependent rules.  ...  Our discovery algorithm searches for minimal CFDs among the data values and prunes redundant candidates. No universal objective measures of data quality or data quality rules are known.  ...  We thank Tasos Kementsietsidis and Xibei Jia for providing us with the tax data generator.  ... 
doi:10.14778/1453856.1453980 fatcat:kqsmykm3nffxzbo4x224cfc3bi

Discovering dynamic integrity rules with a rules-based tool for data quality analyzing

Thanh Thoa Pham Thi, Markus Helfert
2010 Proceedings of the 11th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing on International Conference on Computer Systems and Technologies - CompSysTech '10  
Rules based approaches for data quality solutions often use business rules or integrity rules for data monitoring purpose.  ...  In this paper, we present our rule-based approach for data quality analyzing, in which we discuss a comprehensive method for discovering dynamic integrity rules.  ...  INTRODUCTION Data quality (DQ) is an increasing concern for most businesses. High quality data helps the organisations to save costs, to make better decisions and to improve customer service.  ... 
doi:10.1145/1839379.1839396 dblp:conf/compsystech/ThiH10 fatcat:7gdwhojprbenhcjhg6o24kyzie

Data quality: The other face of Big Data

Barna Saha, Divesh Srivastava
2014 2014 IEEE 30th International Conference on Data Engineering  
With the variety of data, often from a diversity of sources, data quality rules cannot be specified a priori; one needs to let the "data to speak for itself" in order to discover the semantics of the data  ...  This tutorial presents recent results that are relevant to big data quality management, focusing on the two major dimensions of (i) discovering quality issues from the data itself and (ii) trading-off  ...  Since rules are discovered based on dirty data, inconsistencies may appear as an effect of faulty rules.  ... 
doi:10.1109/icde.2014.6816764 dblp:conf/icde/SahaS14 fatcat:eomux6d3vbaedflg2fhfhe7kpa

GPR: A Data Mining Tool Using Genetic Programming

Balasubramaniam Ramesh
2001 Communications of the Association for Information Systems  
We present GPR, an inductive data-mining system we developed. GPR uses the technique of genetic programming to discover rules.  ...  In the following section, we briefly define terminology and concepts related to knowledge discovery and the reasons for our focus on discovering production rules.  ...  Certainty discovered a very large number of exact rules, including some that applied to only three or four Sample Rules Produced by Three Knowledge Quality Functions GPR: A Data Mining Tool Using Genetic  ... 
doi:10.17705/1cais.00506 fatcat:xbklzxyinjgtfj7ifchudm4j74

Comparative Analysis of Variations of Ant-Miner by Varying Input Parameters

Sonal P. Rami, Mahesh H. Panchal
2012 International Journal of Computer Applications  
ACO can be applied to the data mining field to extract rule-based classifiers.  ...  Three algorithms (Ant-Miner, Ant-Tree-Miner and cAnt-Miner) are compared against input parameters with respect to predictive accuracy and simplicity of the discovered rules.  ...  Extend Quality Measures for classification 4. New Multi-class rule Quality measures 5. Modification for Multi-Label classification 6. Discovering fuzzy classification rules 7.  ... 
doi:10.5120/9673-4097 fatcat:upcbisdwhfhzlo5kqxztdjyiu4

A new version of the ant-miner algorithm discovering unordered rule sets

James Smaldon, Alex A. Freitas
2006 Proceedings of the 8th annual conference on Genetic and evolutionary computation - GECCO '06  
Hence, the proposed version facilitates the interpretation of discovered knowledge, an important point in data mining.  ...  The Ant-Miner algorithm, first proposed by Parpinelli and colleagues, applies an ant colony optimization heuristic to the classification task of data mining to discover an ordered list of classification  ...  The Ljubljana breast cancer data set was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.  ... 
doi:10.1145/1143997.1144004 dblp:conf/gecco/SmaldonF06 fatcat:4okqumfgfjabfjekq5oc2gorh4

Methodology Design for Data Preparation in the Process of Discovering Patterns of Web Users Behaviour

Michal Munk, Martin Drlík, Jozef Kapusta, Daša Munková
2013 Applied Mathematics & Information Sciences  
Data preparation represents the first inevitable step in the process of discovering users' behavioural patterns.  ...  Considering the obtained results we propose a methodology for data preparation in the process of discovering patterns of web user behaviour based on the results of experiments we carried out.  ...  and on the quality in terms of the basic quality characteristics of discovered rules.  ... 
doi:10.12785/amis/071l05 fatcat:dreogqdww5aqlgs4zmwrwmbchi

User Identification in the Process of Web Usage Data Preprocessing

Jozef Kapusta, Michal Munk, Dominik Halvoník, Martin Drlík
2019 International Journal of Emerging Technologies in Learning (iJET)  
This comparison was performed concerning the quality of the sequential rules generated, i.e., a comparison was made regarding generation useful, trivial and inexplicable rules.  ...  There are multiple places where we can extract the necessary data.  ...  Differences in the results of sequence rule analysis are not only in the quantity of discovered rules, but also in the quality (the value of support variable) of discovered rules in examined files.  ... 
doi:10.3991/ijet.v14i09.9854 fatcat:nvu2b5e74vf3nbl7ggxv3qh4ie

SEWEBAR-CMS: A System for Postprocessing Data Mining Models

Tomás Kliegr, David Chudán, Andrej Hazucha, Jan Rauch
2010 International Web Rule Symposium  
The principal problem of the association rule (AR) mining task is the selection of rules that might be interesting for the domain expert from the many rules typically generated by the software.  ...  -based Content Management System for post-processing AR models that supports the data analyst in this effort.  ...  discovered association rules.  ... 
dblp:conf/ruleml/KliegrCHR10 fatcat:5wfqkqdibfcqref7dmhebnxbke

Evolutionary Mining for Image Classification Rules [chapter]

Jerzy Korczak, Arnaud Quirin
2004 Lecture Notes in Computer Science  
Classification rules, discovered by application of a genetic algorithm on remote sensing data, are able to identify spectral classes with comparable accuracy to that of a human expert.  ...  In our case studies, the hyperspectral images contain voluminous, complex and frequently noisy data.  ...  In this paper, a new data-driven approach is proposed in order to discover classification rules using the paradigm of genetic evolution.  ... 
doi:10.1007/978-3-540-24621-3_13 fatcat:bm22gin7c5erdpcuirocsulzu4

Enhanced cAntMinerPB Algorithm for Induction of Classification Rules using Ant Colony Approach

Safeya Rajpiplawala, Dheeraj Kumar Singh
2014 IOSR Journal of Computer Engineering  
Rule induction is a method used in data mining where the desired output is a set of Rules or Statements that characterize the data.  ...  Mining classification rules from data is a key mission of data mining and is getting great attention in recent years.  ...  rule 16: add the rule in discovered list of rules 17: end while 18: if compare quality of discovered list of rules then 19: update list according to highest quality 20: end if 21: end for 22  ... 
doi:10.9790/0661-16326372 fatcat:g7hs3opkkrc3dofra6j6anpyde

Improving the cAnt-MinerPB Classification Algorithm [chapter]

Matthew Medland, Fernando E. B. Otero, Alex A. Freitas
2012 Lecture Notes in Computer Science  
We have found that changing the rule quality function has little effect on the overall performance, but that by improving the rule-list quality function we can positively affect the discovered lists of  ...  We aim to improve cAnt-MinerPB in two ways, firstly by dynamically finding the rule quality function which is used while the rules are being pruned, and secondly improving the rule-list quality function  ...  In terms of the discovered model size, the use of the error-based rule-list quality (cAnt-Miner PB [E]) led to a statistically significant improvement in the size of the discovered lists, reducing the  ... 
doi:10.1007/978-3-642-32650-9_7 fatcat:ewf5tbrbdfex7nj7qyjjaps7ne

Data mining with an ant colony optimization algorithm

R.S. Parpinelli, H.S. Lopes, A.A. Freitas
2002 IEEE Transactions on Evolutionary Computation  
This work proposes an algorithm for data mining called Ant-Miner (Ant Colony-based Data Miner). The goal of Ant-Miner is to extract classification rules from data.  ...  discovered by CN2.  ...  In CN2 there is no mechanism to allow the quality of a discovered rule to be used as a feedback for constructing other rules.  ... 
doi:10.1109/tevc.2002.802452 fatcat:yxrqzgsp3re7tibmodkq7od3tm

Data Quality Measurement on Categorical Data Using Genetic Algorithm

J Malar Vizhi
2012 International Journal of Data Mining & Knowledge Management Process  
Our basic idea is to employ association rule for the purpose of data quality measurement. Strong rule generation is an important area of data mining.  ...  Data quality on categorical attribute is a difficult problem that has not received as much attention as numerical counterpart.  ...  INTRODUCTION Data Mining is the most instrumental tool in discovering knowledge from transactions [1, 2]. The most important application of data mining is discovering association rules.  ... 
doi:10.5121/ijdkp.2012.2103 fatcat:p6lo2kk6wzchritsunhx6sj2zq

Heuristic Mining Revamped: An Interactive, Data-aware, and Conformance-aware Miner

Felix Mannhardt, Massimiliano de Leoni, Hajo A. Reijers
2017 International Conference on Business Process Management  
visualized as described in literature, and (5) existing tools do not give reliable quality diagnostics for discovered models.  ...  It uses data attributes to improve the discovery procedure and provides built-in conformance checking to get direct feedback on the quality of the model.  ...  (Step 4) Discover Decision Rules. Fourth, decision rules that determine which of the output bindings may be activated are discovered.  ... 
dblp:conf/bpm/MannhardtLR17 fatcat:3ea4ooyh3jguzaxqdvkkvc4hlu