
Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms [article]

Panagiotis Mandros, Mario Boley, Jilles Vreeken
2018 arXiv   pre-print
The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data.  ...  Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style  ...  CONCLUSION We investigated the algorithmic aspect of discovering dependencies in data using the reliable fraction of information, where we proved the NP-hardness of the problem and derived a refined bounding  ... 
arXiv:1809.05467v1 fatcat:3dqv2faycjhz3b2xntsruaguu4

Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

Panagiotis Mandros, Mario Boley, Jilles Vreeken
2018 2018 IEEE International Conference on Data Mining (ICDM)  
The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data.  ...  Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style  ...  CONCLUSION We investigated the algorithmic aspect of discovering dependencies in data using the reliable fraction of information, where we proved the NP-hardness of the problem and derived a refined bounding  ... 
doi:10.1109/icdm.2018.00047 dblp:conf/icdm/MandrosBV18 fatcat:7eaekvnphfb6vabrziwthpjcce

Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

Panagiotis Mandros, Mario Boley, Jilles Vreeken
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data.  ...  Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style  ...  Conclusion We investigated the algorithmic aspect of discovering dependencies in data using the reliable fraction of information, where we proved the NP-hardness of the problem and derived a refined bounding  ... 
doi:10.24963/ijcai.2019/864 dblp:conf/ijcai/MandrosBV19 fatcat:bkcwugsdtnbpvnivuhsokah43e

Data Summarization based on Multiple Attributes in Unreliable Categorical Data

2019 International journal of recent technology and engineering  
We proposed and implemented Enhanced Categorical Cluster Ensemble Approach (ECCEA) to handle data relations between different attributes to explore data from uncertain data.  ...  This approach consists of matrix to describe anonymous records into groups in indeterminate dependable data streams with attribute splitting and feature selection.  ...  and hard place data sets.  ... 
doi:10.35940/ijrte.b1282.0982s1119 fatcat:4dyldz3xizd6np5q4bn5y63cyi

Validating a Big Data for Data Quality using Single Column Data Pattern Profiling Technique

2020 VOLUME-8 ISSUE-10, AUGUST 2019, REGULAR ISSUE  
Pattern matching helps to discover the various pattern values within the data and validate the values against any organizations.  ...  Data profiling has different types of analysis techniques to correct the data such as Single Column analysis, Multicolumn analysis, Multi table and Data dependencies.  ...  Cross-column Profiling discovers the column and column combination with unique values that are defined. It is made up of two processes: Key analysis and Dependency analysis.  ... 
doi:10.35940/ijitee.e2749.039520 fatcat:ejofzux33jb4zmtravfkvya4zu

Workload-Aware Discovery of Integrity Constraints for Data Cleaning

Eduardo Peña
2018 Very Large Data Bases Conference  
This paper overviews our research on discovering ICs that are particularly suitable for data cleaning, i.e., ICs that are resilient to data and application evolutions.  ...  Manually designing ICs is burdensome, and it is doomed to fail if we consider the dynamic changes of data and applications.  ...  Discovering DCs We have first worked on the algorithmic issues of discovering DCs from data.  ... 
dblp:conf/vldb/Pena18 fatcat:g4uyqpsc75eppf5ctpjiqdfk2u

Dynamic data assigning assessment clustering of streaming data

O. Georgieva, F. Klawonn
2008 Applied Soft Computing  
Discovering interesting patterns or substructures in data streams is an important challenge in data mining.  ...  The new extended version of the algorithm is an incremental clustering approach applicable to stream data. It identifies new clusters formed by the incoming data and updates the data space partition.  ...  Acknowledgements We would like to thank Deutsche Wetterdienst (DWD) for providing the weather data set for our research purposes.  ... 
doi:10.1016/j.asoc.2007.11.006 fatcat:u4xeuxnckjfodj2v2zmm7rcds4

Discovering Reliable Correlations in Categorical Data

Panagiotis Mandros, Mario Boley, Jilles Vreeken
2019 2019 IEEE International Conference on Data Mining (ICDM)  
data or the type of correlation, and, how to efficiently discover the topmost reliably correlated attribute sets from data.  ...  In many scientific tasks we are interested in discovering whether there exist any correlations in our data.  ...  data or the type of correlation, and, how to efficiently discover the topmost reliably correlated attribute sets from data.  ... 
doi:10.1109/icdm.2019.00156 dblp:conf/icdm/MandrosBV19 fatcat:2kwhrtex4vbdxp2c6zgubozl2q

Big Data, Fast Data and Data Lake Concepts

Natalia Miloslavskaya, Alexander Tolstoy
2016 Procedia Computer Science  
Today we witness the appearance of two additional to Big Data concepts: data lakes and fast data. Are they simply the new marketing labels for the old Big Data IT or really new ones?  ...  approach; Frequent-itemset data mining, including associative rules, market-baskets, the a-priori algorithm and its improvements; Very large, high-dimensional datasets clustering algorithms; Web applications  ...  In general, big data processing is aimed at data mining refers to extracting or «mining» (discover) knowledge from large amounts of data.  ... 
doi:10.1016/j.procs.2016.07.439 fatcat:ervzqsx7mvbyjonrfezabxdtwa

Data Mining: Web Data Mining Techniques, Tools and Algorithms: An Overview

Muhammd Jawad Hamid Mughal
2018 International Journal of Advanced Computer Science and Applications  
All these types use different techniques, tools, approaches, algorithms for discover information from huge bulks of data over the web.  ...  As increasing growth of data over the internet, it is getting difficult and time consuming for discovering informative knowledge and patterns.  ...  Different algorithmic techniques are used to discover data from web.  ... 
doi:10.14569/ijacsa.2018.090630 fatcat:szpscjf6rrhbhgx43ktrdx4fsi

Discovering Reliable Correlations in Categorical Data [article]

Panagiotis Mandros, Mario Boley, Jilles Vreeken
2019 arXiv   pre-print
data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data.  ...  In many scientific tasks we are interested in discovering whether there exist any correlations in our data.  ...  data or the type of correlation, and, how to efficiently discover the topmost reliably correlated attribute sets from data.  ... 
arXiv:1908.11682v1 fatcat:7dxcrce53nacxo7mlrmg4virai

Big data problems on discovering and analyzing causal relationships in epidemiological data

Yiheng Liang, Armin R. Mikler
2014 2014 IEEE International Conference on Big Data (Big Data)  
Algorithms of efficiently discovering noncausal factors are developed and proved.  ...  Causal relationships can be possibly discovered through learning the network structures from data.  ...  In order to find interesting information such as previously unnoticed associations, an efficient algorithm should be utilized because time and cost of discovering such information from large-scale data  ... 
doi:10.1109/bigdata.2014.7004421 dblp:conf/bigdataconf/LiangM14 fatcat:zr4b3oxlurhf3b7tgiw3yf63ji

Data Quality: Theory and Practice [chapter]

Wenfei Fan
2012 Lecture Notes in Computer Science  
incomplete data, and a data currency model for answering queries with current values from possibly stale data in the absence of reliable timestamps.  ...  We also discuss techniques for automatically discovering data quality rules, detecting errors in real-life data, and for correcting errors with performance guarantees.  ...  Already hard for relational data, error detection and repairing are far more challenging for data with complex structures.  ... 
doi:10.1007/978-3-642-32281-5_1 fatcat:ifbeaxc44vcirhtbxcblirlada

Understanding Data Completeness in Network Monitoring Systems

F. Korn, Ruilin Liu, Hui Wang
2012 2012 IEEE 12th International Conference on Data Mining  
It is therefore vital to understand the completeness and reliability of such data.  ...  In many networks including Internet Service Providers, transportation monitoring systems and the electric grid, measurements from a set of objects are continuously taken over time and used for important  ...  ACKNOWLEDGEMENTS We thank Divesh Srivastava, Howard Karloff and Lukasz Golab for initial discussions that led to this work.  ... 
doi:10.1109/icdm.2012.149 dblp:conf/icdm/KornLW12 fatcat:nvkojwqzmrdwjas5fgqfvmzu5u

Automated perceptions in data mining

M. Last, A. Kandel
1999 FUZZ-IEEE'99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315)  
The automated tasks include comparison of frequency distributions, evaluating reliability of dependent variables, and detecting outliers in noisy data.  ...  The human eye can capture complex patterns and relationships, along with detecting the outlying (exceptional) cases in a data set.  ...  Finally, the reliability of dependent attributes can be evaluated by using many data mining models.  ... 
doi:10.1109/fuzzy.1999.793233 fatcat:6gzege354zgc3opy2kejmq3gji