33,658 Hits in 5.0 sec

Generation of synthetic data sets for evaluating the accuracy of knowledge discovery systems

Daniel R. Jeske, Ryan Rich, Behrokh Samadi, Pengyue J. Lin, Lan Ye, Sean Cox, Rui Xiao, Ted Younglove, Minh Ly, Douglas Holt
2005 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05  
Information Discovery and Analysis Systems (IDAS) are designed to correlate multiple sources of data and use data mining techniques to identify potential events that could occur in the future.  ...  The IDSG tool will feature a default set of attribute generation capabilities as well as a wizard that allows users to grow the scope of data the tool can generate.  ...  data sets with varying degrees of accuracy.  ... 
doi:10.1145/1081870.1081969 dblp:conf/kdd/JeskeSLYCXYLHR05 fatcat:bcjx7hsay5g25cnopw3v3crxii

A Recommender System for Process Discovery [chapter]

Joel Ribeiro, Josep Carmona, Mustafa Mısır, Michele Sebag
2014 Lecture Notes in Computer Science  
In this paper, we present a recommender system that uses portfolio-based algorithm selection strategies to face the following problems: to find the best discovery algorithm for the data at hand, and to  ...  Experiments performed with the developed tool witness the usefulness of the approach for a variety of instances.  ...  The figure on the right presents the average accuracy of the prediction of the best-performing technique for each measure category  ... 
doi:10.1007/978-3-319-10172-9_5 fatcat:vid24t64lfgedfi6vzx3vb3akq

How deep is knowledge tracing? [article]

Mohammad Khajah and Robert V. Lindsey and Michael C. Mozer
2016 arXiv   pre-print
We argue that while DKT is a powerful, useful, general-purpose framework for modeling student learning, its gains do not come from the discovery of novel representations---the fundamental advantage of  ...  In this article, we attempt to understand the basis for DKT's advantage by considering the sources of statistical regularity in the data that DKT can leverage but which BKT cannot.  ...  [22] reported substantial improvements in prediction performance with DKT over BKT on two real-world data sets (Assistments, Khan Academy) and one synthetic data set which was generated under assumptions  ... 
arXiv:1604.02416v2 fatcat:wsl4oujmszfgbmfedqndihurlm

Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery

David Minnen, Charles Isbell, Irfan Essa, Thad Starner
2007 Seventh IEEE International Conference on Data Mining (ICDM 2007)  
To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor  ...  This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding  ...  The discovery system, which has no knowledge of the pattern, must then locate the planted motifs.  ... 
doi:10.1109/icdm.2007.52 dblp:conf/icdm/MinnenIES07 fatcat:mix7dd5zbzaidp3kxh2kf2x24u

The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record

Scott McLachlan, Kudakwashe Dube, Thomas Gallagher, Bridget Daley, Jason Walonoski
2018 Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies  
The development of the generic methods for achieving and validating realism for synthetic data was achieved by using knowledge discovery in databases (KDD), data mining enhanced with concept analysis and  ...  The knowledge discovery process improves and expedites the generation process; having a more complex and complete understanding of the knowledge required to create the synthetic data significantly reduce  ...  ACKNOWLEDGEMENTS SM acknowledges support from the EPSRC under project EP/P009964/1: PAMBAYESIAN: Patient Managed decision-support using Bayes Networks. For Danika, Thomas, Liam and James.  ... 
doi:10.5220/0006677602200230 dblp:conf/biostec/McLachlanDGDW18 fatcat:fqioptababfxtng2oxvpw23puu

Adaptation and Generalization for Unknown Sensitive Factors of Variations [article]

William Paul, Philippe Burlina
2021 arXiv   pre-print
This leads us to consider various settings (unsupervised, domain generalization, semi-supervised) that correspond to different degrees of incomplete knowledge about those factors.  ...  We demonstrate the ability for interventions on discovered/source factors to generalize to target/real factors.  ...  Introduction Deploying artificial intelligence (AI) systems in real world settings requires greater assurances for robustness and trust in system behavior.  ... 
arXiv:2107.13625v3 fatcat:2dtowkl4ebcn3o4leyv5fperuy

Approximate Truth Discovery via Problem Scale Reduction

Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Xue Li, Xiaofei Xu, Lina Yao
2015 Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15  
Current solutions to this problem detect the veracity of each value jointly with the reliability of each source for every data item.  ...  The groups are then used for efficient inter-value influence computation to improve the accuracy. Our approach is applicable to most existing truth discovery algorithms.  ...  Experiments on Synthetic Datasets Datasets Preparation We generate synthetic data to evaluate the effectiveness of our approach on larger datasets.  ... 
doi:10.1145/2806416.2806444 dblp:conf/cikm/WangSFLXY15 fatcat:ynipce3wojhl7fqbguu75cgvnm

Using spatial correspondences for hyperspectral knowledge transfer: Evaluation on synthetic data

Brian D. Bue, Erzsebet Merenyi
2010 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing  
We evaluate the technique using state of the art synthetic hyperspectral imagery.  ...  We describe a proof of concept for class knowledge transfer from a labeled hyperspectral image to an unlabeled image, captured with a different (hyper-/multi-spectral) sensor, when the spatial extents  ...  John Kerekes at the RIT Digital Imaging and Remote Sensing (DIRS) Laboratory for their gracious assistance in generating the DIRSIG image data used in this work. We would also like to thank Dr. Maj.  ... 
doi:10.1109/whispers.2010.5594944 dblp:conf/whispers/BueM10 fatcat:eefidc4rrnb43kc4xufyuzsjhy

Visual Physics: Discovering Physical Laws from Videos [article]

Pradyumna Chari, Chinmay Talegaonkar, Yunhao Ba, Achuta Kadambi
2019 arXiv   pre-print
In this paper, we teach a machine to discover the laws of physics from video streams. We assume no prior knowledge of physics, beyond a temporal stream of bounding boxes.  ...  We evaluate our ability to discover physical laws on videos of elementary physical phenomena, such as projectile motion or circular motion.  ...  DAMPED OSCILLATION (synthetic): Damping is a general energy loss mechanism for various systems, and one of the common forms of damping is the exponential decay.  ... 
arXiv:1911.11893v1 fatcat:ilzvb33jjjhgvcisnsgkfbcgrm

Causal Discovery for Manufacturing Domains [article]

Katerina Marazopoulou, Rumi Ghosh, Prasanth Lade, David Jensen
2016 arXiv   pre-print
This work demonstrates how data mining and knowledge discovery can be used for root cause analysis in the domain of manufacturing and connected industry.  ...  Standard evaluation techniques for causal structure learning show that the learned causal models seem to closely represent the underlying latent causal relationship between different factors in the production  ...  Generation of synthetic data The first step towards the evaluation through synthetic data is the generation of the synthetic model and data.  ... 
arXiv:1605.04056v2 fatcat:kqef7cwlqnh6fafjpivgq5jlwu

Synthetic data generator for testing of classification rule algorithms

Romana Seidlová, Jaroslav Poživil, Jaromír Seidl, Lukáš Malec
2017 Neural Network World  
To our knowledge, our system is probably the first synthetic data generation system that systematically generates datasets for examination and judgment of the classification rule algorithms.  ...  We developed a data generating system that is able to systematically create testing datasets that meet the user's requirements, such as number of rows, number and type of attributes, number of missing  ... 
doi:10.14311/nnw.2017.27.010 fatcat:unm7domiijhdfkugpbiqgnxhnu

Adapting State-of-the-Art Deep Language Models to Clinical Information Extraction Systems: Potentials, Challenges, and Solutions

Liyuan Zhou, Hanna Suominen, Tom Gedeon
2019 JMIR Medical Informatics  
First, word representations trained from different domains served as the input of a DL system for information extraction.  ...  A total of 3 independent datasets were generated for this task, and they were used as the training (101 patient reports), validation (100 patient reports), and test (100 patient reports) sets in our experiments  ...  Using authentic or synthetic clinical data sets other than these data sets 1 and 2 for setting up the system is not permitted.  ... 
doi:10.2196/11499 pmid:31021325 pmcid:PMC6658232 fatcat:32izaz3xtjaltbqitgiqjx5owu

Constrained Motif Discovery in Time Series

Yasser Mohammad, Toyoaki Nishida
2009 New Generation Computing
We then compare the combination of RSST and MCFull or MCInc with two state-of-the-art motif discovery algorithms on a large set of synthetic time series.  ...  In this paper we define the Constrained Motif Discovery problem which enables utilization of domain knowledge into the motif discovery process.  ...  Section 6 provides detailed evaluation of the proposed MCFull and MCInc algorithms on a large synthetic data set.  ... 
doi:10.1007/s00354-009-0068-x fatcat:46lmwxle7jdlbaixulke2pgy24

Efficient and interpretable fuzzy classifiers from data with support vector learning

Stergios Papadimitriou, Konstantinos Terzidis
2005 Intelligent Data Analysis  
The accurate set of rules can be approximated with a simpler interpretable fuzzy system that can present insight to the more important aspects of the data.  ...  Support Vector algorithms are adapted for the identification of a Support Vector Fuzzy Inference (SVFI) system that obtains robust generalization performance.  ...  Acknowledgment This work was partially supported by a European Union funded EPEAK II project "Arximidis", code 04-3-001/5, performed at the Technological Educational Institute of Kavalas, Dept. of Information  ... 
doi:10.3233/ida-2005-9603 fatcat:rn5ytq7t5rcvxafyiddxahk43a

GraphBAD: A general technique for anomaly detection in security information and event management

Simon Parkinson, Mauro Vallati, Andrew Crampton, Shirin Sohrabi
2018 Concurrency and Computation  
A large experimental analysis, conducted on both publicly available (the well-known KDD dataset) and synthetically generated testing sets (file system permissions), demonstrates the ability of GraphBAD  ...  computer users in increasing the security of their systems.  ...  The technique is used for generating synthetic datasets for file system access control.  ... 
doi:10.1002/cpe.4433 fatcat:2ei6nyqglngg5kmmweocbvxodi