Pattern discovery in sequences under a Markov assumption
2002
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02
We demonstrate how the Bayes error can be used to calibrate existing discovery algorithms, providing a lower bound on achievable performance. ...
We present a general framework for characterizing learning in this context by deriving the Bayes error rate for this problem under a Markov assumption. ...
Acknowledgements This work was supported by the National Science Foundation under grants IIS-9703120 and IIS-0083489, and by grants from NASA and the Jet Propulsion Laboratory, the National Institute of ...
doi:10.1145/775047.775070
dblp:conf/kdd/ChudovaS02
fatcat:36aavyjnwfaflkqbxfa3kinlwa
Pattern discovery in sequences under a Markov assumption
2002
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02
We demonstrate how the Bayes error can be used to calibrate existing discovery algorithms, providing a lower bound on achievable performance. ...
We present a general framework for characterizing learning in this context by deriving the Bayes error rate for this problem under a Markov assumption. ...
Acknowledgements This work was supported by the National Science Foundation under grants IIS-9703120 and IIS-0083489, and by grants from NASA and the Jet Propulsion Laboratory, the National Institute of ...
doi:10.1145/775069.775070
fatcat:ltkpr33hqjbbradkwjqeymscti
Mining Signatures from Event Sequences
2015
IJARCCE
The framework allows the presentation, extraction, and mining of high order latent occasion event structure and relationships between single and many sequences. ...
We have suggested clinical assessment for naked interactive knowledge discovery in large electronic health record databases. ...
mining can be of use include analysis of medical data of hospitals in a town to conjecture, for example, potential outbreaks of infectious diseases. ...
doi:10.17148/ijarcce.2015.44129
fatcat:up6onqghvje7rgj6bi4o4aw7ky
A survey of temporal data mining
2006
Sadhana (Bangalore)
We also describe some recent results regarding statistical analysis of pattern discovery methods. ...
In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. ...
Since the Bayes error rate is known to be a lower bound on the error rates of all classifiers, it indicates, in some sense, the level of difficulty in the underlying pattern discovery problem. ...
doi:10.1007/bf02719780
fatcat:nqyjuthpmreclhj4lygqyxxmk4
Control of the False Discovery Rate Applied to the Detection of Positively Selected Amino Acid Sites
2006
Molecular biology and evolution
The null hypothesis "H_{0,s}: site s evolves under a negative selection or under a neutral process of evolution" is tested at each codon site of the alignment of homologous coding sequences. ...
Recent advances in statistics have shown that the false discovery rate, in this case the expected proportion of sites that do not evolve under positive selection among those that are estimated to evolve ...
The expected number of false discoveries among the sites included in the list is:
FDR framework. The first step is to generate synthetic data, denoted as D*, under M, the best model estimated ...
doi:10.1093/molbev/msj095
pmid:16423864
fatcat:3a5bqg6trfgd5gyslqache3jmu
An integrated framework on mining logs files for computing system management
2005
Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05
This has been well known and experienced as a cumbersome, labor intensive, and error prone process. In addition, this process is difficult to keep up with the rapidly changing environments. ...
In this paper, we will describe our research efforts on establishing an integrated framework for mining system log files for automatic management. ...
The discovery of the waiting periods is carried out using the Chi-Squared test based approach first introduced in [13]. Consider an arbitrary element τ in D_ab and a fixed δ. ...
doi:10.1145/1081870.1081972
dblp:conf/kdd/LiLMP05
fatcat:vzvbtl6njfcl7kllpooecq4qam
A RESEARCH REVIEW ON COMPARATIVE ANALYSIS OF DATA MINING TOOLS, TECHNIQUES AND PARAMETERS
2017
International Journal of Advanced Research in Computer Science
Data mining is a process of exploring unexplored patterns from huge databases. This acts as a key to knowledge discovery which provides a great support to business world and academia. ...
There are a variety of parameters defined in the literature which provide a base for a tool to perform analysis, and different tools are available to perform these analyses. ...
This is used as an analyzer for knowledge discovery in databases to be used in decision making process. ...
doi:10.26483/ijarcs.v8i7.4255
fatcat:ubnkgjhukfctdob5dihw2jvsjy
Performance Analysis Of Various Data Mining Classification Techniques On Healthcare Data
2011
International Journal of Computer Science & Information Technology (IJCSIT)
The standards used are percentage of accuracy and error rate of every applied classification technique. The experiments are done using the 10 fold cross validation method. ...
A suitable technique for a particular dataset is chosen based on highest classification accuracy and least error rate. ...
Knowledge discovery in databases consists of the list of iterative sequence steps of processes and data mining is one of the KDD processes. ...
doi:10.5121/ijcsit.2011.3413
fatcat:sxjvob6qezbptdcsucvyxavsda
Data preparation for data mining
2003
Applied Artificial Intelligence
Data preparation is a fundamental stage of data analysis. ...
While a lot of low-quality information is available in various data sources and on the Web, many organizations or companies are interested in how to transform the data into cleaned forms which can be used ...
Instead of common data cleaning works, such as removing errors and filling missing values, they use a pre-or post-analysis to evaluate the relevance of identified external data sources to the data-mining ...
doi:10.1080/713827180
fatcat:t7ztbwemkfgdpnkgzbyoaw66wq
Intelligent mining of large-scale bio-data: Bioinformatics applications
2017
Biotechnology & Biotechnological Equipment
Data mining, as biology intelligence, attempts to find reliable, new, useful and meaningful patterns in huge amounts of data. ...
Finally, a broad perception of this hot topic in data science is given. ...
[104] used a DM framework for predicting hepatitis B virus (HBV) positive patients and analysing key mutation sites in the HBV DNA sequences. ...
doi:10.1080/13102818.2017.1364977
fatcat:qmbiss53wfggtc7ayj2ysgt5rq
A Descriptive Framework for the Multidimensional Medical Data Mining and Representation
2011
Journal of Computer Science
Conclusion: The main objective of multidimensional Medical data mining is to provide the end user with more useful and interesting patterns. ...
Results: In this study, we propose new rule mining technique using fuzzy logic for mining medical data in order to understand and better serve the needs of Multidimensional Breast cancer Data applications ...
Thus the biological sequence data (Hu et al., 2009) stored at data warehouse in the format of multidimensional temporal sequential data can be used for finding temporal pattern (Intan and Yenty, 2008 ...
doi:10.3844/jcssp.2011.519.525
fatcat:k6pemoydrjgplo3fqsb7qvfxp4
Knowledge-Based Bioinformatics - From analysis to interpretation. Edited by Gil Alterovitz and Marco Ramoni
2010
Briefings in Bioinformatics
This chapter presents the concepts of significance testing, multiple testing, family-wise error rate (FWER), false discovery rate (FDR) to uncover unexpected patterns or relationships in data. ...
In general, the value of a genome sequence depends on the quality of annotation. ...
doi:10.1093/bib/bbq070
fatcat:af44dwyq2ndqrem5rmqwf6eh2y
Classification of Eukaryotic Splice-junction Genetic Sequences Using Averaged One-dependence Estimators with Subsumption Resolution
2013
Procedia Computer Science
The experimental results demonstrate the efficacy of our framework and encourage us to apply the framework on other types of genetic sequences. ...
Since the discovery of DNA, there has been a growing interest in the problem of genetic sequence recognition, motivated by its enormous potential to cure a wide range of genetic disorders. ...
However, soon after the discovery of split genes, researchers have started noticing patterns in the boundaries [9]. ...
doi:10.1016/j.procs.2013.10.006
fatcat:s7qnrwmstvdalh3sjhiobhdkti
THE APPLICATION OF DATA MINING BY CLASSIFICATION IN A DATABASE OF NOTIFIED COVID-19 CASES IN MANAUS-AM
2021
International journal for innovation education and research
the Ministry of Health, which defined a system to monitor the information detected in the diagnoses of each patient. ...
We describe the origin and spread of the virus and the use of the SGBD software MySql and MySql Workbench to improve data in the selection and pre-processing, with the resources of the weka tool for knowledge ...
of the Naive Bayes algorithm to extract data patterns. ...
doi:10.31686/ijier.vol9.iss4.3060
fatcat:44kzmfdtz5elxfvodswr72lbm4
A novel locus of resistance to severe malaria in a region of ancient balancing selection
2015
Nature
of the tree in a Bayesian framework. ...
Using the human reference sequence and mapped sequence read data from the 1000 Genomes Project, we identified the boundaries of a 350kb region of sequence homology surrounding these genes as well as a ...
doi:10.1038/nature15390
pmid:26416757
pmcid:PMC4629224
fatcat:5sumvfyr35dnrcr7cujwewqmge