8,638 Hits in 4.5 sec

Pattern discovery in sequences under a Markov assumption

Darya Chudova, Padhraic Smyth
2002 Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02  
We demonstrate how the Bayes error can be used to calibrate existing discovery algorithms, providing a lower bound on achievable performance.  ...  We present a general framework for characterizing learning in this context by deriving the Bayes error rate for this problem under a Markov assumption.  ...  Acknowledgements This work was supported by the National Science Foundation under grants IIS-9703120 and IIS-0083489, and by grants from NASA and the Jet Propulsion Laboratory, the National Institute of  ... 
doi:10.1145/775047.775070 dblp:conf/kdd/ChudovaS02 fatcat:36aavyjnwfaflkqbxfa3kinlwa

Pattern discovery in sequences under a Markov assumption

Darya Chudova, Padhraic Smyth
2002 Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02  
We demonstrate how the Bayes error can be used to calibrate existing discovery algorithms, providing a lower bound on achievable performance.  ...  We present a general framework for characterizing learning in this context by deriving the Bayes error rate for this problem under a Markov assumption.  ...  Acknowledgements This work was supported by the National Science Foundation under grants IIS-9703120 and IIS-0083489, and by grants from NASA and the Jet Propulsion Laboratory, the National Institute of  ... 
doi:10.1145/775069.775070 fatcat:ltkpr33hqjbbradkwjqeymscti

Mining Signatures from Event Sequences

Rajput S.H., Chetan Jadhav, Yogesh Deshmukh, Sandip Sonawane, Hemant Jadhav
2015 IJARCCE  
The framework allows the presentation, extraction, and mining of high order latent occasion event structure and relationships between single and many sequences.  ...  We have suggested clinical assessment for naked interactive knowledge discovery in large electronic health record databases.  ...  mining can be of use include analysis of medical data of hospitals in a town to conjecture, for example, potential outbreaks of infectious diseases.  ... 
doi:10.17148/ijarcce.2015.44129 fatcat:up6onqghvje7rgj6bi4o4aw7ky

A survey of temporal data mining

Srivatsan Laxman, P. S. Sastry
2006 Sadhana (Bangalore)  
We also describe some recent results regarding statistical analysis of pattern discovery methods.  ...  In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams.  ...  Since the Bayes error rate is known to be a lower bound on the error rates of all classifiers, it indicates, in some sense, the level of difficulty in the underlying pattern discovery problem.  ... 
doi:10.1007/bf02719780 fatcat:nqyjuthpmreclhj4lygqyxxmk4

Control of the False Discovery Rate Applied to the Detection of Positively Selected Amino Acid Sites

S. Guindon
2006 Molecular biology and evolution  
The null hypothesis "H0,s: site s evolves under a negative selection or under a neutral process of evolution" is tested at each codon site of the alignment of homologous coding sequences.  ...  Recent advances in statistics have shown that the false discovery rate - in this case, the expected proportion of sites that do not evolve under positive selection among those that are estimated to evolve  ...  The expected number of false discoveries among the sites included in the list is: FDR framework. The first step is to generate synthetic data, denoted as D*, under M, the best model estimated  ... 
doi:10.1093/molbev/msj095 pmid:16423864 fatcat:3a5bqg6trfgd5gyslqache3jmu

An integrated framework on mining logs files for computing system management

Tao Li, Feng Liang, Sheng Ma, Wei Peng
2005 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05  
This has been well known and experienced as a cumbersome, labor intensive, and error prone process. In addition, this process is difficult to keep up with the rapidly changing environments.  ...  In this paper, we will describe our research efforts on establishing an integrated framework for mining system log files for automatic management.  ...  The discovery of the waiting periods is carried out using the Chi-Squared test based approach first introduced in [13]. Consider an arbitrary element τ in D ab and a fixed δ.  ... 
doi:10.1145/1081870.1081972 dblp:conf/kdd/LiLMP05 fatcat:vzvbtl6njfcl7kllpooecq4qam

A RESEARCH REVIEW ON COMPARATIVE ANALYSIS OF DATA MINING TOOLS, TECHNIQUES AND PARAMETERS

Anil Sharma
2017 International Journal of Advanced Research in Computer Science  
Data mining is a process of exploring unexplored patterns from huge databases. This acts as a key to knowledge discovery which provides a great support to business world and academia.  ...  There are variety of parameters defined in the literature which provide base for a tool to perform analysis and different tools are available to perform these analysis.  ...  This is used as an analyzer for knowledge discovery in databases to be used in decision making process.  ... 
doi:10.26483/ijarcs.v8i7.4255 fatcat:ubnkgjhukfctdob5dihw2jvsjy

Performance Analysis Of Various Data Mining Classification Techniques On Healthcare Data

Shelly Gupta, Dharminder Kumar, Anand Sharma
2011 International Journal of Computer Science & Information Technology (IJCSIT)  
The standards used are percentage of accuracy and error rate of every applied classification technique. The experiments are done using the 10 fold cross validation method.  ...  A suitable technique for a particular dataset is chosen based on highest classification accuracy and least error rate.  ...  Knowledge discovery in databases consists of the list of iterative sequence steps of processes and data mining is one of the KDD processes.  ... 
doi:10.5121/ijcsit.2011.3413 fatcat:sxjvob6qezbptdcsucvyxavsda

Data preparation for data mining

Shichao Zhang, Chengqi Zhang, Qiang Yang
2003 Applied Artificial Intelligence  
Data preparation is a fundamental stage of data analysis.  ...  While a lot of low-quality information is available in various data sources and on the Web, many organizations or companies are interested in how to transform the data into cleaned forms which can be used  ...  Instead of common data cleaning works, such as removing errors and filling missing values, they use a pre- or post-analysis to evaluate the relevance of identified external data sources to the data-mining  ... 
doi:10.1080/713827180 fatcat:t7ztbwemkfgdpnkgzbyoaw66wq

Intelligent mining of large-scale bio-data: Bioinformatics applications

Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Rafii Yusop, Mahboobe Sadat Golestan Hashemi, Mohammad Hossein Nadimi Shahraki, Hamid Rastegari, Gous Miah, Farzad Aslani
2017 Biotechnology & Biotechnological Equipment  
Data mining, as biology intelligence, attempts to find reliable, new, useful and meaningful patterns in huge amounts of data.  ...  Finally, a broad perception of this hot topic in data science is given.  ...  [104] used a DM framework for predicting hepatitis B virus (HBV) positive patients and analysing key mutation sites in the HBV DNA sequences.  ... 
doi:10.1080/13102818.2017.1364977 fatcat:qmbiss53wfggtc7ayj2ysgt5rq

A Descriptive Framework for the Multidimensional Medical Data Mining and Representation

Sankaradass
2011 Journal of Computer Science  
Conclusion: The main objective of multidimensional Medical data mining is to provide the end user with more useful and interesting patterns.  ...  Results: In this study, we propose new rule mining technique using fuzzy logic for mining medical data in order to understand and better serve the needs of Multidimensional Breast cancer Data applications  ...  Thus the biological sequence data (Hu et al., 2009) stored at data warehouse in the format of multidimensional temporal sequential data can be used for finding temporal pattern (Intan and Yenty, 2008  ... 
doi:10.3844/jcssp.2011.519.525 fatcat:k6pemoydrjgplo3fqsb7qvfxp4

Knowledge-Based Bioinformatics - From analysis to interpretation. * Edited by Gil Alterovitz and Marco Ramoni

S. Kim
2010 Briefings in Bioinformatics  
This chapter presents the concepts of significance testing, multiple testing, family-wise error rate (FWER), false discovery rate (FDR) to uncover unexpected patterns or relationships in data.  ...  In general, the value of a genome sequence depends on the quality of annotation.  ... 
doi:10.1093/bib/bbq070 fatcat:af44dwyq2ndqrem5rmqwf6eh2y

Classification of Eukaryotic Splice-junction Genetic Sequences Using Averaged One-dependence Estimators with Subsumption Resolution

Zaw Zaw Htike, Shoon Lei Win
2013 Procedia Computer Science  
The experimental results demonstrate the efficacy of our framework and encourage us to apply the framework on other types of genetic sequences.  ...  Since the discovery of DNA, there has been a growing interest in the problem of genetic sequence recognition, motivated by its enormous potential to cure a wide range of genetic disorders.  ...  However, soon after the discovery of split genes, researchers have started noticing patterns in the boundaries 9 .  ... 
doi:10.1016/j.procs.2013.10.006 fatcat:s7qnrwmstvdalh3sjhiobhdkti

THE APPLICATION OF DATA MINING BY CLASSIFICATION IN A DATABASE OF NOTIFIED COVID-19 CASES IN MANAUS-AM

Fábio Gomes Cantuário, Luiz Eduardo Santos de Araújo, Rilmar Pereira Gomes, David Barbosa de Alencar
2021 International journal for innovation education and research  
the Ministry of Health, which defined a system to monitor the information detected in the diagnoses of each patient.  ...  We describe the origin and spread of the virus and the use of the SGBD software MySql and MySql Workbench to improve data in the selection and pre-processing, with the resources of the weka tool for knowledge  ...  of the Naive Bayes algorithm to extract data patterns.  ... 
doi:10.31686/ijier.vol9.iss4.3060 fatcat:44kzmfdtz5elxfvodswr72lbm4

A novel locus of resistance to severe malaria in a region of ancient balancing selection

2015 Nature  
of the tree in a Bayesian framework.  ...  Using the human reference sequence and mapped sequence read data from the 1000 Genomes Project, we identified the boundaries of a 350kb region of sequence homology surrounding these genes as well as a  ... 
doi:10.1038/nature15390 pmid:26416757 pmcid:PMC4629224 fatcat:5sumvfyr35dnrcr7cujwewqmge