Bag of Naïve Bayes: biomarker selection and classification from genome-wide SNP data

Francesco Sambo, Emanuele Trifoglio, Barbara Di Camillo, Gianna M Toffolo, Claudio Cobelli
2012 BMC Bioinformatics  
Results: In this paper, we present Bag of Naïve Bayes (BoNB), an algorithm for genetic biomarker selection and subjects classification from the simultaneous analysis of genome-wide SNP data.  ...  BoNB as an algorithm for both classification and biomarker selection from genome-wide SNP data.  ...  Discussion In this paper, we presented a novel algorithm for classification and biomarker selection from genome-wide SNP data.  ... 
doi:10.1186/1471-2105-13-s14-s2 pmid:23095127 pmcid:PMC3439675 fatcat:xwotnqhjzfhgzl3tnuf3azole4

A Review of Ensemble Methods in Bioinformatics

Pengyi Yang, Yee Hwa Yang, Bing B. Zhou, Albert Y. Zomaya
2010 Current Bioinformatics  
proteomics, gene-gene interaction identification from genome-wide association studies, and prediction of regulatory elements from DNA and protein sequences.  ...  Promising directions such as ensemble of support vector machine, meta-ensemble, and ensemble based feature selection are discussed.  ...  Acknowledgement We thank Professor Joachim Gudmundsson for critical comments and constructive suggestions which have greatly improved the early version of this article.  ... 
doi:10.2174/157489310794072508 fatcat:muzcldjxifc23kl4tynz4lwjlu

Pan-Genomic and Polymorphic Driven Prediction of Antibiotic Resistance in Elizabethkingia

Bryan Naidenov, Alexander Lim, Karyn Willyerd, Nathanial J. Torres, William L. Johnson, Hong Jin Hwang, Peter Hoyt, John E. Gustafson, Charles Chen
2019 Frontiers in Microbiology  
By producing two sets of quality biological predictors, pan-genome genes and core-genome SNPs, from long-read sequence data and applying an ensemble of ML techniques, our results demonstrated that accurate  ...  Using core-SNPs and pan-genes in combination with six machine learning (ML) algorithms, binary classification of clindamycin and vancomycin resistance achieved f1 scores of 0.94 and 0.84, respectively.  ...  Naïve Bayes Naïve Bayes is a generative model used here to capture the posterior probability of the AMR classification given the SNP/gene predictors.  ... 
doi:10.3389/fmicb.2019.01446 pmid:31333599 pmcid:PMC6622151 fatcat:la5d44532fhtheebl2nwttbq2q

Extending Classification Algorithms to Case-Control Studies

Bryan Stanfill, Sarah Reehl, Lisa Bramer, Ernesto S Nakayasu, Stephen S Rich, Thomas O Metz, Marian Rewers, Bobbie-Jo Webb-Robertson, TEDDY Study Group
2019 Biomedical Engineering and Computational Biology  
model case-control data and identify relevant biomarkers in these study designs.  ...  We demonstrate on simulated case-control data that both the classification and variable selection accuracy of each method is improved after applying this processing step and that the proposed methods are  ...  Acknowledgements The authors would like to thank PNNL scientist Jennifer Kyle for her help with filtering the lipidomic data.  ... 
doi:10.1177/1179597219858954 pmid:31320812 pmcid:PMC6630079 fatcat:w6alhazcm5cdllmtxvcuaciyee

Pan-genomic and Polymorphic Driven Prediction of Antibiotic Resistance in Elizabethkingia [article]

Bryan V Naidenov, Alexander Lim, Karyn Willyerd, Nathanial J Torres, William L Johnson, Hong Jin Hwang, Peter R Hoyt, John E Gustafson, Charles Chen
2019 bioRxiv   pre-print
Using core-SNPs and pan-genes in combination with six machine learning algorithms, binary classification of clindamycin and vancomycin resistance achieved f1 scores of 0.94 and 0.84 respectively.  ...  Pan-genomic analysis, performed with an additional 19 Elizabethkingia strains, identified a core-genome size of 2,658,537 bp, 32 uniquely identifiable intrinsic chromosomal antibiotic resistance core-genes  ...  Acknowledgements This research is also supported by the NSF-MRI 1626257 for P.H. and C.C.; the work presented in  ... 
doi:10.1101/613877 fatcat:gyxkricgcrftpkxt57x5au4yke

Machine learning approach to single nucleotide polymorphism-based asthma prediction

Joverlyn Gaudillo, Jae Joseph Russell Rodriguez, Allen Nazareno, Lei Rigi Baltazar, Julianne Vilela, Rommel Bulalacao, Mario Domingo, Jason Albia, Enrique Hernandez-Lemus
2019 PLoS ONE  
Feature selection step showed that RF outperformed RFE and the feature importance score derived from RF was consistently high for a subset of SNPs, indicating the robustness of RF in selecting relevant  ...  In this work, we integrated ML-based models for feature selection and classification to quantify the risk of individual susceptibility to asthma using single nucleotide polymorphism (SNP).  ...  SVM along with Naive Bayes and decision trees have also been used to identify breast cancer cases using SNPs selected via information gain [14] .  ... 
doi:10.1371/journal.pone.0225574 pmid:31800601 pmcid:PMC6892549 fatcat:zd7xs5hrq5hrhbca4fvthr7j5u

Machine Learning as an Effective Method for Identifying True Single Nucleotide Polymorphisms in Polyploid Plants

Walid Korani, Josh P. Clevenger, Ye Chu, Peggy Ozias-Akins
2019 The Plant Genome  
core ideas • Finding reliable SNPs in polyploids is challenging • Machine learning is an efficient tool to refine SNP calling from NGS data of polyploids • SNP-ML tool was designed to facilitate SNP calling  ...  nucleotide polymorphism machine learner; SNP, single nucleotide polymorphism; SWEEP, sliding window extraction of explicit polymorphism; TB, tree bagger; TP, true-positive; WGS, whole-genome shotgun.  ...  Conflict of Interest The authors declare that there is no conflict of interest. Supplemental Material Supplemental material is available online for this article.  ... 
doi:10.3835/plantgenome2018.05.0023 fatcat:xbdm5a4ddzhkzimxjygneys5fq

Supervised learning with decision tree-based methods in computational and systems biology

Pierre Geurts, Alexandre Irrthum, Louis Wehenkel
2009 Molecular Biosystems  
in genome annotation, function prediction, or biomarker discovery.  ...  At the intersection between artificial intelligence and statistics, supervised learning provides algorithms to automatically build predictive models only from observations of a system.  ...  This paper presents research results of the Belgian Network BIOMAGNET (Bioinformatics and Modeling: from Genomes to Networks), funded by the Interuniversity Attraction Poles Programme, initiated by the  ... 
doi:10.1039/b907946g pmid:20023720 fatcat:25bpsowcznco5f6xs2cn73ke4u

A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data

Ke Zhang, Yi Yang, Viswanath Devanarayan, Linglin Xie, Youping Deng, Sens Donald
2011 BMC Genomics  
Two of the major challenges for developing aCGH sample clustering are the high spatial correlation between aCGH markers and the low computing efficiency.  ...  The recent advancement in array CGH (aCGH) research has significantly improved tumor identification using DNA copy number data.  ...  Such high dimensional DNA copy number data reveals genomic heterogeneity in many cancer types, ensuring biomarker discovery for each genomic subtype at SNP copy number level [11] .  ... 
doi:10.1186/1471-2164-12-s5-s10 pmid:22369459 pmcid:PMC3287492 fatcat:hksponr2ibepjf5isjygzepnj4

Machine Learning and Data Mining Methods in Diabetes Research

Ioannis Kavakiotis, Olga Tsave, Athanasios Salifoglou, Nicos Maglaveras, Ioannis Vlahavas, Ioanna Chouvarda
2017 Computational and Structural Biotechnology Journal  
The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic  ...  Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used.  ...  Acknowledgements This work has been partially supported by Horizon 2020 Framework Programme of the European Union under grant agreement 644906, the AEGLE project.  ... 
doi:10.1016/j.csbj.2016.12.005 pmid:28138367 pmcid:PMC5257026 fatcat:gq3lcg5i7jal7ps45vrbje6ufu

Multivariate Methods for Genetic Variants Selection and Risk Prediction in Cardiovascular Diseases

Alberto Malovini, Riccardo Bellazzi, Carlo Napolitano, Guia Guffanti
2016 Frontiers in Cardiovascular Medicine  
Given this, genome-wide association studies are proving to be powerful tools in identifying genetic variants that have the capacity to modify the probability of developing a disease or trait of interest  ...  entire genome for large sets of individuals.  ...  This work was supported by the grant RF-2011-02348444 from the Italian Ministry of Health.  ... 
doi:10.3389/fcvm.2016.00017 pmid:27376073 pmcid:PMC4896915 fatcat:5f22gw53xzh5zd6u6lox63sudm

Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions

Nivedhitha Mahendran, P. M. Durai Raj Vincent, Kathiravan Srinivasan, Chuan-Yu Chang
2020 Frontiers in Genetics  
It is the flow of information from DNA to RNA with enzymes' help, and the end product is proteins and other biochemical molecules. Many technologies can capture Gene Expression from the DNA or RNA.  ...  This problem should be addressed by reducing the dimension of the data source to a considerable amount. In recent years, Machine Learning has gained popularity in the field of genomic studies.  ...  The out-of-bag feature importance was calculated from every ensemble partition.  ... 
doi:10.3389/fgene.2020.603808 pmid:33362861 pmcid:PMC7758324 fatcat:jhyfsc72tngwhnrl4vxg3k4tii

A primer on machine learning techniques for genomic applications

Alfonso Monaco, Ester Pantaleo, Nicola Amoroso, Antonio Lacalamita, Claudio Lo Giudice, Adriano Fonzino, Bruno Fosso, Ernesto Picardi, Sabina Tangaro, Graziano Pesole, Roberto Bellotti
2021 Computational and Structural Biotechnology Journal  
multimodal genomic data are available.  ...  The analysis of large volumes of heterogeneous "omic" data, however, requires novel and efficient computational algorithms based on the paradigm of Artificial Intelligence.  ...  Naive Bayes The Naive Bayes (NB) algorithm is a classification algorithm, belonging to the class of generative models, i.e., it builds a full statistical model for both input and output.  ... 
doi:10.1016/j.csbj.2021.07.021 fatcat:lzosljvzkng57d66l4eiwtzr7q

Bioinformatics challenges for genome-wide association studies

J. H. Moore, F. W. Asselbergs, S. M. Williams
2010 Bioinformatics  
Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide  ...  The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing.  ...  ACKNOWLEDGEMENTS We would like to thank the anonymous reviewers for their very helpful comments and suggestions. Funding: National Institutes of Health (LM010098, LM009012 and AI59694).  ... 
doi:10.1093/bioinformatics/btp713 pmid:20053841 pmcid:PMC2820680 fatcat:hcd25vxlcnacvb7w4xyk2biyka

Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning

Guishan Zhang, Yiyun Deng, Qingyu Liu, Bingxu Ye, Zhiming Dai, Yaowen Chen, Xianhua Dai
2020 Frontiers in Genetics  
Feature sets including sequence-based features, graph features, genome context, and regulatory information features were modeled in circMRT.  ...  Thus, accurate circRNA identification and prediction of its regulatory information are critical for understanding its biogenesis.  ...  Gaussian naive Bayes supposes that features are independent from each other. Gaussian naive Bayes is simpler and faster than other sophisticated methods.  ... 
doi:10.3389/fgene.2020.00655 pmid:32849764 pmcid:PMC7396586 fatcat:sgre27znyrasddkjhkfhdpy3oe
Showing results 1 to 15 of 60.