
Transcriptome prediction performance across machine learning models and diverse ancestries

Paul C. Okoro, Ryan Schubert, Xiuqing Guo, W. Craig Johnson, Jerome I. Rotter, Ina Hoeschele, Yongmei Liu, Hae Kyung Im, Amy Luke, Lara R. Dugas, Heather E. Wheeler
2021 Human Genetics and Genomics Advances  
While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries  ...  We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used.  ...  The average R² for each of the prediction algorithms is EN = 0.0733, SVR = 0.0476, RF = 0.0409, and KNN = 0.0103.  ... 
doi:10.1016/j.xhgg.2020.100019 pmid:33937878 pmcid:PMC8087249 fatcat:mzgznh6gfvhqfcrtp65b446fla

Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data

M. S. B. Sehgal, I. Gondal, L. S. Dooley
2005 Bioinformatics  
In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented which uses multiple covariance-based imputation matrices for the final prediction  ...  (KNN).  ...  THE CMVE ALGORITHM The complete CMVE algorithm, which is detailed in Figure 1 , introduces the concept of multiple parallel estimations of missing values.  ... 
doi:10.1093/bioinformatics/bti345 pmid:15731210 fatcat:2wyk46ap2bdfrknteypfxgzi5y

An Examination of Machine Learning Algorithms for Missing Values Imputation

It represents the research and imputation of missing values in gene expression data. By using the local or global correlation of the data, we focus mostly on contrasting the algorithms.  ...  The purpose of our review article is to focus on the developments of current techniques, for scientists applying different or newly developed algorithms with the identical functional goal.  ...  ACKNOWLEDGEMENTS We would like to thank Universiti Malaysia Pahang for supporting this work under the RDU Grant, Grant numbers: RDU180344 and RDU190113.  ... 
doi:10.35940/ijitee.l1081.10812s219 fatcat:ixexhti6jvcjbnkqj2jmrncqge

Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans

A. Xavier, William M. Muir, Katy M. Rainey
2016 BMC Bioinformatics  
The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper.  ...  The procedures these technologies use to impute genetic data, therefore, greatly affect downstream analyses.  ...  Acknowledgement We thank the SoyNAM collaborators for their contributions to the experiment: Dr.  ... 
doi:10.1186/s12859-016-0899-7 pmid:26830693 pmcid:PMC4736474 fatcat:6bgi5ki7evgxljyn66doho6db4

A Hybrid Modified Deep Learning Data Imputation Method for Numeric Datasets

Nuran Peker, Cemalettin Kubat
2021 International Journal of Intelligent Systems and Applications in Engineering  
The imputation performance of RF-DLI is compared to K-Nearest Neighbors (KNN), Multiple Imputation by Chained Equations (MICE), MEAN imputation, and Principal Component Analysis (PCA) imputation approaches  ...  Datawig is a deep learning-based library that supports missing value imputation for all types of data. RF-DLI approach includes the following steps to impute missing data.  ...  The method benefits from deep learning for predicting the missing data and genetic algorithm for optimizing the weights of the neural network.  ... 
doi:10.18201/ijisae.2021167931 fatcat:qbgfg3st4vci5nzqz3e5v3m2qy

Data Imputation in Wireless Sensor Networks Using a Machine Learning-Based Virtual Sensor

Michael Matusowsky, Daniel T. Ramotsoela, Adnan M. Abu-Mahfouz
2020 Journal of Sensor and Actuator Networks  
The MLP was trained using a genetic algorithm which efficiently reached an optimal solution for each sensor node.  ...  Data imputation allows for a system to counteract the effect of data loss by substituting faulty or missing sensor values with system-defined virtual values.  ...  The genetic algorithm was able to converge on an optimal solution in a relatively small amount of time despite no parallelism being implemented into the training algorithm.  ... 
doi:10.3390/jsan9020025 fatcat:4xifoaih25ewnhubgu5x47se2y

A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset

Li Li, Wei Jiang, Xia Li, Kathy L. Moser, Zheng Guo, Lei Du, Qiuju Wang, Eric J. Topol, Qing Wang, Shaoqi Rao
2005 Genomics  
genetic algorithm and K nearest neighbors.  ...  We have formalized a robust gene selection approach based on a hybrid between genetic algorithm and support vector machine.  ...  Acknowledgments This work was supported in part by the National Natural Science Foundation of China (Grants 30170515 and 30370798), the National High Tech Development Project, the Chinese 863 Program (  ... 
doi:10.1016/j.ygeno.2004.09.007 pmid:15607418 fatcat:wlguq3kumjcvhgizssxw5eaxyu

A survey on missing data in machine learning

Tlamelo Emmanuel, Thabiso Maupong, Dimane Mpoeleng, Thabo Semong, Banyatsang Mphago, Oteng Tabona
2021 Journal of Big Data  
We propose and evaluate two methods, the k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm.  ...  are most suitable for.  ...  Also, an imputation experiment was done on the KNN and RF algorithms for imputation on the Iris and novel ID fan datasets to demonstrate how popular imputation algorithms perform.  ... 
doi:10.1186/s40537-021-00516-9 pmid:34722113 pmcid:PMC8549433 fatcat:2swvf2dp5rfgjddoswh6bmpfjq
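The survey above evaluates k-nearest-neighbor (KNN) imputation. As a minimal sketch of the general idea (not the authors' code, which the snippet does not show), each missing entry can be filled with the mean of that feature over the k closest complete rows, measuring distance only on the features the incomplete row actually has. The function name `knn_impute`, the use of `None` to mark missing values, and plain Euclidean distance are assumptions for illustration; library implementations such as scikit-learn's `KNNImputer` use a NaN-aware distance that also rescales for missing coordinates.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def _distance(row: Row, other: Sequence[float], observed: List[int]) -> float:
    # Euclidean distance restricted to the coordinates observed in `row`.
    return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Hypothetical helper: fill each None with the mean of that column
    over the k nearest complete rows (simplified KNN imputation)."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to borrow values from")
    result = []
    for row in rows:
        observed = [j for j, v in enumerate(row) if v is not None]
        if len(observed) == len(row):
            result.append(list(row))  # nothing to impute
            continue
        neighbors = sorted(complete, key=lambda other: _distance(row, other, observed))[:k]
        filled = [
            v if v is not None else sum(n[j] for n in neighbors) / len(neighbors)
            for j, v in enumerate(row)
        ]
        result.append(filled)
    return result


data = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.2], [5.0, 6.0, 7.0], [1.05, None, 3.1]]
imputed = knn_impute(data, k=2)
```

Here the last row is closest to the first two rows on its observed features, so its missing second feature is filled with their mean, about 2.05; the distant third row is ignored.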

DreamAI: algorithm for the imputation of proteomics data [article]

Weiping Ma, Sunkyu Kim, Shrabanti Chowdhury, Zhi Li, Mi Yang, Seungyeul Yoo, Francesca Petralia, Jeremy Jacobsen, Jingyi Jessica Li, Xinzhou Ge, Kexin Li, Thomas Yu (+9 others)
2020 bioRxiv   pre-print
The final resulting algorithm, DreamAI, is based on an ensemble of six different imputation methods.  ...  To address this problem, the NCI-CPTAC Proteogenomics DREAM Challenge was carried out to develop effective imputation algorithms for labelled LC-MS/MS proteomics data through crowd learning.  ...  "McImpute: Matrix completion based imputation for single cell RNA-seq data." Frontiers in genetics 10 (2019): 9. 25.  ... 
doi:10.1101/2020.07.21.214205 fatcat:r7g5oy6ptnb2hmoiewhsthh2ne

CF-GeNe: Fuzzy Framework for Robust Gene Regulatory Network Inference

Muhammad Shoaib B. Sehgal, Iqbal Gondal, Laurence S. Dooley
2006 Journal of Computers  
including: Least Square Impute (LSImpute), K-Nearest Neighbour Impute (KNN), Bayesian Principal Component Analysis Impute (BPCA) and ZeroImpute.  ...  The approach uses the Collateral Missing Value Estimation (CMVE) algorithm as its core to estimate missing values in microarray gene expression data.  ...  including: Least Square Impute (LSImpute), K-Nearest Neighbour Impute (KNN) and Bayesian Principal Component Analysis (BPCA).  ... 
doi:10.4304/jcp.1.7.1-8 fatcat:2spuzrkn6ng3zmbdwzznxnfhwq

Dynamic Feature Scaling for K-Nearest Neighbor Algorithm [article]

Chandrasekaran Anirudh Bhardwaj, Megha Mishra, Kalyani Desikan
2018 arXiv   pre-print
Nearest Neighbors Algorithm is a Lazy Learning Algorithm, in which the algorithm tries to approximate the predictions with the help of similar existing vectors in the training dataset.  ...  A majority of the metrics, such as Euclidean distance, are scale-variant, meaning that the results could vary for different ranges of values used for the features.  ...  The KNN algorithm is a very rigid algorithm, i.e., all the data points are considered for every iteration.  ... 
arXiv:1811.05062v1 fatcat:xidkfbyjjvfc5bpj5jxnzpvwwm
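The scale sensitivity noted in this abstract is easy to demonstrate. The sketch below uses ordinary min-max scaling, not the dynamic feature scaling method the paper proposes, and hypothetical two-feature rows (age in years, income in dollars) to show that rescaling can change which training point is a query's nearest neighbor under Euclidean distance. The helper names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    # Rescale every column to [0, 1]; a constant column maps to 0.0.
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def nearest(query_idx: int, rows: List[List[float]]) -> int:
    # Index of the row closest to rows[query_idx], excluding the query itself.
    query = rows[query_idx]
    candidates = [i for i in range(len(rows)) if i != query_idx]
    return min(candidates, key=lambda i: euclidean(query, rows[i]))


# Hypothetical data: [age in years, income in dollars]
rows = [[25.0, 50000.0], [60.0, 50500.0], [26.0, 53000.0]]
raw_neighbor = nearest(0, rows)
scaled_neighbor = nearest(0, min_max_scale(rows))
```

Unscaled, the income column dominates the distance, so row 1 (a 35-year age gap but similar income) is nearest to row 0; after scaling both features to [0, 1], row 2 becomes nearest.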

Empirical Analysis of Software Effort Preprocessing Techniques Based on Machine Learning

2021 International Journal of Intelligent Engineering and Systems  
The purpose of this paper is to evaluate and compare the performance of the proposed technique with the k-nearest neighbor imputation (kNNI) technique, random forest imputation, and multiple imputation  ...  The results show that the three imputation methods have almost the same performance.  ...  Feature selection The use of genetic algorithms (GA) as feature selection (FS) uses a parallel search random strategy, directed to the search for high fitness points, i.e. the point at which the function  ... 
doi:10.22266/ijies2021.1231.49 fatcat:w4wasj2mzvgyva3y2elyyo3g2a
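Several entries in this listing use genetic algorithms (GA) for feature selection. A minimal sketch of that search, assuming a bitmask encoding (1 = feature kept), tournament selection, one-point crossover, bit-flip mutation and elitism, is shown below. The fitness function is a hypothetical stand-in: in the GA/SVM gene-selection work it would be a classifier's predictive performance on the selected genes, whereas here it is a toy score that rewards a few assumed "relevant" features and penalizes subset size.

```python
import random
from typing import Callable, List

Mask = List[int]


def ga_select(n_features: int, fitness: Callable[[Mask], float], pop_size: int = 20,
              generations: int = 30, p_mut: float = 0.05, seed: int = 0) -> Mask:
    """Hypothetical helper: search for a high-fitness feature mask with a simple GA."""
    if n_features < 2 or pop_size < 3:
        raise ValueError("need at least 2 features and a population of at least 3")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = [list(m) for m in ranked[:2]]  # elitism: best masks survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fittest of 3 random masks becomes a parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


# Toy stand-in for classifier accuracy: assumed relevance weight per feature.
RELEVANCE = [3.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def toy_fitness(mask: Mask) -> float:
    return sum(w * g for w, g in zip(RELEVANCE, mask)) - 0.5 * sum(mask)


best = ga_select(len(RELEVANCE), toy_fitness)
```

Because the two best masks are copied into every new generation, the best fitness never decreases; the ideal mask for this toy score keeps features 0, 2 and 5.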

An Efficient Missing Data Imputation Based On Co-Cluster Sparse Matrix Learning

F. Femila, G. Sridevi, D. Swathi, K. Swetha
2019 International Journal of Scientific Research in Computer Science Engineering and Information Technology  
This algorithm learns without reference class, and even with data continuous missing rate as high as the existing techniques.  ...  This makes the task of data processing challenging. This paper aims to design a solution for this problem that is very different from traditional approaches.  ...  To extend MIAEC for large-scale data processing, they apply the map reduce programming model to realize the distribution and parallelization of MIAEC.  ... 
doi:10.32628/cseit195220 fatcat:rsi4z5kh4ncbvmeyofl7j37cre

Missing Value Aware Optimal Feature Selection Method for Efficient Big Data Mining Process

2019 International journal of recent technology and engineering  
In this research method Improved KNN imputation algorithm is introduced to handle the missing values.  ...  This is achieved in our previous research work by introducing the Enhanced Particle Swarm Optimization with Genetic Algorithm – Modified Artificial Neural Network (EPSOGA-MANN) which can select the optimal  ... 
doi:10.35940/ijrte.b1055.0982s1119 fatcat:wb7fohnfofc6zmogvhb3r5cxgu


Pooja Mittal
2014 International Journal of Research in Engineering and Technology  
The work has been implemented in WEKA environment and obtained results show that SVM is the most robust classification method and KNN is the least effective classifier for medical data sets.  ...  In this paper, the analysis has been performed for five different classification algorithms in terms of accuracy, kappa statistics, execution time, mean absolute error under three datasets, collected from  ...  Binary-coded genetic algorithms and real-coded genetic algorithms are used for assigning weights to the features, so that set of optimal features can be deduced from high dimensional data.  ... 
doi:10.15623/ijret.2014.0306085 fatcat:2vbplzqd4fb3tkkvlxx5ytisba