Overfitting in Making Comparisons Between Variable Selection Methods

Juha Reunanen
2003 Journal of machine learning research  
This paper addresses a common methodological flaw in the comparison of variable selection methods.  ...  Therefore, they cannot be used reliably to compare two selection methods, as is shown by the empirical results of this paper.  ...  In that respect, LOOCV can also be used in an "outer loop" to compare variable selection methods, by running the variable selection algorithms on as many training sets as there are examples, each time  ... 
dblp:journals/jmlr/Reunanen03 fatcat:6gkqyp7bhncdpfitqrg2fws5um


David D. Jensen, Paul R. Cohen
2012 Machine Learning  
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching.  ...  In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score.  ...  Second, the relationship between complexity and number of comparisons depends on the number of variables in the data sample S.  ... 
doi:10.1023/a:1007631014630 fatcat:dtummumsrzeqbazcolc35iqfbe

Input variable selection and calibration data selection for storm water quality regression models

Siao Sun, Jean-Luc Bertrand-Krajewski
2013 Water Science and Technology  
The comparison between results from the cluster selection method and random selection shows that the former can significantly improve performances of calibrated models.  ...  A procedure is developed in order to fulfil the two selection tasks in order. The procedure firstly selects model input variables using a cross validation method.  ...  The comparison between results from the cluster selection method and random selection shows that the cluster selection method can significantly improve model performances.  ... 
doi:10.2166/wst.2013.222 pmid:23823539 fatcat:cygwqku3djbupm5fs2r5jgci34

A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer

Tony Plate, Pierre Band, Joel Bert, John Grace
1996 Neural Information Processing Systems  
overfitting while retaining the ability to discover complex features in the artificial data.  ...  Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control on overfitting.  ...  The NN-ORD-HCV used common method for controlling overfitting in neural networks: 10-fold CV for selecting the optimal number of hidden units.  ... 
dblp:conf/nips/PlateBBG96 fatcat:rrnocrinj5giflrqek45bckbqe

Canonical correlation analysis for identifying biotypes of depression

Agoston Mihalik, Rick A. Adams, Quentin Huys
2020 Biological Psychiatry: Cognitive Neuroscience and Neuroimaging  
However, if the number of selected features is not sufficiently reduced, CCA is still prone to overfitting, as shown in (4).  ...  Moreover, if the top-ranked variables in one dataset are highly intercorrelated, these feature selection techniques will favour their inclusion in the CCA at the expense of lower-ranked variables that  ... 
doi:10.1016/j.bpsc.2020.02.002 pmid:32224000 fatcat:ygk6mboumzclxb3qqiv2blchim

Comparison of a Genetic Algorithm Variable Selection and Interval Partial Least Squares for quantitative analysis of lactate in PBS

M. Mamouei, M. Qassem, K. Budidha, N. Baishya, P. Vadgama, P. A. Kyriacou
2019 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)  
and two interval selection methods was carried out.  ...  Moreover, an interesting finding is the emergence of local features in the proposed genetic algorithm, while, unlike the investigated interval selection methods, no explicit constraints on the locality  ...  However, the key trade-off between model selectivity and overfitting has to be considered. 2) Heuristic Global Optimization Methods: Variable (wavelength) selection can be formulated as an optimization  ... 
doi:10.1109/embc.2019.8856765 pmid:31946576 fatcat:ixkea2bxzfaapbjz44uushyhye

Predictive Error Compensating Wavelet Neural Network Model for Multivariable Time Series Prediction

Ajla Kulaglic, B. Berk Ustundag
2021 TEM Journal  
In this study, time series prediction performance of the PEC-WNNs have been evaluated on two different problems in comparison to conventional machine learning methods including the long short-term memory  ...  However, avoiding the overfitting and underfitting in ML-based time series prediction requires special consideration depending on the size and characteristics of the available training dataset.  ...  Selecting the additional data by checking the orthogonality between supplementary data and the residual error provides us a lower error without causing overfitting in our model.  ... 
doi:10.18421/tem104-61 fatcat:5u32z7mjdfc6xnvjeoqplha4jy

Protecting against evaluation overfitting in empirical reinforcement learning

Shimon Whiteson, Brian Tanner, Matthew E. Taylor, Peter Stone
2011 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)  
Designing good empirical methodologies is difficult in part because agents can overfit test evaluations and thereby obtain misleadingly high scores.  ...  We argue that reinforcement learning is particularly vulnerable to environment overfitting and propose as a remedy generalized methodologies, in which evaluations are based on multiple environments sampled  ...  We used Copeland's method [24] to select as our tuned agent the agent that performed best in all pairwise comparisons with other agents.  ... 
doi:10.1109/adprl.2011.5967363 dblp:conf/adprl/WhitesonTTS11 fatcat:tczpx35tqfcbpbf465tjfw5xha

Evaluation of Tree Based Regression over Multiple Linear Regression for Non-normally Distributed Data in Battery Performance [article]

Shovan Chowdhury, Yuxiao Lin, Boryann Liaw, Leslie Kerby
2021 arXiv   pre-print
We show that bagging, in the use of Random Forests, reduces overfitting. Our best tree-based model achieved accuracy of R^2 = 97.73%.  ...  Tree-based models perform better on this dataset, as they are non-parametric, capable of handling complex relationships among variables and not affected by multicollinearity.  ...  Attia, William Chueh, and their co-authors for their generously providing the data used in their study for the use in this work.  ... 
arXiv:2111.02513v1 fatcat:5fqmvgkh45bvphvfqqsxldkfy4

Consensus Features Nested Cross-Validation [article]

Saeid Parvandeh, Hung-Wen Yeh, Martin P. Paulus, Brett McKinney
2020 bioRxiv   pre-print
The cnCV method has similar accuracy to pEC and cnCV selects stable features between folds without the need to specify a privacy threshold.  ...  Motivation: Feature selection can improve the accuracy of machine learning models, but appropriate steps must be taken to avoid overfitting.  ...  In addition, we vary the strength of correlation between variables.  ... 
doi:10.1101/2019.12.31.891895 fatcat:pft34lfssvdm3dkbapjjmnscpq

Model population analysis for variable selection

Hong-Dong Li, Yi-Zeng Liang, Qing-Song Xu, Dong-Sheng Cao
2010 Journal of Chemometrics  
New methods are expected to be developed by making full use of the interesting parameter in a novel manner. In this work, the elements of MPA are first considered and described.  ...  Then, the applications for variable selection and model assessment are emphasized with the help of MPA.  ...  In the present work, two methods are employed to perform variable selection coupled with PLSLDA.  ... 
doi:10.1002/cem.1300 fatcat:ydeciaalg5f6nbeopd75yndjnm

Comparison of Logistic Regression and Artificial Neural Network Models in Breast Cancer Risk Estimation

Turgay Ayer, Jagpreet Chhatwal, Oguzhan Alagoz, Charles E. Kahn, Ryan W. Woods, Elizabeth S. Burnside
2010 Radiographics  
Computer models in medical diagnosis are being developed to help physicians differentiate between healthy patients and patients with disease.  ...  clinical decision making.  ...  Significant variables can be selected with various methods.  ... 
doi:10.1148/rg.301095057 pmid:19901087 pmcid:PMC3709515 fatcat:gt5kegta3nb27k76swtm23gp6a

Optimizing Trait Predictability in Hybrid Rice Using Superior Prediction Models and Selective Omic Datasets [article]

Shibo Wang, Julong Wei, Ruidong Li, Han Qu, Weibo Xie, Zhenyu Jia
2018 bioRxiv   pre-print
Our study has provided a guideline for selection of hybrid rice in terms of which types of omic datasets and which method should be used to achieve higher trait predictability.  ...  Hybrid breeding has dramatically boosted yield and its stability in rice. Genomic prediction further benefits rice breeding by increasing selection intensity and accelerating breeding cycles.  ...  Figure 5. Multiple comparisons of the means of levels of overfitting for the four traits in the IMF2 196 population by the six prediction methods, with the differences between the seven combinations  ... 
doi:10.1101/261263 fatcat:scphasa655drbeqwahrezhhdai

Feature Selection in a Credit Scoring Model

Juan Laborda, Seyong Ryoo
2021 Mathematics  
Three different feature selection methods are used in order to mitigate the overfitting in the curse of dimensionality of these classification algorithms: one filter method (Chi-squared test and correlation  ...  coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection).  ...  Comparison of Findings with Existing Literature-Managerial Implications The results obtained in this study suggest that the feature selection process helped credit scoring models to be simpler, making  ... 
doi:10.3390/math9070746 fatcat:5s2sdzg365d3zmkshwfjsost3y

An Efficient Elastic Net with Regression Coefficients Method for Variable Selection of Spectrum Data

Wenya Liu, Qi Li, Fengfeng Zhou
2017 PLoS ONE  
An efficient elastic net with regression coefficients method (Enet-BETA) is proposed to select the significant variables of the spectrum data in this paper.  ...  The proposed Enet-BETA method can not only select important variables to make the quality easy to interpret, but also can improve the stability and feasibility of the built model.  ...  The comparison of the six different variable selection methods is tabulated in Table 2. In Table 2, the NOVS is the number of selected variables.  ... 
doi:10.1371/journal.pone.0171122 pmid:28152003 pmcid:PMC5289531 fatcat:tzpstfw6n5cobmfmuuiha3j2um