Relief for regression with missing data in variable selection

Luquan Li
Variable selection is an important pre-processing step for prediction in data mining and machine learning; it involves selecting a subset of relevant variables. Almost all researchers encounter missing data, which can arise from nonresponse or loss of information. This thesis develops a new variable selection technique for data with missing values. Relief is an algorithm that estimates the quality of each variable and is applicable to categorical or continuous data. The thesis presents a new variable selection method, RM-Relief, which extends Relief to select variables in a regression with missing data. RM-Relief weights all predictor variables by assigning bins for the response variable and estimating the conditional probability of unknown instances. Results on artificial and real-world datasets indicate that RM-Relief works well on regression problems with missing data.
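To make the idea of Relief-style weighting with a binned response more concrete, here is a minimal Python sketch. It is not the RM-Relief algorithm from the thesis, whose details the abstract does not give. The function name `relief_binned_weights`, the equal-frequency binning, the min-max scaling, and the rule for missing values (replace an unknown difference with the average absolute difference to the observed values in the same response bin) are all assumptions made for illustration.

```python
import numpy as np


def relief_binned_weights(X, y, n_bins=3):
    """Relief-style predictor weights for a binned regression response.

    Hypothetical sketch, not the thesis's RM-Relief. NaN in X marks a
    missing value.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Assumption: equal-frequency bins of the response act as class labels.
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)

    # Scale each predictor to [0, 1], ignoring missing entries.
    low = np.nanmin(X, axis=0)
    span = np.nanmax(X, axis=0) - low
    span[span == 0] = 1.0
    Z = (X - low) / span

    def observed(i, a):
        # Known value of attribute a for instance i, or, if it is missing,
        # every observed value of a among instances in the same bin.
        if not np.isnan(Z[i, a]):
            return Z[[i], a]
        col = Z[bins == bins[i], a]
        return col[~np.isnan(col)]

    def diff(i, j):
        d = np.abs(Z[i] - Z[j])  # NaN wherever either value is missing
        for a in np.flatnonzero(np.isnan(d)):
            u, v = observed(i, a), observed(j, a)
            if u.size and v.size:
                # Assumption: bin-conditional expected absolute difference.
                d[a] = np.mean(np.abs(u[:, None] - v[None, :]))
            else:
                d[a] = 0.5  # arbitrary fallback when nothing is observed
        return d

    weights = np.zeros(p)
    used = 0
    for i in range(n):
        D = np.array([diff(i, j) for j in range(n)])
        dist = D.sum(axis=1)
        dist[i] = np.inf  # never pick the instance itself
        same = bins == bins[i]
        same[i] = False
        other = bins != bins[i]
        if not same.any() or not other.any():
            continue
        hit = np.flatnonzero(same)[np.argmin(dist[same])]
        miss = np.flatnonzero(other)[np.argmin(dist[other])]
        # Reward predictors that differ on the nearest miss, penalize
        # those that differ on the nearest hit.
        weights += D[miss] - D[hit]
        used += 1
    return weights / max(used, 1)
```

Under these assumptions, a predictor that drives the response should receive a larger weight than unrelated predictors, and rows with missing entries still contribute to the estimate instead of being dropped.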
doi:10.7282/t3vd6wsb fatcat:hw62i62gaff2tchiy3tlcffkjq