Selection of Variables in Logistic Regression Model with Genetic Algorithm for Stroke Prediction

Avijit Kumar Chaudhuri, Prof. Dilip K. Banerjee, Dr. Anirban Das
2021 IARJSET  
Early risk assessment and identification are central to preventing disease and slowing its progression. To estimate disease risk factors, researchers have typically used statistical comparative analysis or stepwise feature selection with regression techniques. These methods assess individual risk factors separately. However, disease development is more likely to be influenced by a combination of factors than by any single one.
Genetic algorithms (GA) can be beneficial and efficient for finding a combination of factors that gives the fastest diagnosis with the highest accuracy, especially when dealing with a large number of complicated and poorly understood components, as in disease prediction. Our proposed model demonstrates the potential for using GA to diagnose disease and predict accuracy. Our proposed ensemble model revealed that combining a limited selection of input features gives better results than using all of the individually significant features. This model not only forecasts the optimal feature sets and accuracy but also overcomes the dataset's missing-values problem. Variables more commonly picked by logistic regression (LR) may be more relevant for disease development prediction and accuracy by GA.

This work is licensed under a Creative Commons Attribution 4.0 International License.

... collection of features; therefore, we believe machine learning can be used to (i) improve the prediction accuracy of stroke risk, and (ii) discover new risk factors. Lumley et al.'s [6] 5-year stroke prediction model adopted the Cox proportional hazards model, one of the most commonly used statistical methods in medical research [9]. It has been studied extensively [9, 10] and applied to the prediction of various diseases, including stroke [6, 11, 12]. However, the performance of the Cox model depends largely on the quality of the pre-selected features. To address this problem, several approaches have been suggested recently [13, 14]. Thus far, very few studies have compared Cox regression with machine learning methods for making predictions on censored data. Kattan [15] compared Cox proportional hazards regression with several machine learning methods (neural networks and tree-based methods) on three urological datasets.
However, Kattan's study used datasets with only five features, while machine learning algorithms are thought to handle many more features effectively. In addition, the paper considered only some relatively simple machine learning algorithms; high-performance algorithms such as SVM and NB were not examined.
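The idea in the abstract of using a GA to search for a combination of input features for a logistic regression model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data (only features 0 and 3 carry signal), the bit-mask encoding, tournament selection, one-point crossover, elitism, the per-feature penalty, and all function names and parameter values. Logistic regression is fitted with plain gradient descent so the example needs only the standard library.

```python
import math
import random

def fit_logistic(X, y, epochs=60, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(X, y, w, b):
    """Fraction of rows whose predicted class (z > 0 means class 1) matches y."""
    hits = 0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        hits += int((z > 0) == (yi == 1))
    return hits / len(y)

def make_fitness(X_tr, y_tr, X_va, y_va, penalty=0.01):
    """Fitness = validation accuracy minus a small (assumed) cost per selected feature."""
    cache = {}
    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cols = [j for j, bit in enumerate(mask) if bit]
            if not cols:
                cache[key] = 0.0  # an empty feature set cannot be fitted
            else:
                tr = [[row[j] for j in cols] for row in X_tr]
                va = [[row[j] for j in cols] for row in X_va]
                w, b = fit_logistic(tr, y_tr)
                cache[key] = accuracy(va, y_va, w, b) - penalty * len(cols)
        return cache[key]
    return fitness

def run_ga(n_features, fitness, rng, pop_size=10, generations=8, p_mut=0.1):
    """Search over 0/1 feature masks; the all-features mask seeds the population."""
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = max(pop, key=fitness)
        nxt = [elite[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Synthetic data (an assumption): 8 features, only features 0 and 3 are informative.
rng = random.Random(0)
d = 8
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(160)]
y = [1 if row[0] + row[3] + rng.gauss(0, 0.3) > 0 else 0 for row in X]

fitness = make_fitness(X[:80], y[:80], X[80:], y[80:])
best = run_ga(d, fitness, rng)
print("selected features:", [j for j, bit in enumerate(best) if bit])
print("fitness of best mask:", fitness(best), "| all features:", fitness([1] * d))
```

Because the all-features mask is placed in the initial population and the best mask is carried forward each generation, the returned subset scores at least as well as the full model under this fitness function. A real study would replace the synthetic data with clinical records, use cross-validation rather than a single split, and handle missing values explicitly.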
doi:10.17148/iarjset.2021.8817 fatcat:fahnidbie5erzig3a5y6hdpola