Data Mining Techniques for Early Diagnosis of Diabetes: A Comparative Study

Luís Chaves, Gonçalo Marques
2021 Applied Sciences  
Diabetes is a life-long condition that is well-known in the 21st century. Once known as a disease of the West, the rise of diabetes has been fed by a nutrition shift, rapid urbanization and increasingly sedentary lifestyles. In late 2019, a new public health concern emerged (COVID-19), posing a particular hazard to people living with diabetes. Medical institutes have been collecting data for years. Using data mining processes, we expect to predict pathological complications, which will hopefully prevent the loss of lives and improve quality of life. This work proposes a comparative study of data mining techniques for early diagnosis of diabetes. We use a publicly accessible data set containing 520 instances, each with 17 attributes. Naive Bayes, Neural Network, AdaBoost, k-Nearest Neighbors, Random Forest and Support Vector Machine methods have been tested. The results suggest that Neural Networks should be used for diabetes prediction. The proposed model presents an AUC of 98.3%, an accuracy of 98.1%, an F1-Score, Precision and Sensitivity of 98.4%, and a Specificity of 97.5%.
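The evaluation measures named in the abstract (accuracy, Precision, Sensitivity, Specificity and F1-Score) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions only; it is not the authors' code. The `ConfusionMatrix` class, the `binary_metrics` function and the example counts are hypothetical and chosen purely for illustration, and they do not reproduce the figures reported in the paper. AUC is left out because it requires ranked prediction scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # diabetic cases predicted as diabetic
    fp: int  # non-diabetic cases predicted as diabetic
    tn: int  # non-diabetic cases predicted as non-diabetic
    fn: int  # diabetic cases predicted as non-diabetic


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(cm: ConfusionMatrix) -> dict:
    """Compute the standard confusion-matrix metrics as fractions in [0, 1]."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    sensitivity = _ratio(cm.tp, cm.tp + cm.fn)  # also called recall
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT results from the study.
    example = ConfusionMatrix(tp=8, fp=2, tn=9, fn=1)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and Specificity are reported separately because a missed diabetic case (false negative) and a false alarm (false positive) carry different costs in a screening setting, and accuracy alone hides that difference.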
doi:10.3390/app11052218 doaj:d3bad609bf8549268b4d2c74ebc50c49 fatcat:udxbvgweyrh4hmaujpwmls6cee