A Model for Rapid Selection and COVID-19 Prediction with Dynamic and Imbalanced Data
The COVID-19 pandemic is threatening our quality of life and economic sustainability. The rapid spread of COVID-19 around the world requires each country or region to establish appropriate anti-proliferation policies in a timely manner. It is important, in making COVID-19-related health policy decisions, to predict the number of confirmed COVID-19 patients as accurately and quickly as possible. Predictions are already being made using several traditional models such as the susceptible,
... and recovered (SIR) and susceptible, exposed, infected, and resistant (SEIR) frameworks, but these predictions may not be accurate due to the simplicity of the models, so a prediction model with more diverse input features is needed. However, it is difficult to propose a universal predictive model globally because there are differences in data availability by country and region. Moreover, the training data for predicting confirmed patients is typically an imbalanced dataset consisting mostly of normal data; this imbalance negatively affects the accuracy of prediction. Hence, the purposes of this study are to extract rules for selecting appropriate prediction algorithms and data imbalance resolution methods according to the characteristics of the datasets available for each country or region, and to predict the number of COVID-19 patients based on these algorithms. To this end, a decision tree-type rule was extracted to identify 13 data characteristics and a discrimination algorithm was selected based on those characteristics. With this system, we predicted the COVID-19 situation in four regions: Africa, China, Korea, and the United States. The proposed method has higher prediction accuracy than the random selection method, the ensemble method, or the greedy method of discriminant analysis, and prediction takes very little time.