Forecasting COVID-19 cases at the Amazon region: a comparison of classical and machine learning models [article]

Dalton Garcia Borges de Souza, Francisco Tarcísio Alves Júnior, Nei Yoshihiro Soma
2020 bioRxiv pre-print
BACKGROUND - Since the first reports of COVID-19, decision-makers have been using traditional epidemiological models to predict how the disease will progress. However, growing computational power, the demand for adaptable predictive frameworks, the short history of the disease, and uncertainties in input data and prediction rules also make other classical and machine learning techniques viable options. OBJECTIVE - This study investigates the efficiency of six models in forecasting COVID-19 confirmed cases 17 days ahead. We compare autoregressive integrated moving average (ARIMA), Holt-Winters, support vector regression (SVR), k-nearest neighbors regressor (KNN), random trees regressor (RTR), seasonal linear regression with change-points (Prophet), and simple logistic regression (SLR). MATERIAL AND METHODS - We apply the models to data provided by the health surveillance secretary of Amapá, a Brazilian state located entirely within the Amazon rainforest, which has been experiencing high infection rates. We evaluate the models according to their capacity to forecast in different historical scenarios of COVID-19 progression, such as exponential increases, sudden decreases, and periods of stability in daily cases. To do so, we use a rolling forward splitting approach for out-of-sample validation. We use RMSE, R-squared, and sMAPE to evaluate the models across the cross-validation splits. FINDINGS - All models outperform SLR, especially Holt-Winters, which performs satisfactorily in all scenarios. SVR and ARIMA perform better in isolated scenarios. To implement the comparisons, we have created a web application, which is available online. CONCLUSION - This work is an effort to assist the decision-makers of Amapá in future decisions, especially under scenarios of sudden variation in the number of confirmed cases, which could be caused, for instance, by new contamination waves or by vaccination. It is also an attempt to highlight alternative models that could be used in future epidemics.
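The rolling forward splitting approach and two of the error metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the expanding-window scheme, the sMAPE definition (one common variant), the naive last-value forecaster, the horizon, and the synthetic series are all assumptions made for illustration only.

```python
import math


def smape(actual, predicted):
    """Symmetric MAPE in percent (one common definition, assumed here)."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term as zero error to avoid division by zero.
        terms.append(0.0 if denom == 0 else abs(a - p) / denom)
    return 100 * sum(terms) / len(terms)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rolling_forward_splits(n_obs, initial_train, horizon):
    """Yield (train_end, test_end) index pairs with an expanding training window."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += horizon


def naive_forecast(train, horizon):
    """Hypothetical stand-in model: repeat the last observed value."""
    return [train[-1]] * horizon


def evaluate(series, forecaster, initial_train, horizon):
    """Score a forecaster on every rolling split; the test data is never seen in training."""
    scores = []
    for train_end, test_end in rolling_forward_splits(len(series), initial_train, horizon):
        train = series[:train_end]
        test = series[train_end:test_end]
        pred = forecaster(train, horizon)
        scores.append({
            "train_size": train_end,
            "rmse": rmse(test, pred),
            "smape": smape(test, pred),
        })
    return scores


# Synthetic daily case counts, invented for this example (not data from the study).
daily_cases = [10, 12, 15, 20, 26, 30, 31, 29, 25, 20, 18, 18]
results = evaluate(daily_cases, naive_forecast, initial_train=4, horizon=4)
for row in results:
    print(row)
```

Any of the compared models could be substituted for `naive_forecast`, provided it takes the training series and a horizon and returns that many predictions. Scoring each split separately is what makes it possible to compare behavior across growth, decline, and stable periods.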
doi:10.1101/2020.10.09.332908 fatcat:qt3ryqddzrc7vnypbbqfl5j6le