Digital forecasting of COVID-19 case rates in United States regions: an analysis of search-engine query patterns (Preprint) [post]

Henry Cousins, Clara Cousins, Alon Harris, Louis Pasquale
2020 unpublished
BACKGROUND Timely allocation of medical resources for COVID-19 requires early detection of regional outbreaks. Internet browsing data, such as search activity levels, may help estimate cases in a local population that are yet to be confirmed. OBJECTIVE The objective of our study was to determine whether search-engine query patterns can forecast COVID-19 case rates at the state and local levels in the United States. METHODS We used regional confirmed case data from the New York Times and Google Trends results from 50 states and 203 county-based designated market areas (DMAs). We identified search terms whose activity precedes and correlates with confirmed case rates at the national level, using univariate regression to construct a composite explanatory variable based on top-scoring search queries offset by temporal lags. We measured the correlation of the explanatory variable with out-of-sample case rate data at the state and DMA levels. RESULTS Forecasts were highly correlated with confirmed case rates at the state and local levels, using search data available up to 10 days in advance of confirmed case rates. They predicted case activity in 49 of 50 states and in 128 of 203 DMAs at a significance level of .05 and were robust to differences in regional location, population, and date of outbreak. CONCLUSIONS Identifiable patterns in search query activity may be used to forecast emerging regional outbreaks of COVID-19.
doi:10.2196/preprints.19483 fatcat:jkyoyfckebbijenpqohp4i42ka
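
The lagged-correlation idea in METHODS can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the function names (lagged_correlation, best_lag, composite_signal) are hypothetical, and Pearson correlation stands in for the univariate regression scoring (with a single predictor, the regression R² equals the squared Pearson r). It assumes daily series of equal length and lags of 0 to 10 days.

```python
import numpy as np


def lagged_correlation(search, cases, lag):
    """Pearson r between search activity and case rates `lag` days later."""
    if lag > 0:
        x, y = search[:-lag], cases[lag:]
    else:
        x, y = search, cases
    return float(np.corrcoef(x, y)[0, 1])


def best_lag(search, cases, max_lag=10):
    """Return (lag, r) with the highest lagged correlation over 0..max_lag."""
    scores = [lagged_correlation(search, cases, k) for k in range(max_lag + 1)]
    k = int(np.argmax(scores))
    return k, scores[k]


def composite_signal(terms, lags, n_days):
    """Average z-scored search series after shifting each forward by its lag.

    Position t of the result uses search data from day t - lag. Days for
    which any term lacks data are NaN.
    """
    shifted = np.full((len(terms), n_days), np.nan)
    for i, (series, k) in enumerate(zip(terms, lags)):
        z = (series - series.mean()) / series.std()
        shifted[i, k:] = z[: n_days - k]
    return shifted.mean(axis=0)


# Synthetic illustration only: one outbreak curve and two noisy search
# terms constructed to lead it by 7 and 4 days.
rng = np.random.default_rng(0)
n_days = 120
t = np.arange(n_days)
cases = np.exp(-((t - 70) / 15.0) ** 2)
term_a = np.exp(-((t - 63) / 15.0) ** 2) + rng.normal(0, 0.02, n_days)
term_b = np.exp(-((t - 66) / 15.0) ** 2) + rng.normal(0, 0.02, n_days)

lag_a, r_a = best_lag(term_a, cases)
lag_b, r_b = best_lag(term_b, cases)
composite = composite_signal([term_a, term_b], [lag_a, lag_b], n_days)

# Correlate the composite with case rates on days where it is defined.
valid = ~np.isnan(composite)
r_composite = float(np.corrcoef(composite[valid], cases[valid])[0, 1])
print(lag_a, lag_b, round(r_composite, 3))
```

In the study's setting, lags and terms would be selected on national data and the composite then correlated with held-out state or DMA case rates. The sketch fits and evaluates on the same synthetic series, so it shows the mechanics only.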