Temporal Topic Modeling to Assess Associations between News Trends and Infectious Disease Outbreaks

Saurav Ghosh, Prithwish Chakraborty, Elaine O. Nsoesie, Emily Cohn, Sumiko R. Mekaru, John S. Brownstein, Naren Ramakrishnan
2017 Scientific Reports  
In retrospective assessments, internet news reports have been shown to capture early reports of unknown infectious disease transmission prior to official laboratory confirmation. In general, media interest and reporting peak and wane during the course of an outbreak. In this study, we quantify the extent to which media interest during infectious disease outbreaks is indicative of trends in reported incidence. We introduce an approach that uses supervised temporal topic models to transform large corpora of news articles into temporal topic trends. The key advantages of this approach are its applicability to a wide range of diseases and its ability to capture disease dynamics, including seasonality, abrupt peaks and troughs. We evaluated the method using data from multiple infectious disease outbreaks reported in the United States of America (U.S.), China, and India. We demonstrate that temporal topic trends extracted from disease-related news reports successfully capture the dynamics of multiple outbreaks, such as whooping cough in the U.S. (2012) and dengue outbreaks in India (2013) and China (2014). Our observations also suggest that, when news coverage is uniform, efficient modeling of temporal topic trends using time-series regression techniques can estimate disease case counts with increased precision before official reports by health organizations.
Infectious diseases are a threat to public health and to the economic stability of many countries. Open source indicators (e.g., news articles 1,2, blogs 3, search engine query volume 4-7, social media chatter 8-11 and other sources 12) are an attractive option for monitoring infectious disease progression, primarily due to their sheer volume and their capacity to capture early signals of disease outbreaks and, in some cases, trends in population health-seeking behavior. However, most prior work in digital surveillance using open source indicators has targeted specific diseases, such as influenza 12,13 and hantavirus pulmonary syndrome (HPS) 14. There is therefore a need for generic frameworks that apply to multiple infectious diseases. Official surveillance reports released by health organizations (e.g., CDC, WHO, PAHO) are published with a considerable delay of weeks, months or even a year. Consequently, traditional surveillance systems are not always effective at real-time monitoring of emerging public health threats.
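The step of turning a news corpus into temporal topic trends, as described in the abstract, can be illustrated with a minimal sketch. It assumes that per-article topic proportions have already been produced by some fitted topic model; the supervised topic model itself is not reproduced here. The function name `topic_trends` and the choice to average proportions within each ISO calendar week are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence, Tuple

# Hypothetical helper: aggregates per-article topic proportions into weekly
# topic trends. The proportions are ASSUMED to come from an already-fitted
# topic model; this sketch does not fit any model.
def topic_trends(
    articles: Sequence[Tuple[date, Sequence[float]]],
) -> Dict[Tuple[int, int], List[float]]:
    """Average topic proportions of all articles published in each ISO (year, week)."""
    sums: Dict[Tuple[int, int], List[float]] = {}
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for pub_date, theta in articles:
        iso_year, iso_week, _ = pub_date.isocalendar()
        key = (iso_year, iso_week)
        if key not in sums:
            sums[key] = [0.0] * len(theta)
        # Accumulate each topic's weight for this week.
        for k, weight in enumerate(theta):
            sums[key][k] += weight
        counts[key] += 1
    # Divide by the number of articles per week; keys are returned in time order.
    return {
        key: [total / counts[key] for total in totals]
        for key, totals in sorted(sums.items())
    }
```

For example, two articles published in the same week with proportions `[0.8, 0.2]` and `[0.6, 0.4]` yield a weekly trend value of about `[0.7, 0.3]` for that week. Averaging, rather than summing, keeps the trend from simply tracking how many articles were published; a summed variant would instead reflect overall coverage volume.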
Unlike traditional surveillance data, informal digital sources, such as news media, blogs, and micro-blogging sites (e.g., Twitter), are typically available in (near) real-time. Properly mining signals from these digital sources can help minimize the time lag between the start of an outbreak and its formal recognition, allowing for an accelerated response to public health threats. The gains of supplementing traditional surveillance with digital sources have been discussed by Nsoesie et al. 15, Salathé et al. 16,17 and Hartley et al. 18. Our key contributions are as follows. (i) We introduce EpiNews, a generic temporal framework for analyzing disease-related news reports using a supervised topic model. The supervised topic model discovers multiple disease topics of interest and the temporal trends of their prominence in news media. (ii) EpiNews captures trends in disease progression, such as periodicity, peaks and troughs, via the temporal trends of disease topics in news media. (iii) When news coverage is adequate, EpiNews also estimates disease incidence before official reports by health agencies, using time-series regression models fitted to the temporal trends of disease topics.
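Contribution (iii) relies on time-series regression over topic trends. The source does not specify the model form, so the following is only a minimal sketch that assumes the simplest such model. It is an autoregressive model with one exogenous input (ARX(1)), `y_t = a + b*y_{t-1} + c*x_t`, where `y_t` is the reported case count in week `t` and `x_t` is the weekly topic trend. The model is fitted by ordinary least squares. The names `fit_arx` and `nowcast` are hypothetical, and the paper's actual regression techniques may differ.

```python
from typing import List, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical ARX(1) model (an assumption, not the authors' stated model):
#   y_t = a + b * y_{t-1} + c * x_t
def fit_arx(counts: Sequence[float], trend: Sequence[float]) -> List[float]:
    """Fit [a, b, c] by ordinary least squares via the normal equations."""
    if len(counts) != len(trend):
        raise ValueError("counts and trend must have the same length")
    rows = [[1.0, counts[t - 1], trend[t]] for t in range(1, len(counts))]
    targets = [counts[t] for t in range(1, len(counts))]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return _solve(xtx, xty)

def nowcast(coef: Sequence[float], last_count: float, current_trend: float) -> float:
    """Estimate this week's count from last week's count and this week's topic trend."""
    a, b, c = coef
    return a + b * last_count + c * current_trend
```

In use, the model is fitted on weeks for which official counts already exist. `nowcast` is then called with the most recent official count and the current week's topic trend, which news media make available before the official report. A production system would more likely use a statistics library (for example, an ARIMA model with exogenous regressors). The hand-rolled solver here only keeps the sketch dependency-free.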
doi:10.1038/srep40841 pmid:28102319 pmcid:PMC5244405 fatcat:pcz5brdoknhzlhmrnf6diosyei