Soda Pop: A Time-Series Clustering, Alarming and Disease Forecasting Application

Jeremiah Rounds, Lauren Charles-Smith, Courtney D. Corley
2017 Online Journal of Public Health Informatics  
Objective

To introduce Soda Pop, an R/Shiny application designed to be a disease-agnostic time-series clustering, alarming, and forecasting tool to assist in disease surveillance "triage, analysis and reporting" workflows within the Biosurveillance Ecosystem (BSVE) [1]. In this poster, we highlight the new capabilities that Soda Pop brings to the BSVE, with an emphasis on the impact of methodological decisions.

Introduction

The Biosurveillance Ecosystem (BSVE) is a biological and chemical threat surveillance system sponsored by the Defense Threat Reduction Agency (DTRA). BSVE is intended to be a user-friendly, multi-agency, cooperative, modular and threat-agnostic platform for biosurveillance [2]. In BSVE, a web-based workbench presents the analyst with applications (apps) developed by various DTRA-funded researchers, which are deployed on demand in the cloud (e.g., Amazon Web Services). These apps aim to address emerging needs and refine capabilities to enable early warning of chemical and biological threats for multiple users across local, state, and federal agencies.

Soda Pop is an app developed by Pacific Northwest National Laboratory (PNNL) to meet the current needs of the BSVE for early warning and detection of disease outbreaks. Intended for use by a diverse set of analysts, the application is agnostic to data source and spatial scale, enabling it to generalize across many diseases and locations. To achieve this, we placed particular emphasis on clustering and alerting of disease signals within Soda Pop without strong prior assumptions on the nature of observed disease counts.

Methods

Although designed to be agnostic to the data source, Soda Pop was initially developed and tested on data summarizing Influenza-Like Illness in military hospitals, obtained through collaboration with the Armed Forces Health Surveillance Branch. Currently, the incorporated data also include the CDC's National Notifiable Diseases Surveillance System (NNDSS) tables [3] and the WHO's influenza A/B data (FluNet) [4]. These data sources are now present in BSVE's Postgres data storage for direct access.

Soda Pop is designed to automate time-series tasks of data summarization, exploration, clustering, alarming and forecasting. Built as an R/Shiny application, Soda Pop is founded on the powerful statistical tool R [5].
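
To make the alarming task concrete, the following is a minimal sketch of an exponentially weighted moving average (EWMA) alarm on a count time series. It is written in Python purely for illustration; Soda Pop itself is built in R, and this is not its implementation. The function name `ewma_alarms`, the smoothing weight `lam`, the control-limit width `width`, the `baseline` length, and the example counts are all assumptions made for this sketch.

```python
from statistics import mean, stdev


def ewma_alarms(counts, lam=0.3, width=3.0, baseline=8):
    """Return the indices at which the EWMA exceeds an upper control limit.

    counts   : sequence of non-negative counts (e.g., weekly case counts)
    lam      : smoothing weight in (0, 1]; an assumed value
    width    : control-limit width in standard deviations; an assumed value
    baseline : number of leading observations used to estimate the
               in-control mean and standard deviation; an assumed value
    """
    if len(counts) <= baseline:
        raise ValueError("need more observations than the baseline length")

    # Estimate the in-control behaviour from the baseline window only.
    mu = mean(counts[:baseline])
    sigma = stdev(counts[:baseline])

    # Upper limit based on the asymptotic standard deviation of the EWMA.
    limit = mu + width * sigma * (lam / (2.0 - lam)) ** 0.5

    z = mu  # start the EWMA at the baseline mean
    alarms = []
    for t in range(baseline, len(counts)):
        z = lam * counts[t] + (1.0 - lam) * z
        if z > limit:
            alarms.append(t)
    return alarms


# Hypothetical weekly counts with a jump starting at index 10.
weekly_counts = [10, 12, 11, 9, 10, 11, 12, 10, 11, 10, 25, 30, 28, 12]
print(ewma_alarms(weekly_counts))
```

On these made-up counts the sketch flags indices 10 through 13 (zero-based). Index 13 remains flagged even though the raw count has dropped, because the smoothed statistic decays gradually; a smaller `lam` would lengthen that memory further.
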
Where applicable, Soda Pop facilitates nonparametric seasonal decomposition of time-series; hierarchical agglomerative clustering across reporting areas and between diseases within reporting areas; and a variety of alarming techniques, including Exponentially Weighted Moving Average alarms and Early Aberration Detection [6].

Soda Pop embeds these techniques within a user interface designed to enhance an analyst's understanding of emerging trends in their data, and it enables the inclusion of its graphical elements into their dossier for further tracking and reporting. The ultimate goal of this software is to facilitate the discovery of unknown disease signals and to increase the speed of detection of unusual patterns within these signals.

Conclusions

Soda Pop organizes common statistical disease surveillance tasks in a manner integrated with BSVE data source inputs and outputs. The app analyzes time-series disease data and supports a robust set of clustering and alarming routines that avoid strong assumptions on the nature of observed disease counts. This attribute allows for flexibility in the data source, spatial scale, and disease types, making it useful to a wide range of analysts.

[Figure: Soda Pop within the BSVE.]

Keywords

BSVE; Biosurveillance; R/Shiny; Clustering; Alarming

Acknowledgments

This work was supported by the Defense Threat Reduction Agency under contract CB10082 with Pacific Northwest National Laboratory.

References

1. Dasey, Timothy, et al. "Biosurveillance Ecosystem (BSVE) Workflow Analysis." Online Journal of Public Health Informatics 5.1 (2013).
2. Accessed 9/6/2016.
3. Centers for Disease Control and Prevention. "National Notifiable Diseases Surveillance System (NNDSS)."
4. World Health Organization. "FluNet." Global Influenza Surveillance and Response System (GISRS).
5. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
6. Salmon, Maëlle, et al. "Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance." Journal of Statistical Software [Online], 70.10 (2016): 1-35.
doi:10.5210/ojphi.v9i1.7582 fatcat:4cyb7ucqqra6beuwytmu2lb64u