Multivariate scan statistics for disease surveillance

Martin Kulldorff, Farzad Mostashari, Luiz Duczmal, W. Katherine Yih, Ken Kleinman, Richard Platt
2007 Statistics in Medicine  
In disease surveillance, there are often many different data sets or data groupings for which we wish to do surveillance. If each data set is analyzed separately rather than combined, the statistical power to detect an outbreak that is present in all data sets may suffer due to low numbers in each. On the other hand, if the data sets are added by taking the sum of the counts, then a signal that is primarily present in one data set may be hidden due to random noise in the other data sets. In this paper, we present an extension of the spatial and space-time scan statistic that simultaneously incorporates multiple data sets into a single likelihood function, so that a signal is generated whether it occurs in only one or in multiple data sets. This is done by defining the combined log likelihood as the sum of the individual log likelihoods for those data sets for which the observed case count is more than the expected. Using data from the National Bioterrorism Syndromic Surveillance Demonstration Project, we illustrate the new method using physician telephone calls, regular physician visits and urgent care visits by Harvard Pilgrim Health Care members cared for by Harvard Vanguard Medical Associates, a large multi-specialty group practice in Massachusetts. For upper and lower gastrointestinal illness, there were on average 20 telephone calls, 9 urgent care visits and 22 regular physician visits per day. The strongest signal was generated by a single data set and was due to a familial outbreak of pinworm disease. The second and third strongest signals were generated by the combined strength of two of the three data sets.
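The combination rule in the abstract (sum the per-data-set log likelihoods, but only for data sets with more observed than expected cases) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Kulldorff Poisson log likelihood ratio for each data set, assumes the expected count `e` inside a window has already been computed under the null hypothesis (e.g. proportional to population at risk) and is positive, and uses hypothetical function names (`poisson_llr`, `combined_llr`, `best_window`). Significance testing, which scan statistics typically perform by Monte Carlo replication, is not shown.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log likelihood ratio for one data set in one candidate window.

    Assumed (standard Kulldorff Poisson) form:
        c*log(c/e) + (total-c)*log((total-c)/(total-e))   if c > e
        0                                                 otherwise
    c: observed cases inside the window
    e: expected cases inside the window under the null (assumed > 0)
    total: total observed cases in this data set
    """
    if c <= e:
        # No excess in this data set, so it contributes nothing.
        return 0.0
    llr = c * math.log(c / e)
    outside = total - c
    if outside > 0:
        # When every case is inside the window, the outside term is 0*log(0) = 0.
        llr += outside * math.log(outside / (total - e))
    return llr


def combined_llr(window_counts):
    """Sum the per-data-set LLRs for one window.

    window_counts: iterable of (c, e, total) tuples, one per data set.
    Data sets with c <= e add 0, so this is the sum over data sets
    whose observed count exceeds the expected count.
    """
    return sum(poisson_llr(c, e, total) for c, e, total in window_counts)


def best_window(windows):
    """Return (window_id, score) for the window maximizing the combined LLR.

    windows: dict mapping a hypothetical window id to a list of
    (c, e, total) tuples, one per data set.
    """
    scores = {wid: combined_llr(counts) for wid, counts in windows.items()}
    wid = max(scores, key=scores.get)
    return wid, scores[wid]


if __name__ == "__main__":
    # Hypothetical counts for two data sets in two candidate windows.
    windows = {
        "A": [(10, 5, 100), (3, 5, 100)],  # strong excess in one data set only
        "B": [(9, 5, 100), (9, 5, 100)],   # moderate excess in both data sets
    }
    print(best_window(windows))
```

With these made-up counts, window "B" scores higher than window "A": two moderate excesses add up to a stronger combined signal than one larger excess, while the deficit in the second data set of "A" is ignored rather than allowed to cancel the first. This mirrors the abstract's point that a signal can arise from one data set or from the combined strength of several.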
doi:10.1002/sim.2818 pmid:17216592 fatcat:dcd3ycegjrazja4hgslrtaybnq