Identifying Methods for Monitoring Foodborne Illness: Review of Existing Public Health Surveillance Techniques
JMIR Public Health and Surveillance
Traditional methods of monitoring foodborne illness are associated with problems of untimeliness and underreporting. In recent years, alternative data sources such as social media data have been used to monitor the incidence of disease in the population (infodemiology and infoveillance). These data sources prove timelier than traditional general practitioner data, they can help to fill the gaps in the reporting process, and they often include additional metadata that is useful for supplementary
... research. Objective: The aim of the study was to identify and formally analyze research papers using consumer-generated data, such as social media data or restaurant reviews, to quantify a disease or public health ailment. Studies of this nature are scarce within the food safety domain, therefore identification and understanding of transferrable methods in other health-related fields are of particular interest. Methods: Structured scoping methods were used to identify and analyze primary research papers using consumer-generated data for disease or public health surveillance. The title, abstract, and keyword fields of 5 databases were searched using predetermined search terms. A total of 5239 papers matched the search criteria, of which 145 were taken to full-text review-62 papers were deemed relevant and were subjected to data characterization and thematic analysis. Results: The majority of studies (40/62, 65%) focused on the surveillance of influenza-like illness. Only 10 studies (16%) used consumer-generated data to monitor outbreaks of foodborne illness. Twitter data (58/62, 94%) and Yelp reviews (3/62, 5%) were the most commonly used data sources. Studies reporting high correlations against baseline statistics used advanced statistical and computational approaches to calculate the incidence of disease. These include classification and regression approaches, clustering approaches, and lexicon-based approaches. Although they are computationally intensive due to the requirement of training data, studies using classification approaches reported the best performance. Conclusions: By analyzing studies in digital epidemiology, computer science, and public health, this paper has identified and analyzed methods of disease monitoring that can be transferred to foodborne disease surveillance. These methods fall into 4 main categories: basic approach, classification and regression, clustering approaches, and lexicon-based approaches. Although studies using a basic approach to calculate disease incidence generally report good performance against baseline measures, they are sensitive to chatter generated by media reports. More computationally advanced approaches are required to filter spurious messages and protect predictive systems against false alarms. Research using consumer-generated data for monitoring influenza-like illness is expansive; however, research regarding the use of restaurant reviews and social media data in the context of food safety is limited. Considering the advantages reported in this review, methods using consumer-generated data for foodborne disease surveillance warrant further investment.