Modeling the impact of lifestyle on health at scale

Adam Sadilek, Henry Kautz
2013 Proceedings of the sixth ACM international conference on Web search and data mining - WSDM '13  
Research in computational epidemiology to date has concentrated on estimating summary statistics of populations and simulated scenarios of disease outbreaks. Detailed studies have been limited to small domains, as scaling the methods involved poses considerable challenges. By contrast, we model the associations of a large collection of social and environmental factors with the health of particular individuals. Instead of relying on surveys, we apply scalable machine learning techniques to noisy data mined from online social media and automatically infer the health state of any given person. We show that the learned patterns can subsequently be leveraged in descriptive as well as predictive fine-grained models of human health. Using a unified statistical model, we quantify the impact of social status, exposure to pollution, interpersonal interactions, and other important lifestyle factors on one's health. Our model explains more than 54% of the variance in people's health (as estimated from their online communication), and predicts the future health status of individuals with 91% accuracy. Our methods complement traditional studies in life sciences, as they enable us to perform large-scale and timely measurement, inference, and prediction of previously elusive factors that affect our everyday lives.

Figure 1: Visualization of the health and location of a sample of Twitter users in New York City. Sick people are colored red, whereas healthy individuals are green. Major pollution sources are highlighted in purple, and ZIP code boundaries are shown with white outlines.

This paper explores to what extent online social media can be used to quantify and predict the impact of a large collection of environmental and lifestyle factors on our health. Our web application is available at
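
The abstract quotes two headline metrics: the model "explains more than 54% of the variance" in health and predicts future health status "with 91% accuracy". The phrase "variance explained" conventionally refers to the coefficient of determination, R^2 = 1 - SS_res / SS_tot, and accuracy is the fraction of predictions that match the true label. The abstract does not say exactly how these numbers were computed, so the Python sketch below only assumes these standard definitions. The function names r_squared and accuracy are hypothetical, and the toy values are invented for illustration; they are not data from the paper.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = fmean(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of predicted labels that equal the true labels."""
    correct = sum(t == p for t, p in zip(labels_true, labels_pred))
    return correct / len(labels_true)


# Hypothetical toy values, not results from the paper.
health = [0.2, 0.5, 0.9, 0.4, 0.7]    # observed health scores
fitted = [0.3, 0.5, 0.8, 0.35, 0.6]   # model estimates of those scores
print(r_squared(health, fitted))

sick_true = [True, False, False, True, False]  # actual future status
sick_pred = [True, False, True, True, False]   # predicted future status
print(accuracy(sick_true, sick_pred))  # 0.8
```

Under these definitions, an R^2 of 0.54 would mean the model's estimates remove 54% of the squared deviation of health scores around their mean; the remaining 46% is unexplained by the modeled factors.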
doi:10.1145/2433396.2433476 dblp:conf/wsdm/SadilekK13 fatcat:5jbqi4nqeva7bd4s6wcg4jawge