A year in Madrid as described through the analysis of geotagged Twitter data

Travis R Meyer, Daniel Balagué, Miguel Camacho-Collados, Hao Li, Katie Khuu, P Jeffrey Brantingham, Andrea L Bertozzi
2018 Environment and Planning B: Urban Analytics and City Science
Gaining a complete picture of the activity in a city using vast data sources is challenging yet potentially very valuable. One such source of data is Twitter, which generates millions of short, spatio-temporally localized messages that, as a collection, carry information on city regions and many forms of city activity. The quantity of data, however, necessitates summarization in a way that makes consumption by an observer efficient, accurate, and comprehensive. We present a two-step process for
analyzing geotagged Twitter data within a localized urban environment. The first step involves an efficient form of latent Dirichlet allocation, using expectation maximization, for topic content summarization of the text information in the tweets. The second step involves spatial and temporal analysis of information within each topic using two complementary metrics. These proposed metrics characterize the distributional properties of tweets in time and space for all topics. We integrate the second step into a graphical user interface that enables the user to adeptly navigate through the space of hundreds of topics. We present results of a case study of the city of Madrid, Spain, for the year 2011, in which both large-scale protests and elections occurred. Our data analysis methods identify these
doi:10.1177/2399808318764123 fatcat:5ngtuodj4raerewnrk7qik2tdy
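
The abstract does not name the two metrics used in the second step, so the following is only an illustrative sketch of what a per-topic summary of spatial and temporal spread could look like. It assumes each tweet has already been assigned a topic by the first (latent Dirichlet allocation) step, and it uses Shannon entropy over grid cells and hours of the day as a stand-in for the paper's metrics. The names `shannon_entropy`, `topic_spread`, and `cell_size` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as an iterable of counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def topic_spread(tweets, cell_size=0.01):
    """Summarize how spread out each topic is in space and time.

    tweets: iterable of (topic_id, lat, lon, hour_of_day) tuples, where
            topic_id is assumed to come from an earlier topic-modeling step.
    cell_size: grid cell width in degrees (an assumed, hypothetical parameter).

    Returns {topic_id: (spatial_entropy, temporal_entropy)}. Low values mean
    the topic is concentrated in few cells or hours; high values mean it is
    spread out. This is a stand-in metric, not the one used in the paper.
    """
    cells = {}
    hours = {}
    for topic, lat, lon, hour in tweets:
        # Bin the coordinates into a regular latitude/longitude grid.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        cells.setdefault(topic, Counter())[cell] += 1
        hours.setdefault(topic, Counter())[hour] += 1
    return {
        topic: (
            shannon_entropy(cells[topic].values()),
            shannon_entropy(hours[topic].values()),
        )
        for topic in cells
    }
```

Under these assumptions, a topic whose tweets all fall in one grid cell during one hour scores zero on both measures, while a topic spread evenly over many cells and hours scores high on both, which is one way to separate localized events from diffuse background chatter.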