Exploiting Geo-tagged Tweets to Understand Localized Language Diversity

Amr Magdy, Thanaa M. Ghanem, Mashaal Musleh, Mohamed F. Mokbel
2014 Proceedings of the Workshop on Managing and Mining Enriched Geo-Spatial Data - GeoRich'14
Social media services have been among the fastest-growing online communities in the last few years. Among them, Twitter has become the de facto microblogging service, with millions of tweets posted every day. In this paper, we present an analytical study of localized language usage and diversity in Twitter data using half a billion geotagged tweets. We first identify local Twitter communities at the country level. For the identified communities, we examine (1) the language diversity, (2) the language dominance within the community and how this differs between local and global views, (3) how representative the tweets are of real population demographics, and (4) the spatial distribution of different cultural groups within the countries. To this end, we group the tweets at two levels. First, we group tweets per country to identify the local communities. Second, we group tweets within each local community by tweet language. Our study reveals useful insights into language usage on Twitter that provide important information for language-based applications built on top of Twitter data, e.g., linguistic analysis and disaster management. In addition, we present an interactive exploration tool for the spatial distribution of cultural groups, which provides low-effort, high-precision localization of different cultural groups inside a given country.
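The two-level grouping described above (country first, then language within each country) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Tweet` record and its `country`/`lang` fields are hypothetical, assuming each geotagged tweet has already been mapped to a country and tagged with a language code. Shannon entropy and top-language share are used as assumed stand-ins for "diversity" and "dominance", since the abstract does not name specific metrics.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal record: the country the geotag falls in and the
    # tweet's language code (field names are illustrative assumptions).
    country: str
    lang: str


def group_by_country_and_language(tweets):
    """Level 1: group by country; level 2: count tweets per language."""
    groups = defaultdict(Counter)
    for t in tweets:
        groups[t.country][t.lang] += 1
    return groups


def global_language_counts(groups):
    """Merge all local communities into a single global language view."""
    return sum(groups.values(), Counter())


def language_entropy(lang_counts):
    """Shannon entropy (bits) of a language distribution.

    An assumed diversity measure: 0 for a single-language community,
    higher values for more evenly mixed communities.
    """
    total = sum(lang_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in lang_counts.values() if c > 0)


def dominant_language(lang_counts):
    """Most frequent language and its share of the community's tweets."""
    lang, count = lang_counts.most_common(1)[0]
    return lang, count / sum(lang_counts.values())


if __name__ == "__main__":
    # Toy, made-up sample purely to exercise the functions.
    sample = ([Tweet("EG", "ar")] * 3 + [Tweet("EG", "en")]
              + [Tweet("US", "en")] * 2 + [Tweet("US", "es")] * 2)
    groups = group_by_country_and_language(sample)
    for country, counts in sorted(groups.items()):
        lang, share = dominant_language(counts)
        print(country, dict(counts), round(language_entropy(counts), 3), lang, share)
    print("global:", dict(global_language_counts(groups)))
```

Comparing each country's dominant language with the merged `global_language_counts` gives a rough local-versus-global contrast of the kind the study examines; a real pipeline over half a billion tweets would more likely run this grouping in a database or distributed engine than in memory.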
doi:10.1145/2619112.2619114 dblp:conf/sigmod/0001GMM14 fatcat:a6dcwvk2yvcrfp65mlkiglb2ou