Tweet Geolocation

Wen-Haw Chong, Ee-Peng Lim
2017 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM '17  
Which venue is a tweet posted from? We refer to this as fine-grained geolocation. To solve this problem effectively, we develop novel techniques to exploit each posting user's content history. This is motivated by our finding that most users do not share their visitation history, but have ample content history from tweet posts. We formulate fine-grained geolocation as a ranking problem whereby, given a test tweet, we rank candidate venues. We propose several models that leverage three types of signals: from locations, users and peers. Firstly, the location signals are words that are indicative of venues. We propose a location-indicative weighting scheme to capture this. Next, we exploit user signals from each user's content history to enrich the very limited content of the tweets targeted for geolocation. The intuition is that the user's other tweets may have been posted from the test venue or related venues, thus providing informative words. In this regard, we propose query expansion as the enrichment approach. Finally, we exploit the signals from peer users who have similar content history, and thus potentially similar visitation behavior, to the users of the test tweets. This suggests collaborative filtering, where visitation information is propagated via content similarities. We propose several models incorporating different combinations of the three signals. Our experiments show that the best model incorporates all three signals. It performs 6% to 40% better than the baselines, depending on the metric and dataset.
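As a rough illustration of the ranking formulation described above, the following minimal sketch combines a location-indicative word weight with query expansion from a user's content history. It is not the authors' model: the inverse-venue-frequency weighting, the mixing parameter `alpha`, the function names and the toy data are all assumptions made for this example, and the peer (collaborative filtering) signal is left out.

```python
import math
from collections import Counter


def location_weights(venue_docs):
    """Assumed weighting: words seen at fewer venues get a higher weight."""
    n = len(venue_docs)
    venue_freq = Counter()
    for words in venue_docs.values():
        venue_freq.update(set(words))  # count each word once per venue
    return {w: math.log((1 + n) / (1 + vf)) + 1.0 for w, vf in venue_freq.items()}


def expand_query(tweet_words, history_tweets, alpha=0.5):
    """Enrich the test tweet with words from the user's other tweets.

    alpha (hypothetical) is the total extra weight spread over the history.
    """
    query = Counter(tweet_words)
    for past in history_tweets:
        for w in past:
            query[w] += alpha / len(history_tweets)
    return query


def rank_venues(query, venue_docs, weights):
    """Rank candidate venues by weighted overlap with the query, best first."""
    scores = {}
    for venue, words in venue_docs.items():
        tf = Counter(words)
        total = len(words)
        scores[venue] = sum(
            q * weights.get(w, 0.0) * tf[w] / total for w, q in query.items()
        )
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented): words previously observed at each venue.
venue_docs = {
    "coffee_shop": ["latte", "espresso", "queue", "coffee"],
    "gym": ["treadmill", "workout", "queue", "sweat"],
}
test_tweet = ["queue", "again"]
user_history = [["need", "espresso"], ["latte", "time"]]

weights = location_weights(venue_docs)
ranked = rank_venues(expand_query(test_tweet, user_history), venue_docs, weights)
print(ranked)
```

In this toy setup the test tweet alone contains only a word shared by both venues, so the two candidates tie; the words borrowed from the user's history are what separate them, which is the intuition behind the enrichment step.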
doi:10.1145/3132847.3132906 dblp:conf/cikm/ChongL17 fatcat:r57h24vsanejvdf4vwsryhfkqi