Coalmine: an experience in building a system for social media analytics

Joshua S. White, Jeanna N. Matthews, John L. Stacy, Igor V. Ternovskiy, Peter Chin
2012 Cyber Sensing 2012  
Social media networks make up a large percentage of the content available on the Internet, and most of the time users spend online today is spent interacting with them. All of the seemingly small pieces of information added by billions of people result in an enormous, rapidly changing dataset. Searching, correlating, and understanding billions of individual posts is a significant technical problem; even the data from a single site such as Twitter can be difficult to manage. In this paper, we present Coalmine, a social network data-mining system. We describe the overall architecture of Coalmine, including the capture, storage, and search components. We also describe our experience with pulling 150-350 GB of Twitter data per day through their REST API. Specifically, we discuss our experience with the evolution of the Twitter data APIs from 2011 to 2012 and present strategies for maximizing the amount of data collected. Finally, we describe our experiences looking for evidence of botnet command and control channels and examining patterns of SPAM in the Twitter dataset.
doi:10.1117/12.918933 fatcat:uol23by5p5hyzg7z27c3g6e3di