Topic modeling of freelance job postings to monitor web service abuse

Do-kyum Kim, Marti Motoyama, Geoffrey M. Voelker, Lawrence K. Saul
2011 Proceedings of the 4th ACM workshop on Security and artificial intelligence - AISec '11  
Web services such as Google, Facebook, and Twitter are recurring victims of abuse, and their plight will only worsen as more attackers are drawn to their large user bases. Many attackers hire cheap, human labor to actualize their schemes, connecting with potential workers via crowdsourcing and freelancing sites such as Mechanical Turk and Freelancer.com. To identify solicitations for abuse jobs, these Web sites need ways to distinguish these tasks from ordinary jobs. In this paper, we show how to discover clusters of abuse tasks using latent Dirichlet allocation (LDA), an unsupervised method for topic modeling in large corpora of text. Applying LDA to hundreds of thousands of unlabeled job postings from Freelancer.com, we find that it discovers clusters of related abuse jobs and identifies the prevalent words that distinguish them. Finally, we use the clusters from LDA to profile the population of workers who bid on abuse jobs and the population of buyers who post their project descriptions.
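As a rough illustration of the idea in the abstract, the sketch below fits LDA to a handful of tokenized postings with a small collapsed Gibbs sampler written in plain Python. It is not the authors' implementation: the sampler, the hyperparameters `alpha` and `beta`, the helper names `fit_lda` and `top_words`, and the toy `postings` are all assumptions made for this example, and the record does not say which inference method or settings the paper used.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch only).

    docs: list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] counts tokens of document d assigned to topic k and
    topic_word[k][w] counts assignments of word w to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """Most frequent words per topic, skipping words with zero count."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Hypothetical toy postings, invented for this example (not data from the paper).
postings = [
    "create gmail accounts bulk accounts verified".split(),
    "need verified accounts gmail bulk".split(),
    "design logo website business logo".split(),
    "website design wordpress business".split(),
]
doc_topic, topic_word = fit_lda(postings, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {' '.join(words)}")
```

The per-topic word lists play the role of the "prevalent words" that characterize each cluster, and each row of `doc_topic` shows how strongly a posting belongs to each topic. On a corpus of the size described in the abstract, an established LDA library would be a more practical choice than this sampler.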
doi:10.1145/2046684.2046687 dblp:conf/ccs/KimMVS11 fatcat:qiqae4rwjjenvjho2mjy6dvxny