Advanced Characteristic Analysis of Real Time Junk Occurrences in Twitter

Ancy S, Aruna Jasmine.J
2018 International Journal of Computer Applications Technology and Research  
Spam on Twitter has become a major threat, and we take several steps to address it. This work uses Twitter as the input data source to address the problem of real-time spam detection. Because Twitter data contains a lot of spam, we built a dictionary of words to remove spam from tweets. To solve this problem, we first carry out a deep analysis of the statistical features of training data sets to differentiate spam tweets from non-spam tweets. Then we propose an approach called "NLTK" (Natural Language Tool Kit). The proposed approach can discover "changed" spam posts from unlabeled posts and incorporate them into the classifier's training process. Many experiments were carried out to evaluate the proposed scheme. The results show that the proposed NLTK approach can remarkably improve spam detection accuracy in real-world scenarios.
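The dictionary-based filtering step mentioned in the abstract might look roughly like the following minimal sketch. The word list, the 0.3 threshold, and all function names are assumptions made for illustration; the abstract does not give the authors' dictionary or scoring rule, and this sketch uses only the Python standard library rather than NLTK itself.

```python
import re

# Hypothetical spam dictionary; the paper's actual word list is not given.
SPAM_WORDS = {"free", "win", "prize", "click", "offer"}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def spam_score(text, spam_words=SPAM_WORDS):
    """Return the fraction of tokens that appear in the spam dictionary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in spam_words for token in tokens) / len(tokens)


def is_spam(text, threshold=0.3, spam_words=SPAM_WORDS):
    """Flag a tweet as spam when its score reaches the (assumed) threshold."""
    return spam_score(text, spam_words) >= threshold


def filter_tweets(tweets, threshold=0.3):
    """Keep only the tweets that are not flagged as spam."""
    return [tweet for tweet in tweets if not is_spam(tweet, threshold)]
```

A fixed dictionary like this cannot follow spam whose wording changes over time, which is presumably why the abstract also describes feeding newly discovered "changed" spam posts back into the classifier's training process.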
doi:10.7753/ijcatr0712.1001 fatcat:3aknsycxardtbdvp2umy6wdja4