Using a Grid-Search-Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data

Akın Özçift, Deniz Kılınç, Fatma Bozyiğit
2019 Academic Platform Journal of Engineering and Science  
Users of social media platforms interact with one another on a very large scale. This communication produces an enormous amount of user data worth analyzing from numerous angles. One research area emerging from this data concerns a major security issue known as cyberbullying. Since this problem has been recognized as a source of cybercrime, the need for a system that detects cyberbullying attacks and their sources from micro-blog texts is evident. Most academic research on this topic has been conducted on English-language text. The originality of this paper is that we develop an accurate cyberbullying detection system for the Turkish language. We used data from Twitter to build a supervised machine learning model on top of Bayesian Logistic Regression, whose parameters are tuned with a grid-search algorithm. Since text data produces a high-dimensional training space for machine learning algorithms, we also used the Chi-Squared (CH2) feature selection strategy to obtain the best subset of features. The optimized version of the proposed algorithm, trained on the reduced feature set, produced an F-measure of 0.925. Finally, we compared the proposed algorithm with machine learning methods frequently used in the literature and report the corresponding results in the related sections.
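The two quantities the abstract relies on, the chi-squared score used to rank features and the F-measure used to report accuracy, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`chi2_scores`, `top_k_terms`, `f_measure`), the textbook 2x2 contingency form of the chi-squared statistic, and the tiny English toy corpus are all assumptions made for illustration, and the Bayesian Logistic Regression model and its grid search are not reproduced here.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Score each term with the 2x2 chi-squared statistic against class 1.

    docs   : list of token lists (hypothetical, already tokenized)
    labels : list of 0/1 labels, 1 = cyberbullying (assumed encoding)
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (pos_df if label == 1 else neg_df).update(set(tokens))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive documents containing the term
        b = neg_df[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k_terms(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus, purely illustrative (not the paper's data).
toy_docs = [
    ["you", "are", "stupid"],
    ["stupid", "loser"],
    ["nice", "day"],
    ["have", "a", "nice", "day"],
]
toy_labels = [1, 1, 0, 0]

toy_scores = chi2_scores(toy_docs, toy_labels)
selected = top_k_terms(toy_scores, 2)
```

In a full pipeline, the selected terms would define the reduced feature space on which the classifier is trained, and a grid search would pick the hyperparameter setting that maximizes a validation metric such as `f_measure`.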
doi:10.21541/apjes.496018 fatcat:feim7gqz6vcitk7h4nfix3x7aq