Identification of HATE speech tweets in Pashto language using Machine Learning techniques

2021 International Journal of Advanced Trends in Computer Science and Engineering  
In recent years, researchers have shown growing interest in sentiment analysis, especially in hate speech detection systems. The proliferation of hate speech in different languages on social media has become a serious concern. Hate speech has a great impact on society, and the use of hateful words harms the dignity of others. Hate speech detection systems are important for preventing hateful words from turning into crimes. In this research, a framework is developed for a hate speech detection system in the Pashto language. Because no related data was available, a dataset was created from data collected from Twitter. Most research in this domain has been done for other languages, where hate speech detection is very mature. For morphologically rich languages, however, not much work has been done, especially for the Pashto language. This research collected tweets related to ethnicity and religion from Twitter. The collected data was annotated manually and categorized as hate or not hate by comparing it with offensive content. To examine the impact of different features/attributes on hate speech detection systems, this study performed experiments with existing classifiers, i.e., SVM, Naïve Bayes, Decision Tree, and KNN. On the dataset of 500, SVM produced the highest result among all the classifiers, i.e., 74%. On the dataset of 1500, KNN and Decision Tree produced the same result, i.e., 65.0%. On the dataset of 2800, Decision Tree produced the highest result, i.e., 72%, and SVM produced 71.9%.
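
The classifier comparison described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the use of scikit-learn, TF-IDF features, a single stratified train/test split, accuracy as the reported metric, the helper name compare_classifiers, and the synthetic placeholder data are all assumptions that the abstract does not confirm.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical placeholder data: synthetic tokens standing in for manually
# annotated tweets (1 = hate, 0 = not hate). Not taken from the paper's dataset.
TOY_TEXTS = [
    "insult_a group_x", "group_x insult_b insult_a", "insult_c group_y",
    "group_y insult_a insult_c", "insult_b group_x group_y", "insult_c insult_b group_x",
    "greeting_a group_x", "news_a group_y weather_a", "greeting_b friend_a",
    "weather_a news_b group_x", "friend_a greeting_a group_y", "news_a friend_b greeting_b",
]
TOY_LABELS = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def compare_classifiers(texts, labels, test_size=0.25, seed=0):
    """Train the four classifier types on TF-IDF features; return accuracy per model."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Hyperparameters are illustrative defaults, not values reported in the source.
    models = {
        "SVM": LinearSVC(),
        "Naive Bayes": MultinomialNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=3),
    }
    scores = {}
    for name, model in models.items():
        # Each pipeline fits its own vectorizer on the training split only.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        pipeline.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(x_test))
    return scores


if __name__ == "__main__":
    for name, score in compare_classifiers(TOY_TEXTS, TOY_LABELS).items():
        print(f"{name}: {score:.2f}")
```

Running the same function on subsets of different sizes (for example 500, 1500, and 2800 annotated tweets) would mirror the dataset-size comparison reported in the abstract; scores on the toy data above carry no meaning.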
doi:10.30534/ijatcse/2021/021032021 fatcat:upkdqfspu5agzk6d73asmd3yqi