
Exploratory Arabic Offensive Language Dataset Analysis [article]

Fatemah Husain, Ozlem Uzuner
2021 arXiv   pre-print
The main goal of this paper is to guide researchers in Arabic offensive language in selecting appropriate datasets based on their content, and in creating new Arabic offensive language resources to support  ...  This paper adds more insights towards resources and datasets used in Arabic offensive language research.  ...  Omar, Mahmoud, and Abd El-Hafeez (2020) release the first multi-platform dataset for Arabic hate speech detection.  ... 
arXiv:2101.11434v1 fatcat:mjh6vbo2jvhn3ibbgzmspnwl4q

Detecting Abusive Language on Online Platforms: A Critical Analysis [article]

Preslav Nakov, Vibha Nayak, Kyle Dent, Ameya Bhatawdekar, Sheikh Muhammad Sarwar, Momchil Hardalov, Yoan Dinkov, Dimitrina Zlatkova, Guillaume Bouchard, Isabelle Augenstein
2021 arXiv   pre-print
We argue that there is currently a dichotomy between what types of abusive language online platforms seek to curb, and what research efforts there are to automatically detect abusive language.  ...  , and to create a more inclusive environment for their users.  ...  Where multilinguality exists, it is presented mostly by European languages. As for Asian languages, datasets exist for Hindi and Indonesian, and six datasets contain Arabic text.  ... 
arXiv:2103.00153v1 fatcat:6757k5kt6fe3fml6prmgdkxf5u

Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model

Wassen Aldjanabi, Abdelghani Dahou, Mohammed A. A. Al-qaness, Mohamed Abd Elaziz, Ahmed Mohamed Helmi, Robertas Damaševičius
2021 Informatics  
More precisely, we develop a classification system for determining offensive and hate speech using a multi-task learning (MTL) model built on top of a pre-trained Arabic language model.  ...  The developed MTL model showed a significant performance and outperformed existing models in the literature on three out of four datasets for Arabic offensive and hate speech detection tasks.  ...  Figure 1. Examples of offensive and hate speech tweets in Arabic with translation to English.  ...  to propose a new Multi-Task Deep Neural Network (MT-DNN).  ... 
doi:10.3390/informatics8040069 fatcat:55a7zksytfgkxkpsvh7jkr4vn4

Hostility Detection Dataset in Hindi [article]

Mohit Bhardwaj, Md Shad Akhtar, Asif Ekbal, Amitava Das, Tanmoy Chakraborty
2020 arXiv   pre-print
In this paper, we present a novel hostility detection dataset in Hindi language. We collect and manually annotate ~8200 online posts.  ...  The annotated dataset covers four hostility dimensions: fake news, hate speech, offensive, and defamation posts, along with a non-hostile label.  ...  Conclusion In this paper, we present the development process of a novel, multi-dimensional hostility detection dataset in Hindi.  ... 
arXiv:2011.03588v1 fatcat:iukjyt7tknavjlqqd6dj5tihh4

A systematic review of Hate Speech automatic detection using Natural Language Processing [article]

Md Saroar Jahan, Mourad Oussalah
2021 arXiv   pre-print
With the multiplication of social media platforms, which offer anonymity, easy access and online community formation, and online debate, the issue of hate speech detection and tracking becomes a growing  ...  Despite efforts for leveraging automatic techniques for automatic detection and monitoring, their performances are still far from satisfactory, which constantly calls for future research on the issue.  ...  The data is comments from the Korean entertainment news aggregation platform.  ...  Implementation of paper "Deep Learning for Hate Speech Detection".  ...  HateSonar allows you to detect hate speech and offensive language  ... 
arXiv:2106.00742v1 fatcat:qwxjwgma4zaynemge57cu7xqlm

Coarse and Fine-Grained Hostility Detection in Hindi Posts using Fine Tuned Multilingual Embeddings [article]

Arkadipta De, Venkatesh E, Kaushal Kumar Maurya, Maunendra Sankar Desarkar
2021 arXiv   pre-print
We view this hostility detection as a multi-label multi-class classification problem. We propose an effective neural network-based technique for hostility detection in Hindi posts.  ...  The hostility detection task has been well explored for resource-rich languages like English, but is unexplored for resource-constrained languages like Hindi due to the unavailability of large suitable  ...  Hostility Detection in Non-English Languages: In [8], the authors address the problem of offensive language detection in the Arabic language using Convolution Neural Network (CNN) and attention-based  ... 
arXiv:2101.04998v1 fatcat:z4bdmqg7mzdlbggjh7am472o2q

Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language [article]

Hala Mulki, Bilal Ghanem
2021 arXiv   pre-print
In this paper, we introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi) to be the first benchmark dataset for Arabic misogyny.  ...  Online misogyny has become an increasing worry for Arab women who experience gender-based online abuse on a daily basis.  ...  ., 2017), two datasets were proposed: a Twitter dataset of 1,100 dialectal tweets and a dataset of 32K inappropriate comments collected from a popular Arabic news site and annotated as obscene, offensive  ... 
arXiv:2103.10195v1 fatcat:cujq6m735zfrrpbj2xblgd5mlu

Cross-lingual Inductive Transfer to Detect Offensive Language [article]

Kartikey Pant, Tanvi Dadu
2020 arXiv   pre-print
In OffensEval 2020, the organizers have released the multilingual Offensive Language Identification Dataset (mOLID), which contains tweets in five different languages, to detect offensive language.  ...  This phenomenon has given rise to the growing need to detect the offensive language used in social media cross-lingually.  ...  Although freedom of speech is often advocated, offensive language in social media is unacceptable. Nevertheless, social media platforms and online communities are laden with offensive comments.  ... 
arXiv:2007.03771v1 fatcat:33ysykzigvfnhbbdvvhgkhbaty

A Systematic Review of Machine Learning Algorithms in Cyberbullying Detection: Future Directions and Challenges

Muhammad Arif
2021 Journal of Information Security and Cybercrimes Research  
There is a growing need for automatic detection and mitigation of cyberbullying events on social media.  ...  Detecting cyberbullying using machine learning and natural language processing algorithms is getting the attention of researchers.  ...  [36] applied different deep learning methods on a dataset consisting of 32000 comments deleted by a news channel as offensive or obscene.  ... 
doi:10.26735/gbtv9013 fatcat:jttkjrbsqndytlmr4yqoik23j4

Multilingual and Multi-Aspect Hate Speech Analysis

Nedjma Ousidhoum, Zizheng Lin, Hongming Zhang, Yangqiu Song, Dit-Yan Yeung
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
In this paper, we present a new multilingual multi-aspect hate speech analysis dataset and use it to test the current state-of-the-art multilingual multitask learning approaches.  ...  We evaluate our dataset in various classification settings, then we discuss how to leverage our annotations in order to improve hate speech detection and classification in general.  ...  For instance, a sexist comment can be disrespectful, hateful, or offensive towards women.  ... 
doi:10.18653/v1/d19-1474 dblp:conf/emnlp/OusidhoumLZSY19 fatcat:molbpoy2fffplie233bzlfktg4

Investigating cross-lingual training for offensive language detection

Andraž Pelicon, Ravi Shekhar, Blaž Škrlj, Matthew Purver, Senja Pollak
2021 PeerJ Computer Science  
Platforms that feature user-generated content (social media, online forums, newspaper comment sections etc.) have to detect and filter offensive speech within large, fast-changing datasets.  ...  In this paper, we investigate the reasons for this performance drop, via a systematic comparison of pre-trained models and intermediate training regimes on five different languages.  ...  This dataset contains reader comments from the Croatian online news media platform 24sata (https://www.24sata.hr/).  ... 
doi:10.7717/peerj-cs.559 fatcat:lsjgz6n7mrgnnlkrwfpddnkhnq

Multilingual and Multi-Aspect Hate Speech Analysis [article]

Nedjma Ousidhoum, Zizheng Lin, Hongming Zhang, Yangqiu Song, Dit-Yan Yeung
2019 arXiv   pre-print
In this paper, we present a new multilingual multi-aspect hate speech analysis dataset and use it to test the current state-of-the-art multilingual multitask learning approaches.  ...  We evaluate our dataset in various classification settings, then we discuss how to leverage our annotations in order to improve hate speech detection and classification in general.  ...  For instance, a sexist comment can be disrespectful, hateful, or offensive towards women.  ... 
arXiv:1908.11049v1 fatcat:6b2evdtujze45kmp56c6e2xyby

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020) [article]

Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin
2020 arXiv   pre-print
The task featured five languages: English, Arabic, Danish, Greek, and Turkish for Subtask A. In addition, English also featured Subtasks B and C.  ...  We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020).  ...  Acknowledgements This research was partly supported by the IT University of Copenhagen's Abusive Language Detection project.  ... 
arXiv:2006.07235v2 fatcat:gqo4hmya2zcxpkcyzusq4hztlu

Resources and benchmark corpora for hate speech detection: a systematic review

Fabio Poletto, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, Viviana Patti
2020 Language Resources and Evaluation  
Hate Speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works.  ...  Lexica play an important role as well for the development of hate speech detection systems.  ...  Mubarak et al. (2017) (MDM)-three resources for Arabic language including: a lexicon of 288 obscene words; a test set of 1100 tweets for manual validation; a dataset of 32,000 comments that have been  ... 
doi:10.1007/s10579-020-09502-8 fatcat:sbeb4vujy5hczomyyremxevjiy

SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection [article]

Aiqi Jiang, Xiaohan Yang, Yang Liu, Arkaitz Zubiaga
2021 arXiv   pre-print
Our results show competitive performance, providing a benchmark for sexism detection in the Chinese language, as well as an error analysis highlighting open challenges needing more research in Chinese  ...  While research in the sexism detection domain is growing, most of this research focuses on English as the language and on Twitter as the platform.  ...  Research Applications The SWSR dataset and the SexHateLex lexicon provide resources for furthering research in a new language in the growing research problem of sexist language.  ... 
arXiv:2108.03070v1 fatcat:7pnrrr54xrd6jc6hhacglht43e