Detection of Abusive Language: the Problem of Biased Datasets

Michael Wiegand, Josef Ruppenhofer, Thomas Kleinbauer
2019 North American Chapter of the Association for Computational Linguistics  
We discuss the impact of data bias on abusive language detection.  ...  Datasets with a higher proportion of implicit abuse are more affected than datasets with a lower proportion.  ...  Acknowledgements The authors were partially supported by the German Research Foundation (DFG) under grants RU 1873/2-1 and WI 4204/2-1. References  ... 
doi:10.18653/v1/n19-1060 dblp:conf/naacl/WiegandRK19 fatcat:bbmpyatp7rfipoo6piipwnnwpy

Cross-Domain Detection of Abusive Language Online

Mladen Karan, Jan Šnajder
2018 Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)  
We investigate to what extent the models trained to detect general abusive language generalize between different datasets labeled with different abusive language types.  ...  To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-domain datasets and that having at least  ...  Detecting abusive language online is a subject of much ongoing research in the NLP community.  ... 
doi:10.18653/v1/w18-5117 dblp:conf/acl-alw/KaranS18 fatcat:xd2evaqe6rembc24ajg3hwx6ki

Joint Modelling of Emotion and Abusive Language Detection [article]

Santhosh Rajamanickam, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova
2020 arXiv pre-print
Aiming to tackle this problem, the natural language processing (NLP) community has experimented with a range of techniques for abuse detection.  ...  In this paper, we present the first joint model of emotion and abusive language detection, experimenting in a multi-task learning framework that allows one task to inform the other.  ...  This stresses the need for automated techniques for abusive language detection, a problem that has recently gained a great deal of interest in the natural language processing community.  ... 
arXiv:2005.14028v1 fatcat:nnltfnth4fb57npktge3gwu5xe

Abusive Language Detection and Characterization of Twitter Behavior [article]

Dincy Davis, Reena Murali, Remesh Babu
2020 arXiv pre-print
Here the main objective is to focus on various forms of abusive behaviors on Twitter and to detect whether a speech is abusive or not.  ...  In this work, abusive language detection in online content is performed using Bidirectional Recurrent Neural Network (BiRNN) method.  ...  Example of Abusive language. A large number of studies have been done in recent years to develop automatic methods for the detection of abusive language in social media platforms.  ... 
arXiv:2009.14261v1 fatcat:tigf4y7tcfgi5evboqp55cwqh4

Reducing Gender Bias in Abusive Language Detection

Ji Ho Park, Jamin Shin, Pascale Fung
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
Abusive language detection models tend to have a problem of being biased toward identity words of a certain group of people because of imbalanced training datasets.  ...  In this work, we measure gender biases on models trained with different abusive language datasets, while analyzing the effect of different pre-trained word embeddings and model architectures.  ...  Acknowledgments This work is partially funded by ITS/319/16FP of Innovation Technology Commission, HKUST, and 16248016 of Hong Kong Research Grants Council.  ... 
doi:10.18653/v1/d18-1302 dblp:conf/emnlp/ParkSF18 fatcat:ybtesdtm2fejxb2fevt7du4td4

Studying Generalisability across Abusive Language Detection Datasets

Steve Durairaj Swamy, Anupam Jamatia, Björn Gambäck
2019 Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)  
Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result of this, there exists a great deal of redundancy and non-generalisability between datasets.  ...  Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental  ...  Acknowledgments Thanks to all the researchers who have made their datasets available, especially Waseem and Hovy, Davidson et al., Founta et al., and Zampieri et al., the organisers of SemEval-2019 Task  ... 
doi:10.18653/v1/k19-1088 dblp:conf/conll/SwamyJG19 fatcat:mn6slro7dfhybh2gl4j7zbmrpy

Neural Character-based Composition Models for Abuse Detection

Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova
2018 Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)  
Acknowledgements Special thanks to the anonymous reviewers for their valuable comments and suggestions.  ...  Conclusions In this paper, we considered the problem of obfuscated words in the field of automated abuse detection.  ...  Datasets Following the proceedings of the 1st Workshop on Abusive Language Online (Waseem et al., 2017), we use three datasets from two different domains.  ... 
doi:10.18653/v1/w18-5101 dblp:conf/acl-alw/MishraYS18 fatcat:y7mbghaq6japjkuojozdhwhtjm

Multi-Class Detection of Abusive Language Using Automated Machine Learning [chapter]

Mackenzie Jorgensen, Minho Choi, Marco Niemann, Jens Brunk, Jörg Becker (Villanova University, Dept. of Computing Sciences, Villanova, USA; Lewis & Clark College, Dept. of Mathematical Sciences, Portland, USA; University of Münster – ERCIS, Münster, Germany)
2020 WI2020 Zentrale Tracks  
We propose Auto-ML as a promising approach to the field of abusive language detection, especially for small companies who may have little machine learning knowledge and computing resources.  ...  Abusive language detection online is a daunting task for moderators. We propose Automated Machine Learning (Auto-ML) to semi-automate abusive language detection and to assist moderators.  ...  Acknowledgements The research leading to these results received funding from the federal state of North Rhine-Westphalia and the European Regional Development Fund (EFRE.NRW 2014-2020), Project: (No.  ... 
doi:10.30844/wi_2020_r7-jorgensen dblp:conf/wirtschaftsinformatik/JorgensenCNB020 fatcat:unwwznyihfazvdxgfign2ktjua

Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention [article]

Hongyu Gong, Alberto Valido, Katherine M. Ingram, Giulia Fanti, Suma Bhat, Dorothy L. Espelage
2021 arXiv pre-print
This is due in part to the lack of datasets that explicitly annotate heterogeneity in abusive language.  ...  Abusive language is a massive problem in online social platforms.  ...  Acknowledgements This work was supported in part by the National Science Foundation under grant no. 1720268. We would like to thank  ... 
arXiv:2105.11119v1 fatcat:2u2ltkykzvh37gyj5tlk7nthkm

On Cross-Dataset Generalization in Automatic Detection of Online Abuse [article]

Isar Nejadgholi, Svetlana Kiritchenko
2021 arXiv pre-print
NLP research has attained high performances in abusive language detection as a supervised classification task.  ...  We explore the topic bias and the task formulation bias in cross-dataset generalization. We show that the benign examples in the Wikipedia Detox dataset are biased towards platform-specific topics.  ...  Detection of Abusive Language: the Problem of Biased Datasets.  ... 
arXiv:2010.07414v3 fatcat:tr6njwf2nzbvvl4a35an7eaeji

Aggression Detection on Social Media Text Using Deep Neural Networks

Vinay Singh, Aman Varshney, Syed Sarfaraz Akhtar, Deepanshu Vijay, Manish Shrivastava
2018 Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)  
In this paper, we introduce a deep learning based classification system for Facebook posts and comments of Hindi-English Code-Mixed text to detect the aggressive behaviour of/towards users.  ...  Our work focuses on text from users majorly in the Indian Subcontinent. The dataset that we used for our models is provided by TRAC-1 in their shared task.  ...  These networks have been used in the past for tasks similar to ours, like hate speech detection (Badjatiya et al., 2017), bullying detection (Agrawal and Awekar, 2018), abusive language detection  ... 
doi:10.18653/v1/w18-5106 dblp:conf/acl-alw/SinghVAVS18 fatcat:7ymguyz25bfzzdb5spld5p45xm

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection [article]

Noé Cecillon, Georges Linares
2020 arXiv pre-print
We also propose, in addition to this corpus, a complete benchmarking platform to stimulate and fairly compare scientific works around the problem of content abuse detection, trying to avoid the recurring  ...  problem of result replication.  ...  By comparison, in the abuse detection literature, datasets are often annotated by considering comments flagged by moderators as abusive, whereas the rest of the comments are deemed non-abusive by default  ... 
arXiv:2003.06190v1 fatcat:2fr4mzluerbythqbpnp4plxncq

Reducing Gender Bias in Abusive Language Detection [article]

Ji Ho Park, Jamin Shin, Pascale Fung
2018 arXiv pre-print
Abusive language detection models tend to have a problem of being biased toward identity words of a certain group of people because of imbalanced training datasets.  ...  In this work, we measure gender biases on models trained with different abusive language datasets, while analyzing the effect of different pre-trained word embeddings and model architectures.  ...  Acknowledgments This work is partially funded by ITS/319/16FP of Innovation Technology Commission, HKUST, and 16248016 of Hong Kong Research Grants Council.  ... 
arXiv:1808.07231v1 fatcat:irtywc5hkneyvacueh4ovnvfqa

Detecting Recovery Problems Just in Time: Application of Automated Linguistic Analysis and Supervised Machine Learning to an Online Substance Abuse Forum

Rachel Kornfield, Prathusha K Sarma, Dhavan V Shah, Fiona McTavish, Gina Landucci, Klaren Pe-Romashko, David H Gustafson
2018 Journal of Medical Internet Research  
Results: To distinguish recovery problem disclosures, the Bag-of-Words approach relied on domain-specific language, including words explicitly linked to substance use and mental health ("drink," "relapse  ...  Conclusions: Differences in language use can distinguish messages disclosing recovery problems from other message types.  ...  Acknowledgments This research was funded by the National Institute of Alcohol Abuse and Alcoholism (R01 AA017192) and the National Institute on Drug Abuse (R01DA034279, R01DA040449, and DP2DA042424).  ... 
doi:10.2196/10136 pmid:29895517 pmcid:PMC6019846 fatcat:fsu37llrdfbwxphzycmh4sj7xu

Directions in Abusive Language Training Data: Garbage In, Garbage Out [article]

Bertie Vidgen, Leon Derczynski
2020 arXiv pre-print
This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data.  ...  Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies.  ...  Creating a training dataset for online abuse detection is typically motivated by the desire to address a particular social problem.  ... 
arXiv:2004.01670v2 fatcat:vj5mxajmsbbtfk4e7u2iyiacmy
Showing results 1-15 of 7,814