Detecting offensive tweets via topical feature discovery over a large scale twitter corpus

Guang Xiang, Bin Fan, Ling Wang, Jason Hong, Carolyn Rose
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features.  ...  In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter.  ...  of seed profane words) and law-abiding twitterers (i.e., twitterers who rarely use seed offensive words) over a large tweet corpus using a list of pre-defined offensive seed words; we then learn topic  ... 
doi:10.1145/2396761.2398556 dblp:conf/cikm/XiangFWHR12 fatcat:f347mar4tjaflmwctjbhpxj2vi

Towards Measuring Adversarial Twitter Interactions against Candidates in the US Midterm Elections [article]

Yiqing Hua, Thomas Ristenpart, Mor Naaman
2020 arXiv   pre-print
We then develop a new technique for detecting tweets with toxic content that are directed at any specific candidate. Such technique allows us to more accurately quantify adversarial interactions towards  ...  We gather a new dataset consisting of 1.7 million tweets involving candidates, one of the largest corpora focusing on political discourse.  ...  This research is supported by NSF research grants CNS-1704527 and IIS-1665169, as well as a Cornell Tech Digital Life Initiative Doctoral Fellowship.  ... 
arXiv:2005.04411v1 fatcat:xmmqrejb6bcrxm72r4w3rdiicu

EARS (earthquake alert and report system)

Marco Avvenuti, Stefano Cresci, Andrea Marchetti, Carlo Meletti, Maurizio Tesconi
2014 Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14  
Detected events are automatically broadcasted by our system via a dedicated Twitter account and by email notifications.  ...  We then apply a burst detection algorithm in order to promptly identify outbreaking seismic events.  ...  bursts fits well with the need to identify large scale and small scale events.  ... 
doi:10.1145/2623330.2623358 dblp:conf/kdd/AvvenutiCMMT14 fatcat:5rkxhjlx7jdsdn4bshxfrxqmia

Sifting signal from noise: A new perspective on the meaning of tweets about the "big game"

Ian Graves, Nora McDonald, Sean P Goggins
2014 New Media & Society  
A good deal of Twitter research focuses on event-detection using algorithms that rely on key words and tweet density.  ...  Conceptualizing subcontexts as a socio-technical place advances the framing of Twitter event-detection from principally computational to deeply contextual.  ...  Many others have been part of discussions about Twitter data like this over the past 3 years, and we apologize for those omissions.  ... 
doi:10.1177/1461444814541783 fatcat:wdzbhpah4na3hn2jky4jigzd6e

An Information Retrieval Approach to Building Datasets for Hate Speech Detection [article]

Md Mustafizur Rahman, Dinesh Balakrishnan, Dhiraj Murthy, Mucahid Kutlu, Matthew Lease
2021 arXiv   pre-print
We share a new benchmark dataset for hate speech detection on Twitter that provides broader coverage of hate than prior datasets.  ...  Building a benchmark dataset for hate speech detection presents various challenges.  ...  This research was supported in part by Wipro (HELIOS), the Knight Foundation, the Micron Foundation, and Good Systems (https://goodsystems.utexas.edu), a UT Austin Grand Challenge to develop responsible  ... 
arXiv:2106.09775v3 fatcat:56cg2t7nwbfe3lwdqw7z2eqjoy

Sentiment Analysis for Fake News Detection

Miguel A. Alonso, David Vilares, Carlos Gómez-Rodríguez, Jesús Vilares
2021 Electronics  
This has led to sentiment analysis, the part of text analytics in charge of determining the polarity and strength of sentiments expressed in a text, to be used in fake news detection approaches, either  ...  In this article, we study the different uses of sentiment analysis in the detection of fake news, with a discussion of the most relevant elements and shortcomings, and the requirements that should be met  ...  In addition, they considered two features that aimed to capture when the sentiment of a tweet matched the overall sentiment of the topic hypothesizing that tweets that had similar sentiments to the topic  ... 
doi:10.3390/electronics10111348 fatcat:p34nbmtkzrcqrowu24nmu4axnq

A Quantitative Approach to Understanding Online Antisemitism [article]

Savvas Zannettou, Joel Finkelstein, Barry Bradlyn, Jeremy Blackburn
2019 arXiv   pre-print
In this paper, we present a large-scale, quantitative study of online antisemitism.  ...  We extract semantic embeddings from our corpus of posts and demonstrate how automated techniques can discover and categorize the use of antisemitic terminology.  ...  They also assess which features of tweets contribute more on the detection task, finding that character n-grams along with a gender feature provide the best performance. Del Vigna et al.  ... 
arXiv:1809.01644v2 fatcat:wo2jcz7sgvebjp6sngfb3rmtpm

Utilising Wikipedia for Text Mining Applications

Muhammad Atif Qureshi
2016 SIGIR Forum  
category taxonomies • Topical scores corresponding to each tweet obtained via topic modelling • Twitter-specific features obtained using the Twitter API. The fundamental constituent of the technique  ...  The Twitter specific features show second best performance which confirms the fact that twitter-specific features are important over twitter for sharing information, while Topic specific shows the least  ...  same time proposing a technique on top of Wikipedia hyperlink structure to determine context of a tweet.  ... 
doi:10.1145/2888422.2888449 fatcat:lck3kkxoazcj5powaqhjs6epty

Contrastive Learning of Sociopragmatic Meaning in Social Media [article]

Chiyu Zhang, Muhammad Abdul-Mageed, Ganesh Jawahar
2022 arXiv   pre-print
To bridge this gap, we propose a novel framework for learning task-agnostic representations transferable to a wide range of sociopragmatic tasks (e.g., emotion, hate speech, humor, sarcasm).  ...  predictive features for hate speech detection on twitter.  ...  Sarcasm detection on twitter: A behavioral modeling approach.  ... 
arXiv:2203.07648v2 fatcat:6zmhiogvirdlznoaqonyuesc54

Towards Understanding the Information Ecosystem Through the Lens of Multiple Web Communities [article]

Savvas Zannettou
2019 arXiv   pre-print
Then, we follow a data-driven cross-platform quantitative approach to analyze billions of posts from Twitter, Reddit, 4chan's /pol/, and Gab, to shed light on: 1) how news and memes travel from one Web  ...  Our analysis reveals that fringe Web communities like 4chan's /pol/ and The_Donald subreddit have a disproportionate influence on mainstream communities like Twitter with regard to the dissemination of  ...  By extracting a variety of features (user-related, timing-related, content-related and sentiment-related features) from a large corpus of tweets they demonstrate that they can distinguish promoted campaigns  ... 
arXiv:1911.10517v1 fatcat:piuwv7zv7zghlof5tqhuhnukla

Blackmarket-Driven Collusion on Online Media: A Survey

Hridoy Sankar Dutta, Tanmoy Chakraborty
2021 ACM/IMS Transactions on Data Science  
We believe that collusive entity detection is a newly emerging topic in anomaly detection and cyber-security research in general, and the current survey will provide readers with an easy-to-access and  ...  comprehensive list of methods, tools, and resources proposed so far for detecting and analyzing collusive entities on online media.  ...  Twitter is a microblogging service where users write tweets about topics such as politics, sport, cooking, and fashion. Twitter has three types of appraisals: retweets, likes, and followers.  ... 
doi:10.1145/3517931 fatcat:7fvgujegh5hohdiemsok6kzviq

ETHOS: an Online Hate Speech Detection Dataset [article]

Ioannis Mollas, Zoe Chrysopoulou, Stamatis Karlos, Grigorios Tsoumakas
2021 arXiv   pre-print
This phenomenon is primarily fostered by offensive comments, either during user interaction or in the form of a posted multimedia context.  ...  A robust and reliable system for detecting and preventing the uploading of relevant content will have a significant impact on our digitally interconnected society.  ...  The data was gathered again via the Twitter API, filtering tweets containing HS words submitted to Hatebase.org.  ... 
arXiv:2006.08328v2 fatcat:ppg2phh4nber3p42pgbgpyfmrq

An NLP-Powered Human Rights Monitoring Platform

Ayman Alhelbawy, Mark Lattimer, Udo Kruschwitz, Chris Fox, Massimo Poesio
2020 Expert Systems with Applications: X  
organisations that are not of a scale that they can afford their own department dedicated to this task.  ...  Highlights • A practical system for human rights monitoring combining NLP and crowdsourcing • Mining social media offers signals for human rights abuses in addition to reports • Deep learning outperforms  ...  An offensive content detection model was proposed by Chen et al. (2012) to detect offensive language in social media.  ... 
doi:10.1016/j.eswax.2020.100023 fatcat:wuqiko3wr5hkdjo5bkgm4cvrvi

The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation [article]

Yisi Sang, Jeffrey Stanton
2021 arXiv   pre-print
scale for distilling the process of how annotators label a hate speech corpus.  ...  We tested this scale with 170 annotators in a hate speech annotation task.  ...  Detecting offensive tweets via topical feature discovery over a large scale twitter corpus.  ... 
arXiv:2112.04030v1 fatcat:xtqe55o2c5ambh2jhgofxsjjma

From Symbols to Embeddings: A Tale of Two Representations in Computational Social Science [article]

Huimin Chen, Cheng Yang, Xuanming Zhang, Zhiyuan Liu, Maosong Sun, Jianbin Jin
2021 arXiv   pre-print
However, these large-scale and multi-modal data also present researchers with a great challenge: how to represent data effectively to mine the meanings we want in CSS?  ...  statistics of these applications, we unearth the strength of each kind of representations and discover the tendency that embedding-based representations are emerging and obtaining increasing attention over  ...  and a large corpus of texts.  ... 
arXiv:2106.14198v1 fatcat:dvy5awnfuvbnnkzusjl5wbhfki