Blocking Blog Spam with Language Model Disagreement
2005
Adversarial Information Retrieval on the Web
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. ...
Preliminary experiments with identification of typical blog spam show promising results. ...
the blog software to block spammers on the fly. ...
dblp:conf/airweb/MishneCL05
fatcat:43rcpwnosvfxhby2ijkzjholoi
Web spam identification through language model analysis
2009
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web - AIRWeb '09
$KLD(T_1 \| T_2) = \sum_{t \in T_1} P_{T_1}(t) \log \frac{P_{T_1}(t)}{P_{T_2}(t)}$ ... Blocking blog spam with language model disagreement. ...
Analyze the relationship between a page and those that point to it. Extract topics with LDA or LSI to build new language models. Combine language model features with linguistic or new link features. Analyze ...
doi:10.1145/1531914.1531920
dblp:conf/airweb/Martinez-RomoA09
fatcat:wo3lhiossjhs3gkbshf3lqkbju
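The Kullback-Leibler divergence quoted in the entry above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the additive smoothing constant `alpha`, and the function names `unigram_model`, `kl_divergence`, and `disagreement` are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(text, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over `vocab`.

    `alpha` is an assumed smoothing constant, not a value from the papers.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    """KLD(p || q) = sum over t of p(t) * log(p(t) / q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def disagreement(text_a, text_b, alpha=0.1):
    """Divergence between the smoothed unigram models of two texts."""
    # Build both models over the joint vocabulary so that q(t) is never zero.
    vocab = set(text_a.lower().split()) | set(text_b.lower().split())
    p = unigram_model(text_a, vocab, alpha)
    q = unigram_model(text_b, vocab, alpha)
    return kl_divergence(p, q)


post = "language models help detect comment spam on blogs"
on_topic = "language models detect spam on blogs well"
off_topic = "cheap pills casino bonus click here now"

# A comment that shares vocabulary with the post should diverge less
# than one that shares none of it.
print(disagreement(post, on_topic) < disagreement(post, off_topic))
```

In practice a threshold on this score would have to be tuned on labeled data; the sketch only shows how the divergence separates an on-topic comment from an unrelated one.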
Library blogs and user participation: a survey about comment spam in library blogs
2011
Library hi tech
Research limitations/implications: The research focuses on the comment spam problem in blogs affiliated with libraries where the library is responsible for content published on the blog. ...
Findings: Regardless of the library type with which blogs were affiliated and the size of the community they served, user participation in library blogs was very limited in terms of comments left. ...
One model that seems to have success in filtering spam comments in the context of blogs is language model disagreement. ...
doi:10.1108/07378831111116994
fatcat:oetv2l4ecfecrfmpo7k36naeae
Robust detection of comment spam using entropy rate
2012
Proceedings of the 5th ACM workshop on Security and artificial intelligence - AISec '12
In this work, we design a method for blog comment spam detection using the assumption that spam is any kind of uninformative content. ...
The data was provided to us with an initial spam labeling from an industry competitive source. Nevertheless the initial spam labeling had unknown performance characteristics. ...
Mishne et al. study the feasibility of using unigram language models for detecting off-topic link spam blog comments [10]. ...
doi:10.1145/2381896.2381907
dblp:conf/ccs/KantchelianMHAJT12
fatcat:dzdvrvv2vvbudhvgko5qsnheha
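The idea in the entry above, that spam is uninformative content, can be illustrated with a compression ratio, a common rough proxy for the entropy rate of text. The snippet does not say how the paper estimates entropy rate, so using `zlib` here is an assumption, and the helper name `compression_ratio` is hypothetical.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more redundant text."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


repetitive = "buy cheap pills now " * 40
varied = (
    "The author argues that library blogs attract few comments, "
    "which matches what our branch saw last year."
)

# Highly repetitive text compresses far better than ordinary prose.
print(compression_ratio(repetitive) < compression_ratio(varied))
```

Very short texts compress poorly whatever their content, so a ratio like this is only meaningful once enough text has been gathered, for example by pooling an author's comments.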
Modeling and Data Mining in Blogosphere
2009
Synthesis Lectures on Data Mining and Knowledge Discovery
Spam blogs, or splogs, are an increasing concern in the blogosphere; they are discussed in detail along with approaches that leverage supervised machine learning algorithms and interaction patterns. ...
To study a complex network such as the blogosphere, researchers can develop blog models and generate data through these models while continuously collecting blog data. ...
We dedicate this book to them, with love. ...
doi:10.2200/s00213ed1v01y200907dmk001
fatcat:ifz4ic57sfcwbltrboans35zzm
Web Spam Detection: New Classification Features Based on Qualified Link Analysis and Language Models
2010
IEEE Transactions on Information Forensics and Security
In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. ...
Index Terms: Content analysis, information retrieval, language models (LMs), link integrity, Web spam detection. ...
Previous works have proved that LM disagreement techniques are very efficient in tasks such as blocking blog spam [18] or detecting nepotistic links [3]. ...
doi:10.1109/tifs.2010.2050767
fatcat:6juorixfive3bfumbhvghae6he
Detecting malicious tweets in trending topics using a statistical analysis of language
2013
Expert systems with applications
of language to detect spam in trending topics. ...
In this paper we present the first work that tries to detect spam tweets in real time using language as the primary tool. ...
Previous works have proved that language model disagreement techniques are very efficient in tasks such as blocking blog spam and detecting nepotistic links and Web spam. ...
doi:10.1016/j.eswa.2012.12.015
fatcat:es6fmurrvjau5nyk7twgplesjm
Autonomous link spam detection in purely collaborative environments
2011
Proceedings of the 7th International Symposium on Wikis and Open Collaboration - WikiSym '11
Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. ...
For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination). ...
Other research [27] relies on "language model disagreement": the notion that spam contributions do not fit the "context" of the surrounding content. ...
doi:10.1145/2038558.2038574
dblp:conf/wikis/WestABEL11
fatcat:ctpvhvpuc5dyljvzuarl5z6yu4
Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method
[article]
2022
arXiv
pre-print
The suggested technique's findings are compared to those of state-of-the-art models and show that our approach outperforms them. ...
Nevertheless, the rise in email users has brought a dramatic increase in spam emails in recent years. ...
Experiments on blog spam detection, email spam detection, and splog detection were used to validate their findings. Zhan et al. ...
arXiv:2204.07390v2
fatcat:bqda43jov5ebbmgvcosbxrgatu
Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish
2021
Automatika
representation from unlabelled text, has achieved remarkable results in many natural language processing (NLP) tasks with fine-tuning. ...
Traditionally morphologically difficult languages require dense language pre-processing steps in order to model the data to be suitable for machine learning (ML) algorithms. ...
BERT unravelled the unidirectional limitation with masked language model (MLM). ...
doi:10.1080/00051144.2021.1922150
doaj:9caee9f4f3664cce840352b66516ca08
fatcat:ov6svxkaxfhr3bpg35w7ncdtby
Twitter Bot Detection Using Bidirectional Long Short-term Memory Neural Networks and Word Embeddings
[article]
2020
arXiv
pre-print
Twitter is a web application playing dual roles of online social networking and micro-blogging. ...
To the best of our knowledge, our work is the first that develops a recurrent neural model with word embeddings to distinguish Twitter bots from human accounts, that requires no prior knowledge or assumption ...
It can be learned using a variety of language models. ...
arXiv:2002.01336v1
fatcat:fmqknywbrvgc7j5obqnv3cldeq
The search and social media workshop at SIGIR 2009
2009
SIGIR Forum
What are the needs of users, and models of those needs, specific to social media search? What models make the most sense? How does search interact with existing uses of social media? ...
., examined exploiting user reviews to improve search in wikipedia, followed by Seki who described automatically identifying spam blogs (also known as "splogs"). ...
doi:10.1145/1670564.1670573
fatcat:kr6bl6u4ivanljt7ep6u4axv7y
Posting Bot Detection on Blockchain-based Social Media Platform using Machine Learning Techniques
[article]
2020
arXiv
pre-print
We can extract the features of posts by clustering distances between blog data or replies. ...
Compared with the bot detection on the usual social media platforms, the features we created have an advantage that posting bots can be detected without limiting the number or length of posts. ...
In our labeling process, annotations were rarely carried out for languages that the annotators were not familiar with. ...
arXiv:2008.12471v1
fatcat:togbjt4r4fhg3f3kng6zfuurbe
GetHealthyHarlem.org: developing a web platform for health promotion and wellness driven by and for the Harlem community
2009
AMIA Annual Symposium Proceedings
The site is gaining active use with more than 9,500 unique site visits in the six months since going live in November, 2008. ...
In ongoing research studies, we are using the website to explore how the PAR model can be applied to the development of a community health website. ...
Drupal is written in the popular programming language PHP and works with several relational databases such as MySQL. ...
pmid:20351872
pmcid:PMC2815482
fatcat:wzxtf672rrcr5o77geop2akj6y
Vlogging
2010
ACM Computing Surveys
By combining the grassroots blogging with the richness of expression available in video, videoblogs (vlogs for short) will be a powerful new media adjunct to our existing televised news sources. ...
In recent years, blogging has become an exploding passion among Internet communities. ...
This divergence in the language models can be exploited to effectively classify comments as spam or nonspam. ...
doi:10.1145/1749603.1749606
fatcat:qeopjzg4j5bwparwtbmvwdesd4