533 Hits in 5.1 sec

Blocking Blog Spam with Language Model Disagreement

Gilad Mishne, David Carmel, Ronny Lempel
2005 Adversarial Information Retrieval on the Web  
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments.  ...  Preliminary experiments with identification of typical blog spam show promising results.  ...  the blog software to block spammers on the fly.  ... 
dblp:conf/airweb/MishneCL05 fatcat:43rcpwnosvfxhby2ijkzjholoi

Web spam identification through language model analysis

Juan Martinez-Romo, Lourdes Araujo
2009 Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web - AIRWeb '09  
$\mathrm{KLD}(T_1 \| T_2) = \sum_{t \in T_1} p(t \mid T_1) \log \frac{p(t \mid T_1)}{p(t \mid T_2)}$  ...  Blocking blog spam with language model disagreement.  ...  Analyze the relationship between a page and those that point to it; Extract topics with LDA or LSI to build new language models; Combine language model features with linguistic or new link features; Analyze  ... 
doi:10.1145/1531914.1531920 dblp:conf/airweb/Martinez-RomoA09 fatcat:wo3lhiossjhs3gkbshf3lqkbju
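
The KLD measure named in the snippet above is the Kullback-Leibler divergence between the language models of two text units. The following is a minimal sketch of that idea, not the authors' implementation: the whitespace tokenizer, the add-alpha smoothing, the shared vocabulary, and the function names `unigram_model` and `kld` are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(text, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary.

    Smoothing is an assumption of this sketch; it keeps every probability
    non-zero so the logarithm in the divergence is always defined.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kld(text1, text2, alpha=1.0):
    """KL divergence KLD(T1 || T2) between two smoothed unigram models.

    Both models are estimated over the union of the two vocabularies
    (an assumption of this sketch, not taken from the listed papers).
    """
    vocab = set(text1.lower().split()) | set(text2.lower().split())
    p1 = unigram_model(text1, vocab, alpha)
    p2 = unigram_model(text2, vocab, alpha)
    return sum(p1[t] * math.log(p1[t] / p2[t]) for t in vocab)


# Hypothetical example texts, invented for illustration only.
post = "python list sorting tips"
on_topic_comment = "nice sorting tips for python"
off_topic_comment = "cheap pills online casino"

print(kld(post, on_topic_comment))
print(kld(post, off_topic_comment))
```

In this toy example the off-topic comment diverges more from the post than the on-topic one does. A real filter would also need a threshold or classifier on top of the divergence, which this sketch does not cover.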

Library blogs and user participation: a survey about comment spam in library blogs

Fatih Oguz, Michael Holt
2011 Library hi tech  
Research limitations/implications: The research focuses on the comment spam problem in blogs affiliated with libraries where the library is responsible for content published on the blog.  ...  Findings: Regardless of the library type with which blogs were affiliated and the size of the community they served, user participation in library blogs was very limited in terms of comments left.  ...  One model that seems to have success in filtering spam comments in the context of blogs is language model disagreement.  ... 
doi:10.1108/07378831111116994 fatcat:oetv2l4ecfecrfmpo7k36naeae

Robust detection of comment spam using entropy rate

Alex Kantchelian, Justin Ma, Ling Huang, Sadia Afroz, Anthony Joseph, J. D. Tygar
2012 Proceedings of the 5th ACM workshop on Security and artificial intelligence - AISec '12  
In this work, we design a method for blog comment spam detection using the assumption that spam is any kind of uninformative content.  ...  The data was provided to us with an initial spam labeling from an industry competitive source. Nevertheless, the initial spam labeling had unknown performance characteristics.  ...  Mishne et al. study the feasibility of using unigram language models for detecting off-topic link spam blog comments [10].  ... 
doi:10.1145/2381896.2381907 dblp:conf/ccs/KantchelianMHAJT12 fatcat:dzdvrvv2vvbudhvgko5qsnheha

Modeling and Data Mining in Blogosphere

Nitin Agarwal, Huan Liu
2009 Synthesis Lectures on Data Mining and Knowledge Discovery  
Spam blogs, or splogs, are an increasing concern in the blogosphere, which is discussed in detail with the approaches leveraging supervised machine learning algorithms and interaction patterns.  ...  To study a complex network such as the blogosphere, researchers can develop blog models and generate data through these models while continuously collecting blog data.  ...  We dedicate this book to them, with love.  ... 
doi:10.2200/s00213ed1v01y200907dmk001 fatcat:ifz4ic57sfcwbltrboans35zzm

Web Spam Detection: New Classification Features Based on Qualified Link Analysis and Language Models

Lourdes Araujo, Juan Martinez-Romo
2010 IEEE Transactions on Information Forensics and Security  
In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones.  ...  Index Terms: Content analysis, information retrieval, language models (LMs), link integrity, Web spam detection.  ...  Previous works have proved that LM disagreement techniques are very efficient in tasks such as blocking blog spam [18] or detecting nepotistic links [3].  ... 
doi:10.1109/tifs.2010.2050767 fatcat:6juorixfive3bfumbhvghae6he

Detecting malicious tweets in trending topics using a statistical analysis of language

Juan Martinez-Romo, Lourdes Araujo
2013 Expert systems with applications  
of language to detect spam in trending topics.  ...  In this paper we present the first work that tries to detect spam tweets in real time using language as the primary tool.  ...  Previous works have proved that language model disagreement techniques are very efficient in tasks such as blocking blog spam and detecting nepotistic links and Web spam.  ... 
doi:10.1016/j.eswa.2012.12.015 fatcat:es6fmurrvjau5nyk7twgplesjm

Autonomous link spam detection in purely collaborative environments

Andrew G. West, Avantika Agrawal, Phillip Baker, Brittney Exline, Insup Lee
2011 Proceedings of the 7th International Symposium on Wikis and Open Collaboration - WikiSym '11  
Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection.  ...  For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination).  ...  Other research [27] relies on "language model disagreement", the notion that spam contributions do not fit the "context" of the surrounding content.  ... 
doi:10.1145/2038558.2038574 dblp:conf/wikis/WestABEL11 fatcat:ctpvhvpuc5dyljvzuarl5z6yu4

Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method [article]

Sultan Zavrak, Seyhmus Yilmaz
2022 arXiv   pre-print
The suggested technique's findings are compared to those of state-of-the-art models and show that our approach outperforms them.  ...  Nevertheless, the rise in email users has caused a dramatic increase in spam emails in recent years.  ...  Experiments on blog spam detection, email spam detection, and splog detection were used to validate their findings. Zhan et al.  ... 
arXiv:2204.07390v2 fatcat:bqda43jov5ebbmgvcosbxrgatu

Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish

Akın Özçift, Kamil Akarsu, Fatma Yumuk, Cevhernur Söylemez
2021 Automatika  
representation from unlabelled text, has achieved remarkable results in many natural language processing (NLP) tasks with fine-tuning.  ...  Traditionally morphologically difficult languages require dense language pre-processing steps in order to model the data to be suitable for machine learning (ML) algorithms.  ...  BERT unravelled the unidirectional limitation with masked language model (MLM).  ... 
doi:10.1080/00051144.2021.1922150 doaj:9caee9f4f3664cce840352b66516ca08 fatcat:ov6svxkaxfhr3bpg35w7ncdtby

Twitter Bot Detection Using Bidirectional Long Short-term Memory Neural Networks and Word Embeddings [article]

Feng Wei, Uyen Trang Nguyen
2020 arXiv   pre-print
Twitter is a web application playing dual roles of online social networking and micro-blogging.  ...  To the best of our knowledge, our work is the first that develops a recurrent neural model with word embeddings to distinguish Twitter bots from human accounts, that requires no prior knowledge or assumption  ...  It can be learned using a variety of language models.  ... 
arXiv:2002.01336v1 fatcat:fmqknywbrvgc7j5obqnv3cldeq

The search and social media workshop at SIGIR 2009

Eugene Agichtein, Marti A. Hearst, Ian Soboroff
2009 SIGIR Forum  
What are the needs of users, and models of those needs, specific to social media search? What models make the most sense? How does search interact with existing uses of social media?  ...  ., examined exploiting user reviews to improve search in Wikipedia, followed by Seki who described automatically identifying spam blogs (also known as "splogs").  ... 
doi:10.1145/1670564.1670573 fatcat:kr6bl6u4ivanljt7ep6u4axv7y

Posting Bot Detection on Blockchain-based Social Media Platform using Machine Learning Techniques [article]

Taehyun Kim, Hyomin Shin, Hyung Ju Hwang, Seungwon Jeong
2020 arXiv   pre-print
We can extract the features of posts by clustering distances between blog data or replies.  ...  Compared with the bot detection on the usual social media platforms, the features we created have an advantage that posting bots can be detected without limiting the number or length of posts.  ...  In our labeling process, annotations were rarely proceeded for languages that the annotators were not familiar with.  ... 
arXiv:2008.12471v1 fatcat:togbjt4r4fhg3f3kng6zfuurbe

GetHealthyHarlem.org: developing a web platform for health promotion and wellness driven by and for the Harlem community

Sharib A Khan, Jessica S Ancker, Jianhua Li, David Kaufman, Carly Hutchinson, Alwyn Cohall, Rita Kukafka
2009 AMIA Annual Symposium Proceedings  
The site is gaining active use with more than 9,500 unique site visits in the six months since going live in November, 2008.  ...  In ongoing research studies, we are using the website to explore how the PAR model can be applied to the development of a community health website.  ...  Drupal is written in the popular programming language PHP and works with several relational databases such as MySQL.  ... 
pmid:20351872 pmcid:PMC2815482 fatcat:wzxtf672rrcr5o77geop2akj6y

Vlogging

Wen Gao, Yonghong Tian, Tiejun Huang, Qiang Yang
2010 ACM Computing Surveys  
By combining the grassroots blogging with the richness of expression available in video, videoblogs (vlogs for short) will be a powerful new media adjunct to our existing televised news sources.  ...  In recent years, blogging has become an exploding passion among Internet communities.  ...  This divergence in the language models can be exploited to effectively classify comments as spam or nonspam.  ... 
doi:10.1145/1749603.1749606 fatcat:qeopjzg4j5bwparwtbmvwdesd4