A Comparative Analysis of Machine Learning Techniques for Spam Detection

Syed Ishfaq Manzoor, Lovely Professional University Punjab, India
2019 International Journal of Advanced Trends in Computer Science and Engineering  
Advertisement and bulk emails, also called spam, make up an estimated 62% of worldwide internet traffic.  ...  Since 1978, when the first unwanted mail was sent, technology has advanced, but the detection of spam remains a chronophagous and big-budget problem in the field of mathematical sciences.  ...  The authors of [12] proposed a metric called Spam-Mass, based on link structure, for the identification and classification of link spamming.  ... 
doi:10.30534/ijatcse/2019/73832019 fatcat:c7t3houioze35bgjuszaw7tpya

Link analysis for Web spam detection

Luca Becchetti, Carlos Castillo, Debora Donato, Ricardo Baeza-Yates, Stefano Leonardi
2008 ACM Transactions on the Web  
Based on these results we propose spam detection techniques which only consider the link structure of Web, regardless of page contents.  ...  We propose link-based techniques for automating the detection of Web spam, a term referring to pages which use deceptive techniques to obtain undeservedly high scores in search engines.  ...  a high link-based score) and content-based spam, having a few links, to avoid detection.  ... 
doi:10.1145/1326561.1326563 fatcat:6zbrk6u4fbdjpbg7hmmfl3uuua

Survey on web spam detection

Nikita Spirin, Jiawei Han
2012 SIGKDD Explorations  
We categorize all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user  ...  In turn, we perform a subcategorization of link-based category into five groups based on ideas and principles used: labels propagation, link pruning and reweighting, labels refinement, graph regularization  ...  Link-based Spam Detection All link-based spam detection algorithms can be subdivided into five groups. Preliminaries • Web Graph Model.  ... 
doi:10.1145/2207243.2207252 fatcat:euakka22anfs7cbwatkguk224e

Detecting Link Hijacking by Web Spammers [chapter]

Young-joo Chung, Masashi Toyoda, Masaru Kitsuregawa
2009 Lecture Notes in Computer Science  
Web, so called, link spamming.  ...  We performed experiments on the large scale Japanese Web archive and evaluated the accuracy of our method. Detection precision of our approach was improved about 25% from a baseline approach.  ...  If this distribution is abnormal, SpamRank regards a target page as spam and penalizes it. Gyöngyi et al. suggested Mass Estimation in [10].  ... 
doi:10.1007/978-3-642-01307-2_32 fatcat:32dmn4lxpjhm3dz5z5ri5ymtti

A Survey of Web Spam Detection Techniques

Mahdieh Danandeh Oskuie, Seyed Naser Razavi
2014 International Journal of Computer Applications Technology and Research  
Search engines try to place the best results in the first links of results on the basis of the user's query.  ...  In this paper, we first present some definitions of web spam. Then we explain different kinds of web spam and describe some methods used to combat this difficulty.  ...  Spam Mass Estimation was introduced following TrustRank. Spam Mass is a measure of how much of a page's PageRank is created via links from spam pages.  ... 
doi:10.7753/ijcatr0303.1010 fatcat:hfqhm4eoq5fjlinjpptyxyhfs4

Adversarial Web Search

Carlos Castillo
2010 Foundations and Trends in Information Retrieval  
Estimates of expected revenues from clicks (based on clickthrough estimations) are in general susceptible to spamming activities. Immorlica et al.  ...  The seed set for the spam mass estimation should include not only the highest quality nodes, but many diverse non-spam nodes.  ... 
doi:10.1561/1500000021 fatcat:toxnvajrmbdppf5hytdbnykuiq

Know your neighbors

Carlos Castillo, Debora Donato, Aristides Gionis, Vanessa Murdock, Fabrizio Silvestri
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
In this paper we present a spam detection system that combines link-based and content-based features, and uses the topology of the Web graph by exploiting the link dependencies among the Web pages.  ...  The result is an accurate system for detecting Web spam, tested on a large and public dataset, using algorithms that can be applied in practice to large-scale Web data.  ...  Methods for the detection of link-based spam rely on automatic classifiers (e.g., [4]), propagating trust or distrust through links (e.g., [13]), detecting anomalous behavior of link-based ranking  ... 
doi:10.1145/1277741.1277814 dblp:conf/sigir/CastilloDGMS07 fatcat:gmanukmghrcxxd64njwdlp4gxy

Spam detection with a content-based random-walk algorithm

F. Javier Ortega, Craig Macdonald, José A. Troyano, Fermín Cruz
2010 Proceedings of the 2nd international workshop on Search and mining user-generated contents - SMUC '10  
Our experiments show that our proposed technique outperforms other link-based techniques for spam detection.  ...  In this work we tackle the problem of the spam detection on the Web.  ...  They obtain an estimator for this metric by calculating the estimated non-spam mass, that is the amount of PageRank received from a set of (hand-picked) trusted pages.  ... 
doi:10.1145/1871985.1871994 dblp:conf/cikm/OrtegaMTC10 fatcat:h3io3pv4bfawrhk3jhvelhumqi

Splog Detection using Content, Time and Link Structures

Yu-Ru Lin, Hari Sundaram, Yun Chi, Jun Tatemura, Belle Tseng
2007 Multimedia and Expo, 2007 IEEE International Conference on  
Experiments based on the annotated ground truth on real world dataset show excellent results on splog detection tasks with 90% accuracy.  ...  This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms and splogs corrupt blog search results as well as waste network resources.  ...  It classifies a webpage as spam by estimating the spam mass, the amount of PageRank score contributed by other spam pages.  ... 
doi:10.1109/icme.2007.4285079 dblp:conf/icmcs/LinSCTT07 fatcat:oul5q7j6ovdydiatkodx42kqmq

A Spamicity Approach to Web Spam Detection [chapter]

Bin Zhou, Jian Pei, Zhaohui Tang
2008 Proceedings of the 2008 SIAM International Conference on Data Mining  
We propose efficient online link spam and term spam detection methods using spamicity. Our methods do not need training and are cost effective.  ...  on the web.  ...  They discussed how to estimate spam mass and how the estimations can help to identify pages that benefit significantly from link spam.  ... 
doi:10.1137/1.9781611972788.25 dblp:conf/sdm/ZhouPT08 fatcat:5pqhiwzvljgb7byjrwoemdpd7e

Methods for Web-Spam Detection on web: Principles and Algorithms

Parminder Kaur
2018 International Journal of Scientific Research in Computer Sciences and Engineering  
The present research focuses on systematically analyzing and categorizing models that detect review spam. However, spamming is considered a critical issue in web mining.  ...  Different detection techniques have different strengths and weaknesses and thus favor different detection contexts.  ...  Compare and contrast various algorithms of link-based spam detection on the web.  ... 
doi:10.26438/ijsrcse/v6i2.119125 fatcat:ykhhrfbs3nffjf2vm7syvrjtyi

Identifying spam link generators for monitoring emerging web spam

Young-joo Chung, Masashi Toyoda, Masaru Kitsuregawa
2010 Proceedings of the 4th workshop on Information credibility - WICOW '10  
Detecting such spam link generators is important because almost all new spam links are created by them.  ...  In order to classify spam link generators, we investigate various link-based features including modified PageRank scores based on white and spam seeds, and these scores of neighboring hosts.  ...  If this distribution is abnormal, SpamRank regards a target page as spam and penalizes it. Gyöngyi et al. suggested Mass Estimation in [18].  ... 
doi:10.1145/1772938.1772950 dblp:conf/www/ChungTK10 fatcat:qe2ppcq7gnekpl2it7daogrkaa

A link and Content Hybrid Approach for Arabic Web Spam Detection

Heider A. Wahsheh, Mohammed N. Al-Kabi, Izzat M. Alsmadi
2012 International Journal of Intelligent Systems and Applications  
It produces accuracy of 90.1099% for Arabic content-based, 93.1034% for Arabic link-based, and 89.011% in detecting both Arabic content and link Web spam, based on the collected dataset and conducted analysis  ...  The automated classification of spam Web pages used based on the features in the benchmark dataset.  ...  We plan to extend this work in the future to study and investigate the detection of the malicious links in Arabic spammed Web pages.  ... 
doi:10.5815/ijisa.2013.01.03 fatcat:b3stgkvouzec7nxf75kppwjnda

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach [chapter]

Alex Hai Wang
2010 Lecture Notes in Computer Science  
A machine learning approach is proposed to distinguish the spam bots from normal ones.  ...  To facilitate the spam bots detection, three graph-based features, such as the number of friends and the number of followers, are extracted to explore the unique follower and friend relationships among  ...  In [5], the authors proposed Spam Mass, a measure based on the link structure of the Web, to identify link spamming. A directed graph model of the Web is proposed in [6].  ... 
doi:10.1007/978-3-642-13739-6_25 fatcat:laq6jggggjdt3ba4uz5szktnou

Link spam target detection using page farms

Bin Zhou, Jian Pei
2009 ACM Transactions on Knowledge Discovery from Data  
Gregory Piatetsky-Shapiro, the associate editor, for their insightful, constructive, and detailed comments on the previous versions of this article.  ...  They discussed how to estimate spam mass and how the estimations can help to identify pages that benefit significantly from link spam.  ...  Spam mass estimates are easy to compute using two sets of PageRank scores: a regular one, and another with the random jump biased to some known good nodes.  ... 
doi:10.1145/1552303.1552306 fatcat:67ujkjzlmzdufd2eigv52ajiky
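
Several snippets above describe spam mass as the share of a page's PageRank that does not come from trusted pages, estimated from two PageRank runs: a regular one and one whose random jump is biased to known good nodes. The following is a minimal sketch of that idea, not the implementation from any of the listed papers. The toy graph, the `good_core` set, the damping value, the `linear_pagerank` helper, and the choice to leave the biased jump vector unnormalized are all assumptions made for illustration.

```python
def linear_pagerank(out_links, jump, damping=0.85, iterations=100):
    """Power iteration for p = (1 - damping) * jump + damping * T^T p.

    out_links[i] lists the nodes that node i links to. The jump vector is
    used as given (it need not sum to 1), which is an assumption of this sketch.
    """
    scores = list(jump)
    for _ in range(iterations):
        nxt = [(1.0 - damping) * j for j in jump]
        for src, targets in enumerate(out_links):
            if not targets:
                continue  # dangling node: its score is simply dropped here
            share = damping * scores[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        scores = nxt
    return scores


# Hypothetical toy graph: nodes 0 and 1 are trusted, node 2 is boosted by 3 and 4.
out_links = [[1, 2], [0], [3, 4], [2], [2]]
good_core = {0, 1}  # hand-picked trusted nodes (assumed)

n = len(out_links)
uniform_jump = [1.0 / n] * n
good_jump = [1.0 / n if i in good_core else 0.0 for i in range(n)]

regular = linear_pagerank(out_links, uniform_jump)   # regular PageRank
from_good = linear_pagerank(out_links, good_jump)    # PageRank contributed by the good core

# Relative spam mass estimate: fraction of the score not explained by good nodes.
relative_mass = [(p - g) / p for p, g in zip(regular, from_good)]

for node, mass in enumerate(relative_mass):
    print(node, round(mass, 3))
```

In this sketch a value near 0 means the score is explained almost entirely by the trusted nodes, and a value near 1 means it comes almost entirely from elsewhere. A real system would also need a threshold on both the relative mass and the PageRank score itself, which this example does not attempt.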