Efficient and Effective Spam Filtering and Re-ranking for Large Web Datasets [article]

Gordon V. Cormack, Mark D. Smucker, Charles L. A. Clarke
2010 arXiv   pre-print
We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset.  ...  We show that a simple content-based classifier with minimal training is efficient enough to rank the "spamminess" of every page in the dataset using a standard personal computer in 48 hours, and effective  ...  Acknowledgements The authors thank Ellen Voorhees and Ian Soboroff at the National Institute of Standards and Technology (U.S.A) for providing access to the TREC data.  ... 
arXiv:1004.5168v1 fatcat:a2byvm2rkjbbhisi7vlmsjvyym

Effects of spam removal on search engine efficiency and effectiveness

Matt Crane, Andrew Trotman
2012 Proceedings of the Seventeenth Australasian Document Computing Symposium - ADCS '12
We also investigate the resulting search effectiveness and efficiency when different amounts of spam are withheld.  ...  In this paper we investigate the effect that withholding documents identified as spam has on the resources required to process large collections.  ...  They generated four different rankings of the spamminess of pages within the English ClueWeb09 dataset: • UK2006: A set of labels trained against a small set of web pages containing 746 spam pages and  ... 
doi:10.1145/2407085.2407086 dblp:conf/adcs/CraneT12 fatcat:rpopemgedfd4xiv6tktp2j335e

THUIR at TREC 2009 Web Track: Finding Relevant and Diverse Results for Large Scale Web Search

Zhichao Li, Fei Chen, Qianli Xing, Junwei Miao, Yufei Xue, Tong Zhu, Bo Zhou, Rongwei Cen, Yiqun Liu, Min Zhang, Yijiang Jin, Shaoping Ma
2009 Text Retrieval Conference  
On ad hoc task, we improved the efficiency of our distributed retrieval system TMiner to handle terabytes of Web data.  ...  Then three studies have been done, namely page quality estimation, ranking feature analysis, and model comparison.  ...  Acknowledgement We would like to thank Qian Wang, Huijia Yu, Xudong Li, Yu Sun and Wei Yang for their help on system building and data preprocessing.  ... 
dblp:conf/trec/LiCXMXZZCLZJM09 fatcat:xvat7ftcsvesrpi4rjeaiqgnkm

The Impact of Feature Selection on Web Spam Detection

Jaber Karimpour, Ali A. Noroozi, Adeleh Abadi
2012 International Journal of Intelligent Systems and Applications  
Signature-driven spam detection provides an alternative to machine learning approaches and can be very effective when near-duplicates of essentially the same message are sent in high volume [20].  ...  The proposed method is shown to consistently outperform traditional I-Match in the spam filtering application.  ...  We also showed that in near-duplicate applications involving document classification, such as spam filtering, term ranking induced by standard feature selection can be used as an alternative to traditional  ... 
doi:10.5815/ijisa.2012.09.08 fatcat:fooj3rz3qbf25k7skwzxx4sohm

Detecting Web Spam Based on Novel Features from Web Page Source Code

Jiayong Liu, Yu Su, Shun Lv, Cheng Huang, Liguo Zhang
2020 Security and Communication Networks  
Experiment results show that the proposed model could effectively detect web spam.  ...  Fierce competition for the ranking in search engines is not conducive to both users and search engines. Existing research mainly studies the content and links of websites.  ...  As evident, spammers try to deceive search engines and attract end users to click on web spam sites. They not only reduce the effectiveness and efficiency of search engine results since web spam pages take  ... 
doi:10.1155/2020/6662166 fatcat:opknwyq3jfe2baaa33xg4vhdli

Term associations in query expansion

Michael Symonds, Guido Zuccon, Bevan Koopman, Peter Bruza, Laurianne Sitbon
2013 Proceedings of the 22nd ACM International Conference on Information & Knowledge Management - CIKM '13
The results demonstrate that this approach can provide significant improvements in web retrieval effectiveness when compared to a strong benchmark retrieval system.  ...  However, structural linguistics proposes that the meaning of a word is also dependent on its paradigmatic associations, which are formed between words that can substitute for each other without affecting  ...  This filtered list is then padded, to create a ranked list of 10,000 documents, using the ranked documents returned by a unigram language model on the spam filtered index.  ... 
doi:10.1145/2505515.2507852 dblp:conf/cikm/SymondsZKBS13 fatcat:nbnxylefzrgk5mlbhpgtqakfgy

LENS: Leveraging social networking and trust to prevent spam transmission

Sufian Hameed, Xiaoming Fu, Pan Hui, Nishanth Sastry
2011 2011 19th IEEE International Conference on Network Protocols  
LENS proved to be fast in processing emails (around 2-3 orders of magnitude better than SpamAssassin) and scales efficiently with increasing community size and GKs.  ...  We also evaluate the computational complexity of email processing with LENS deployed on two Mail Servers (MSs) and compared it with the most popular content-based filter, i.e., SpamAssassin.  ...  By using this protocol, RE can accept almost 85% of received emails and prevent up to 88% false positive by the existing spam filters.  ... 
doi:10.1109/icnp.2011.6089044 dblp:conf/icnp/HameedFHS11 fatcat:pdpzbc4fffhlbpcyf7ibykkln4

Follow Spam Detection based on Cascaded Social Information [article]

Sihyun Jeong, Giseop Noh, Hayoung Oh, Chong-kwon Kim
2016 arXiv   pre-print
Particularly, we focused on cascaded social relations and devised two schemes, TSP-Filtering and SS-Filtering, each of which utilizes Triad Significance Profile (TSP) and Social status (SS) in a two-hop  ...  Spammers abuse SNSs as vehicles to spread spams rapidly and widely. Spams, unsolicited or inappropriate messages, significantly impair the credibility and reliability of services.  ...  Link spam Filtering Link spam has been widely studied in the web spam detection field. This type of spam is presented as numerous links from a large number of web pages to a few target web pages.  ... 
arXiv:1605.00448v1 fatcat:w6pwgwbz3feg5nin24ohaeegya

Spam filter evaluation with imprecise ground truth

Gordon V. Cormack, Aleksander Kolcz
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
When trained and evaluated on accurately labeled datasets, online email spam filters are remarkably effective, achieving error rates an order of magnitude better than classifiers in similar applications  ...  Erroneous labels are problematic, however, when used as ground truth to measure filter effectiveness.  ...  INTRODUCTION When trained and evaluated on accurately labeled datasets, online email spam filters achieve remarkably good performance.  ... 
doi:10.1145/1571941.1572045 dblp:conf/sigir/CormackK09 fatcat:s3eanhxrrvehbcy5we6aamvkfa

Using supervised machine learning algorithms to detect suspicious URLs in online social networks

Mohammed Al-Janabi, Ed de Quincey, Peter Andras
2017 Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017 - ASONAM '17  
For the data collection stage, the Twitter streaming application programming interface (API) was used and VirusTotal was used for labelling the dataset.  ...  These URLs could direct users to websites that contain malicious content, drive-by download attacks, phishing, spam, and scams.  ...  The source code of our spam classification tool and the data set that we used are available from the authors on request.  ... 
doi:10.1145/3110025.3116201 dblp:conf/asunam/Al-JanabiQA17 fatcat:iud3jaub75f77lzy6mtbfnx4za

Artificial immune system inspired behavior-based anti-spam filter

Xun Yue, Ajith Abraham, Zhong-Xian Chi, Yan-You Hao, Hongwei Mo
2006 Soft Computing - A Fusion of Foundations, Methodologies and Applications  
Experiment results using real-world datasets reveal that the proposed technology is reliable, efficient and scalable.  ...  This paper proposes a novel behavior-based anti-spam technology for email service based on an artificial immune-inspired clustering algorithm.  ...  The authors would like to thank the three anonymous referees for the constructive comments that helped to enhance the quality and presentation of this paper.  ... 
doi:10.1007/s00500-006-0116-0 fatcat:lxgft4o7cbelfkcelbjvud7sym

Coniunge et Impera: Multiple-Graph Mining for Query-Log Analysis [chapter]

Ilaria Bordino, Debora Donato, Ricardo Baeza-Yates
2010 Lecture Notes in Computer Science  
We show that our approach achieves very good performance for two different applications, which are classifying query transitions and recognizing spam queries.  ...  Hence, they contain a wealth of valuable knowledge about the users' interests and preferences, as well as the implicit feedback that Web searchers provide when they click on the results obtained for their  ...  For efficiency and scalability, we used Hadoop's MapReduce to extract the query graphs. MapReduce [13] is a popular programming model for processing large-scale data.  ... 
doi:10.1007/978-3-642-15880-3_17 fatcat:j5cz3jaojfbyrnr2fewh3tm6uq

Development of Answer Validation System Using Responders' Attributes and Crowd Ranking

Mercy Adebisi, Bolanle Ojokoh, Tolulope Adebayo, Akintoba Akinwonmi, Fatai Sunmola
2021 Journal of Service Science and Management  
Therefore, this work proposed a system that seeks to validate answers to questions provided by respondents using responders' attributes and crowd ranking technique.  ...  Thereafter, valid answers were ranked by the crowd using Borda Count algorithm. The proposed system was evaluated using Usability and User experience (UX) measurement.  ...  For the effectiveness of the system, illegitimate questions and answers were filtered out using a trained Naïve Bayes spam filter with a threshold of 0.5.  ... 
doi:10.4236/jssm.2021.143024 fatcat:enh6lojvkfhxjpwdmptav7jofi

Spam or ham?

Anirudh Ramachandran, Anirban Dasgupta, Nick Feamster, Kilian Weinberger
2011 Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference - CEAS '11
This attack confuses spam filters, since it causes spam messages to be mislabeled as legitimate; thus, spammer IP addresses can continue sending spam for longer.  ...  Web mail providers rely on users to "vote" to quickly and collaboratively identify spam messages.  ...  These votes from users, sometimes referred to as "community clicks" or "community filtering", are in most cases the best defense against spam for large Web mail providers [7].  ... 
doi:10.1145/2030376.2030401 dblp:conf/ceas/RamachandranDFW11 fatcat:3j66xkecdzhrlefbtdvn7jnnda

Adversarial Web Search

Carlos Castillo
2010 Foundations and Trends in Information Retrieval  
[231] argue that detecting those aggregators and funnels that do the redirection for large sets of pages is an effective way of eliminating massive amounts of Web spam with less effort than blacklisting  ...  one "customer", for efficiency reasons.  ... 
doi:10.1561/1500000021 fatcat:toxnvajrmbdppf5hytdbnykuiq