24 Hits

Splog Detection using Content, Time and Link Structures

Yu-Ru Lin, Hari Sundaram, Yun Chi, Jun Tatemura, Belle Tseng
2007 IEEE International Conference on Multimedia and Expo (ICME 2007)
Experiments based on annotated ground truth on a real-world dataset show excellent results on splog detection tasks, with 90% accuracy.  ...  The key idea is that splogs exhibit high temporal regularity in content and post time, as well as consistent linking patterns.  ...  We develop an approach that captures the repetitive structural properties of splogs. (Section 3, Regularity-Based Detection) We have developed new techniques for splog detection based on temporal and linking patterns  ... 
doi:10.1109/icme.2007.4285079 dblp:conf/icmcs/LinSCTT07 fatcat:oul5q7j6ovdydiatkodx42kqmq

Detecting splogs via temporal dynamics using self-similarity analysis

Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura, Belle L. Tseng
2008 ACM Transactions on the Web  
This article addresses the problem of spam blog (splog) detection using temporal and structural regularity of content, post time and links.  ...  We have developed a new technique for detecting splogs, based on the observation that a blog is a dynamic, growing sequence of entries (or posts) rather than a collection of individual pages.  ...  Though we have concentrated our discussion on the temporal properties of splogs and their effectiveness in splog detection, we recognize that link-based solutions can have significant impact.  ... 
doi:10.1145/1326561.1326565 fatcat:tyh5objdjvdinni3l5ansuxcwu

Splog detection using self-similarity analysis on blog temporal dynamics

Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura, Belle L. Tseng
2007 Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb '07)
Second, we show via a novel visualization that the blog temporal characteristics reveal attribute correlation, depending on the type of blog (normal blogs and splogs).  ...  We first represent the blog temporal dynamics using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts.  ...  There are several challenges that we propose to address as part of future research: (a) developing probabilistic representations of the topology and (b) short-term topological analysis, including signature  ... 
doi:10.1145/1244408.1244410 fatcat:62flgqe3vjcnphcnvvuxitellq
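The representation described in this abstract (a self-similarity matrix built from histogram intersection over post attributes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each post's content is reduced to a normalized bag-of-words histogram (lowercased, whitespace-tokenized), and the function names `histogram`, `histogram_intersection`, `self_similarity_matrix` and `mean_off_diagonal` are hypothetical. The same intersection measure could in principle be applied to histograms of post times or outgoing links.

```python
from collections import Counter


def histogram(tokens):
    """Normalized term-frequency histogram (assumed content representation)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: c / total for term, c in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima: 1.0 for identical normalized histograms, 0 for disjoint ones."""
    return sum(min(v, h2[k]) for k, v in h1.items() if k in h2)


def self_similarity_matrix(posts):
    """S[i][j] = histogram intersection between the content of posts i and j."""
    hists = [histogram(p.lower().split()) for p in posts]
    n = len(hists)
    return [[histogram_intersection(hists[i], hists[j]) for j in range(n)]
            for i in range(n)]


def mean_off_diagonal(S):
    """Hypothetical regularity score (not from the paper): mean similarity of distinct posts."""
    n = len(S)
    if n < 2:
        return 0.0
    total = sum(S[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


if __name__ == "__main__":
    posts = [
        "buy cheap pills now",
        "buy cheap pills now",
        "my trip to the lake",
    ]
    S = self_similarity_matrix(posts)
    for row in S:
        print(["%.2f" % x for x in row])
    print("regularity:", round(mean_off_diagonal(S), 3))
```

In this sketch, repetitive, splog-like posts show up as high off-diagonal values, while varied posts leave the off-diagonal entries near zero. The single averaged score is only a simplification for illustration; the abstract describes visualizing and extracting features from the full matrices.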

Adversarial Information Retrieval on the Web (AIRWeb 2007)

Carlos Castillo, Kumar Chellapilla, Brian D. Davison
2008 SIGIR Forum  
Acknowledgments: We extend our sincere thanks to WWW2007, to the authors and presenters, and to the members of the program committee for their contributions to the material that formed an outstanding workshop  ...  Session 1, Temporal and Topological Factors: Belle Tseng started the morning session with her presentation on "Splog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics" [9].  ...  She presented their solution to splog detection, which has three salient features: self-similarity analysis, visual characterization, and temporal feature computation.  ... 
doi:10.1145/1394251.1394267 fatcat:ywd3ygzofbbjtbwxshwgu3opy4

Modeling and Data Mining in Blogosphere

Nitin Agarwal, Huan Liu
2009 Synthesis Lectures on Data Mining and Knowledge Discovery  
We elaborate on approaches that extract communities and cluster blogs based on information about the bloggers.  ...  Generating synthetic data requires developing and fitting a model on the observed data.  ...  The members of the Social Computing Group, Data Mining and Machine Learning Lab at ASU made this project much easier and more enjoyable.  ... 
doi:10.2200/s00213ed1v01y200907dmk001 fatcat:ifz4ic57sfcwbltrboans35zzm

Blogosphere

Nitin Agarwal, Huan Liu
2008 SIGKDD Explorations  
Yu, and Alan Zheng Zhao for collaboration, discussion, and valuable comments. • This work is, in part, sponsored by AFOSR and ONR grants in 2008.  ...  hyperlinks to classify a blog post as spam using an SVM-based classifier • Lin et al. (2007) consider the temporal dynamics of blog posts and propose a self-similarity-based splog detection algorithm  ...  based on characteristic patterns found in splogs, such as regularities or patterns in the posting times of splogs, content similarity in splogs, and similar links in splogs  ... 
doi:10.1145/1412734.1412737 fatcat:v4ec3j66aragrnczjlisl6yowe

Adversarial Web Search

Carlos Castillo
2010 Foundations and Trends in Information Retrieval  
[209] study temporal link-based features. These include the rate of growth and death of new in-links and out-links from the perspective of entire sites.  ...  These observations can be used to build features that capture regularity and self-similarity of temporal patterns, and these features can yield substantial improvements in the accuracy of a splog detection  ... 
doi:10.1561/1500000021 fatcat:toxnvajrmbdppf5hytdbnykuiq

Blog track research at TREC

Craig Macdonald, Rodrygo L.T. Santos, Iadh Ounis, Ian Soboroff
2010 SIGIR Forum  
This paper recaps the tasks addressed at the TREC Blog track thus far, covering the period 2006-2009.  ...  In particular, we describe the corpora used, the tasks addressed within the track, and the resulting published research.  ...  We are also thankful to Gilad Mishne and Maarten de Rijke for joining us in organising the TREC 2006 Blog track.  ... 
doi:10.1145/1842890.1842899 fatcat:aydy5eclwnfvdkm2zv5jygyt7y

Consent Through the Lens of Semantics: State of the Art Survey and Best Practices

Anelia Kurteva, Tek Raj Chhetri, Harshvardhan J. Pandit, Anna Fensel
2021 Zenodo  
We also focus on visualisation solutions aimed at improving individuals' consent comprehension. Finally, based on the reviewed state of the art, we propose best practices for consent implementation.  ...  The GDPR put a focus on the concept of informed consent applicable to data processing, which led to an increase in the responsibilities regarding data sharing for both end users and companies.  ...  Based on the underlying semantics, a machine is able to create the links between the consent decision and all information related to it.  ... 
doi:10.5281/zenodo.4732358 fatcat:5ivtazissnc4xpqomo3w7beg24

Augmenting User Models with Real World Experiences to Enhance Personalization and Adaptation [chapter]

Fabian Abel, Vania Dimitrova, Eelco Herder, Geert-Jan Houben
2012 Lecture Notes in Computer Science  
Digital traces can be attributed to more than one individual, e.g. a circle of friends, a scientific community or even a whole population can be characterized by topics they tweet about, or things they  ...  People share content about their activities, e.g. pictures taken at a concert, videos of business meetings, reports on business trips, personal stories.  ...  We thank the members of the Program Committee of AUM 2011 for their support and reviews.  ... 
doi:10.1007/978-3-642-28509-7_4 fatcat:ogz2gojyszffvdtq6fy4itvite

Text Mining in Big Data Analytics

Hossein Hassani, Christina Beneki, Stephan Unger, Maedeh Taj Mazinani, Mohammad Reza Yeganegi
2020 Big Data and Cognitive Computing  
on the predominant trends, methods, and applications of text mining research.  ...  In accordance with this, more than 200 academic journal articles on the subject are included and discussed in this review; the state-of-the-art text mining approaches and techniques used for analyzing  ...  most likely to blog about a specific topic, in identifying the associated links for a given blog post on a given topic, and in detecting splogs.  ... 
doi:10.3390/bdcc4010001 fatcat:6fvmne7f2fbovjp4na5hl2tmv4

Community Detection and Mining in Social Media

Lei Tang, Huan Liu
2010 Synthesis Lectures on Data Mining and Knowledge Discovery  
In particular, we discuss graph-based community detection techniques and many important extensions that handle dynamic, heterogeneous networks in social media.  ...  , from a data mining perspective, introduces characteristics of social media, reviews representative tasks of computing with social media, and illustrates associated challenges.  ...  Particular thanks go to Reza Zafarani and Gabriel Fung who read the earlier drafts of the manuscript and provided helpful comments to improve the readability.  ... 
doi:10.2200/s00298ed1v01y201009dmk003 fatcat:bxcd7hnfffdadgg6zx6mgiqloy

OntoROPA D1: State of the Art and Ambition

M. Mercedes Martínez-González, Pompeu Casanovas, María-Luisa Alvite-Díez, Núria Casellas
2021 Zenodo  
It combines building a professional ontology that will be part of this graph with the collection and management of the specific knowledge of the community of privacy and data protection experts, mainly  ...  OntoROPA proposes the creation of a knowledge graph, an RDF graph, to handle information about Records of Processing Activities (ROPAs).  ...  The latter are focused on legal knowledge, defining some more requirements based on the properties of normative legal systems (hierarchy, consistency, effectivity, etc.) to encompass the social and institutional  ... 
doi:10.5281/zenodo.4930186 fatcat:ul5ghc56pbdolcpunwez2tam3y

Machine Learning for E-mail Spam Filtering: Review, Techniques and Trends [article]

Alexy Bhowmick, Shyamanta M. Hazarika
2016 arXiv pre-print
We focus primarily on Machine Learning-based spam filters and their variants, and report on a broad review ranging from surveying the relevant ideas, efforts, effectiveness, and the current progress.  ...  We present a comprehensive review of the most effective content-based e-mail spam filtering techniques.  ...  (Analyzing Temporal Features) As a novel solution to the spam problem, [Kiritchenko et al., 2004] combined temporal features of an e-mail with conventional content-based approaches to create a richer  ... 
arXiv:1606.01042v1 fatcat:cblnuc4knfhehjwzjeeekbgf3m

Robust detection of comment spam using entropy rate

Alex Kantchelian, Justin Ma, Ling Huang, Sadia Afroz, Anthony Joseph, J. D. Tygar
2012 Proceedings of the 5th ACM workshop on Security and artificial intelligence - AISec '12  
IP, the same included links, etc.  ...  To train a logistic regression on this dataset using our features, we derive a simple mislabeling-tolerant logistic regression algorithm based on expectation-maximization, which we show generally outperforms  ...  We are grateful to the Intel Science and Technology Center for Secure Computing, DARPA (grant N10AP20014), the National Science Foundation (through the TRUST Science and Technology Center), and the US  ... 
doi:10.1145/2381896.2381907 dblp:conf/ccs/KantchelianMHAJT12 fatcat:dzdvrvv2vvbudhvgko5qsnheha
Showing results 1 to 15 of 24