69,787 Hits in 13.0 sec

Learning Web Page Block Functions using Roles of Images

Xin Yang, Yuanchun Shi
2008 2008 Third International Conference on Pervasive Computing and Applications  
We experiment on 140 Web pages and demonstrate that utilizing roles of images can significantly improve the classification quality of learning Web page block functions.  ...  Making use of block information in Web IR and Data Mining tasks calls for a good understanding of the function of each block.  ...  [16] noticed these limitations and conducted a comparative study to evaluate the influence of several factors on learning block functions, such as features and classifiers.  ... 
doi:10.1109/icpca.2008.4783565 fatcat:k3fvi3mlsjgztcu4jrbz5aho3q

Feature Selection For Web Page Classification Using Swarm Optimization

B. Leela Devi, A. Sankar
2015 Zenodo  
The aim of this study is to reduce the number of features to be used to improve the accuracy of the classification of web pages.  ...  The extracted features were tested on the WebKB dataset using a parallel Neural Network to reduce the computational cost.  ...  No uniform approach was presented, in the same works, to measure important web page portions. A user study by [9] found people having consistent views on web page blocks importance.  ... 
doi:10.5281/zenodo.1099636 fatcat:bkkhpgxj55blpd2rpbga7uh3ny

To Block or Not to Block: Accelerating Mobile Web Pages On-The-Fly Through JavaScript Classification [article]

Moumena Chaqfeh, Muhammad Haseeb, Waleed Hashmi, Patrick Inshuti, Manesha Ramesh, Matteo Varvello, Fareed Zaffar, Lakshmi Subramanian, Yasir Zaki
2021 arXiv   pre-print
SlimWeb improves the overall user experience by more than 60% compared to the original pages, while maintaining 90%-100% of the visual and functional components of most pages.  ...  In this paper, we propose SlimWeb, a novel approach that automatically derives lightweight versions of mobile web pages on-the-fly by eliminating the use of unnecessary JavaScript.  ...  In order to evaluate the effect of SlimWeb on the quality of the pages, we conducted a user study with 62 participants.  ... 
arXiv:2106.13764v1 fatcat:j4v6bapk3feojp2tddtnsjp47y

Evaluating the visual quality of web pages using a computational aesthetic approach

Ou Wu, Yunfei Chen, Bing Li, Weiming Hu
2011 Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11  
The performance of the learned visual quality classifier is close to some persons'. The learned regression function also achieves promising results.  ...  First, a Web page layout extraction algorithm (V-LBE) is introduced to partition a Web page into major layout blocks.  ...  The learning of VisQ classifier and VisQ regression function in this study is based on training pages and their score/label sets. In this paper, a Web page is represented by a feature vector X_k.  ... 
doi:10.1145/1935826.1935883 dblp:conf/wsdm/WuCLH11 fatcat:ppk2lrjl5fdplkbgooqwdyy3di

Information Classification and Extraction on Official Web Pages of Organizations

Jinlin Wang, Xing Wang, Hongli Zhang, Binxing Fang, Yuchen Yang, Jia'nan Liu
2020 Computers Materials & Continua  
As a real-time and authoritative source, the official Web pages of organizations contain a large amount of information.  ...  After locating the active blocks in the Web pages, the structural and content features are proposed to classify information with the specific model.  ...  By constructing some URLs, it classifies categories of Web pages and matches the classified Web pages with patterns.  ... 
doi:10.32604/cmc.2020.011158 fatcat:6qdein5pw5dtxlx7nshihfqgdy

A text mining approach to Internet abuse detection

Chen-Huei Chou, Atish P. Sinha, Huimin Zhao
2008 Information Systems and E-Business Management  
We have empirically compared a variety of term weighting, feature selection, and classification techniques for Internet abuse detection in the workplace of software programmers.  ...  As the use of the Internet in organizations continues to grow, so does Internet abuse in the workplace.  ...  Text categorization techniques can be used to automatically learn classifiers from a set of pre-classified Web pages (the training dataset).  ... 
doi:10.1007/s10257-007-0070-0 fatcat:4ynqj54bxvgezovleqkek56akq

Automatic identification of informative sections of Web pages

S. Debnath, P. Mitra, N. Pal, C.L. Giles
2005 IEEE Transactions on Knowledge and Data Engineering  
First, a tool must segment the Web-pages into Web-page blocks and second, the tool must separate the primary content blocks from the non-informative content blocks.  ...  Web-pages, especially dynamically generated ones, contain several items that cannot be classified as the "primary content", e.g., navigation sidebars, advertisements, copyright notices, etc.  ...  First, the algorithms partition the Web-page into blocks based on heuristics. These heuristics are based on our previous study of HTML editing style over a few thousand Web-pages.  ... 
doi:10.1109/tkde.2005.138 fatcat:2iataz2htvfjbkurg2sqpdkmxi

Learning from multi-topic web documents for contextual advertisement

Yi Zhang, Arun C. Surendran, John C. Platt, Mukund Narasimhan
2008 Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD 08  
Often advertisers wish to either target (or avoid) some specific content on web pages which may appear only in a small part of the page.  ...  Contextual advertising on web pages has become very popular recently and it poses its own set of unique text mining challenges.  ...  Figure 1 illustrates the above problem: a web page is divided into a number of content blocks based on its HTML structure.  ... 
doi:10.1145/1401890.1402015 dblp:conf/kdd/ZhangSPN08 fatcat:md5ucbkshzba3gldicadsaj7r4

Combining classifiers for harmful document filtering

Bruno Grilhères, Stephan Brunessaux, Philippe Leray
2004 Open research Areas in Information Retrieval  
In this paper, we describe the experiments that we have carried out during the European Research Project NetProtect II that aims at filtering harmful Web pages in order to protect children.  ...  These experiments focus on the combination of classifiers (relying on texts, images and addresses), dealing with heterogeneous classes (bomb-making, drug, pornography, violence) for multimedia documents  ...  The training of each SVM is realized on Web pages relative to a category (bomb, drug, pornography, violence) on the two considered languages.  ... 
dblp:conf/riao/GrilheresBL04 fatcat:ymwstvo6f5b5zavyevfkhzezte

Learning important models for web page blocks based on layout and content analysis

Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
2004 SIGKDD Explorations  
Through a user study, we found that people do have a consistent view about the importance of blocks in a web page.  ...  Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent.  ...  For example, a big block in a small page will always be taken as small block when comparing it with the blocks in a big page.  ... 
doi:10.1145/1046456.1046459 fatcat:oaranxeiyvbslpfquye5kihw54

Web Crawler: Design And Implementation For Extracting Article-Like Contents

Ngo Le Huy Hien, Thai Quang Tien, Nguyen Van Hieu
2020 Cybernetics and Physics  
Web Crawler, therefore, is a critical part of search engines to navigate and download full texts of the web pages.  ...  As search engine systems play a significant role in cybernetics, telecommunication, and physics, many efforts were made to enhance their capacity. However, most of the data contained on the web are unmanaged  ...  The World Wide Web has a graphical structure in which links displayed on a web page could be used to open other web pages.  ... 
doi:10.35470/2226-4116-2020-9-3-144-151 fatcat:ds7oilpsjvcerjcf4oty5lhi3e

MyTrackingChoices: Pacifying the Ad-Block War by Enforcing User Privacy Preferences [article]

Jagdish Prasad Achara, Javier Parra-Arnau, Claude Castelluccia
2016 arXiv   pre-print
Therefore, our proposed approach consists in providing users with an option to specify the categories of web pages that are privacy-sensitive to them and block trackers present on such web pages only.  ...  To test the viability of our solution, we implemented it as a Google Chrome extension, named MyTrackingChoices (available on Chrome Web Store).  ...  Parra-Arnau is the recipient of a Juan de la Cierva postdoctoral fellowship, FJCI-2014-19703, from the Spanish Ministry of Economy and Competitiveness.  ... 
arXiv:1604.04495v1 fatcat:j2r6yunsmvattapu6qj64bwulu

Learning block importance models for web pages

Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
2004 Proceedings of the 13th conference on World Wide Web - WWW '04  
Through a user study, we found that people do have a consistent view about the importance of blocks in web pages.  ...  Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent.  ...  ACKNOWLEDGEMENTS We are grateful to Professor H-Arno Jacobsen at University of Toronto for his valuable comments and revision on our paper.  ... 
doi:10.1145/988672.988700 dblp:conf/www/SongLWM04 fatcat:qdunbz4ewfcgpee2xcvmbs6yq4

Commercial Internet filters: Perils and opportunities

Chen-Huei Chou, Atish P. Sinha, Huimin Zhao
2010 Decision Support Systems  
These products mainly rely on black lists, white lists, and keyword/profile matching to filter out undesired web pages.  ...  The evaluation results point to the perils of using commercial Internet filters on one hand, and to the prospects of using text mining on the other.  ...  The web page classifiers convincingly outperformed the filters.  ... 
doi:10.1016/j.dss.2009.11.002 fatcat:jacfmkqvrnbcpdbl7466ib4vvm

Detecting Anti Ad-blockers in the Wild

Muhammad Haris Mughees, Zhiyun Qian, Zubair Shafiq
2017 Proceedings on Privacy Enhancing Technologies  
The clash between ad-blockers and anti ad-blockers has resulted in a new arms race on the Web.  ...  The approach is promising with precision of 94.8% and recall of 93.1%. Our automated approach allows us to conduct a large-scale measurement study of anti ad-blockers on Alexa top-100K websites.  ...  This work is supported in part by a grant from the Data Transparency Lab.  ... 
doi:10.1515/popets-2017-0032 dblp:journals/popets/MugheesQS17 fatcat:fpxsehrt7barjhkdvn4tp4xt5i