Zero-cost labelling with web feeds for weblog data extraction

George Gkotsis, Karen Stepanyan, Alexandra I. Cristea, Mike S. Joy
2013 Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion  
In this paper, we propose a fully automated approach in generating a wrapper for weblogs, which exploits web feeds for cheap labelling of weblog properties.  ...  Data extraction from web pages often involves either human intervention for training a wrapper or a reduced level of granularity in the information acquired.  ...  CONCLUSIONS We have presented a method for fully automated weblog wrapper generation. Based on the weblogs' feeds, our model realises an effective and zero-cost labelling technique.  ... 
doi:10.1145/2487788.2487819 dblp:conf/www/GkotsisSCJ13 fatcat:2yienhy72reshderi6ccs4fk2e

A Scalable Approach to Harvest Modern Weblogs

Vangelis Banos, Olivier Blanvillain, Nikos Kasioumis, Yannis Manolopoulos
2015 International journal on artificial intelligence tools  
To achieve this goal, we introduce a simple yet robust and scalable algorithm to generate extraction rules based on string matching using the blog's web feed in conjunction with blog hypertext.  ...  Blogs are one of the most prominent means of communication on the web.  ...  Gkotsis from the University of Warwick for generously sharing his research material, time, and ideas with us.  ... 
doi:10.1142/s0218213015400059 fatcat:mggtuhzpzzfz5gsd3gydj4dpoi

Deriving marketing intelligence from online discussion

Natalie Glance, Matthew Hurst, Kamal Nigam, Matthew Siegler, Robert Stockton, Takashi Tomokiyo
2005 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05  
Given the volume, format and content of the data, the appropriate approach to understand this data is to use large-scale web and text data mining technologies.  ...  Weblogs and message boards provide online forums for discussion that record the voice of the public.  ...  For such weblogs, we automatically use the feed to extract new posts with near 100% accuracy instead of crawling and segmenting the weblog.  ... 
doi:10.1145/1081870.1081919 dblp:conf/kdd/GlanceHNSST05 fatcat:3l3obqej4remtoxmbicba5txfa

Weblog Clustering in Multilinear Algebra Perspective [article]

Andri Mirzal
2009 arXiv   pre-print
Then, 3-way adjacency tensor is extracted from the network and the PARAFAC decomposition is applied to the tensor to get pairs of node lists and label lists with scores attached to each list as the indication  ...  The proposed method first creates labeled-link network representation of the weblog datasets, where the nodes are the blogs and the labels are the shared words.  ...  Our codes for download and data preparation steps are written in Python by using Universal Feed Parser module for parse() function (see Algorithm 2), and for decomposition step are written in MATLAB  ... 
arXiv:0909.2345v1 fatcat:72vjeerpxfdbtfx4agkhixufau

Survey Paper On Techniques Used In Opinion Mining

2017 International Journal of Recent Trends in Engineering and Research  
In this, the various methods of mining in multiple ways such as web and data mining are used to retrieve data from web sites and to optimize we need to go through the queries of data mining.  ...  Here, the data mining concepts are used which mainly deals with mining the UGC from different e-commerce websites which are being used in our daily routine life and after we extract the required UGC we  ...  Nidhi Sharma for her guidance and support during project development. Also, we are thankful to BVCOE principal and all staff for their good support.  ... 
doi:10.23883/ijrter.2017.2997.xegno fatcat:75xnihch6rcuda4ddjfxsalame

Improving Specimen Labelling and Data Collection in Bio-science Research using Mobile and Web Applications

Isaac Nyabisa Oteyo, Mary Esther Muyoka Toili
2020 Open Computer Science  
We present WebLog, an application that we prototyped to aid researchers generate specimen labels and collect data from experiment sites.  ...  Once a specimen label is successfully scanned, the application automatically invokes the data entry form. The collected data is immediately sent to the server in electronic form for analysis.  ...  Acknowledgement: We thank the anonymous referees for their insightful comments. We also thank the Kenya Education Network (KENET) for providing the hosting services for the prototype application.  ... 
doi:10.1515/comp-2020-0002 fatcat:zfhizqpj4vei3eh4lj3j7ftbjq

The Pulse of News in Social Media: Forecasting Popularity [article]

Roja Bandari, Sitaram Asur, Bernardo A. Huberman
2012 arXiv   pre-print
Prior research has dealt with predicting eventual online popularity based on early popularity.  ...  Hence, the task of predicting the popularity of news items on the social web is both interesting and challenging.  ...  Category Score News feeds provided by Feedzilla are pre-tagged with category labels describing the content.  ... 
arXiv:1202.0332v1 fatcat:hljai7wkffck7jvmymsjjdtiqi

The Information Ecology of Social Media and Online Communities

Tim Finin, Anupam Joshi, Pranam Kolari, Akshay Java, Anubhav Kale, Amit Karandikar
2008 The AI Magazine  
One thing that sets these "Web 2.0" sites apart from traditional Web pages and resources is that they are intertwined with other forms of networked data.  ...  Social media systems such as weblogs, photo- and link-sharing sites, Wikis and on-line forums are currently thought to produce up to one third of new Web content.  ...  James Mayfield, Justin Martineau and Sandeep Balijepalli for their contributions to the work on sentiment detection.  ... 
doi:10.1609/aimag.v29i3.2158 fatcat:qmuzv4t7gba5vmaic3nzzfldna

Facilitating SQL Query Composition and Analysis [article]

Zainab Zolaktaf, Mostafa Milani, Rachel Pottinger
2020 arXiv   pre-print
Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users.  ...  This is particularly important in settings with limited access to the database instance.  ...  ., cloud-based data warehouses like Google BigQuery [16], databases on the hidden web, sources located behind wrappers in data integration systems [6], and instances with limited access due to cost  ... 
arXiv:2002.09091v1 fatcat:bvvvrugpbvh2tmmohz32anf47m

Improving Schema Matching with Linked Data [article]

Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphaël Troncy, David Trastour
2012 arXiv   pre-print
With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis  ...  First experiments show that using Linked Data to map cell values with instances and column headers with types improves significantly the quality of the matching results and therefore should lead to more  ...  These approaches are however restricting label candidates to Web content from which the data was extracted.  ... 
arXiv:1205.2691v2 fatcat:lqjtwtvngnd3rebaeixfl647ai

Text and Structural Data Mining of Influenza Mentions in Web and Social Media

Courtney Corley, Diane Cook, Armin Mikler, Karan Singh
2010 International Journal of Environmental Research and Public Health  
Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure  ...  Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like illness patient report data.  ...  PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL01830.  ... 
doi:10.3390/ijerph7020596 pmid:20616993 pmcid:PMC2872292 fatcat:qsqq3g2htfdifizd4b7evmjdye

RUBIX

Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphaël Troncy, David Trastour
2012 Proceedings of the First International Workshop on Open Data - WOD '12  
With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis  ...  First experiments show that using Linked Data to map cell values with instances and column headers with types improves significantly the quality of the matching results and therefore should lead to more  ...  These approaches are however restricting label candidates to Web content from which the data was extracted.  ... 
doi:10.1145/2422604.2422607 dblp:conf/wod/AssafLSFTT12 fatcat:r4ezamwdsvhpfeqg5qreaolzam

Information diffusion through blogspace

Daniel Gruhl, R. Guha, David Liben-Nowell, Andrew Tomkins
2004 Proceedings of the 13th conference on World Wide Web - WWW '04  
We study the dynamics of information propagation in environments of low-overhead personal publishing, using a large collection of weblogs over time as our example domain.  ...  The set of nodes are connected in a directed graph with each edge (u, v) labeled with a probability p_{u,v}.  ...  All features extracted using any of these methods are then spotted wherever they occur in the corpus, and extracted with metadata indicating the date and blog of occurrence.  ... 
doi:10.1145/988672.988739 dblp:conf/www/GruhlGLT04 fatcat:pichvz4ntzggxm5wopjtiqgz2m

Information diffusion through blogspace

D. Gruhl, David Liben-Nowell, R. Guha, A. Tomkins
2004 SIGKDD Explorations  
We study the dynamics of information propagation in environments of low-overhead personal publishing, using a large collection of weblogs over time as our example domain.  ...  The set of nodes are connected in a directed graph with each edge (u, v) labeled with a probability p_{u,v}.  ...  All features extracted using any of these methods are then spotted wherever they occur in the corpus, and extracted with metadata indicating the date and blog of occurrence.  ... 
doi:10.1145/1046456.1046462 fatcat:yasoh2l2yngfvip5yj746bmjri

Computing Sentiment Polarity of Texts at Document and Aspect Levels

Vivek Kumar Singh, Rajesh Piryani, Pranav Waila, Madhavi Devaraj
2014 ECTI Transactions on Computer and Information Technology  
The results obtained for the aspect-level computation are also compared with the corresponding results obtained from the document-level approach.  ...  Our performance evaluation results are on six different datasets of different kinds, including movie reviews, blog posts and twitter feeds.  ...  Thus in total we work on three different kinds of data items, reviews, blog posts and twitter feeds.  ... 
doi:10.37936/ecti-cit.201481.54389 fatcat:cc46cxh27bhqpp43m7qjmmqe7y