Zero-cost labelling with web feeds for weblog data extraction
2013
Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion
In this paper, we propose a fully automated approach in generating a wrapper for weblogs, which exploits web feeds for cheap labelling of weblog properties. ...
Data extraction from web pages often involves either human intervention for training a wrapper or a reduced level of granularity in the information acquired. ...
CONCLUSIONS We have presented a method for fully automated weblog wrapper generation. Based on the weblogs' feeds, our model realises an effective and zero-cost labelling technique. ...
doi:10.1145/2487788.2487819
dblp:conf/www/GkotsisSCJ13
fatcat:2yienhy72reshderi6ccs4fk2e
A Scalable Approach to Harvest Modern Weblogs
2015
International journal on artificial intelligence tools
To achieve this goal, we introduce a simple yet robust and scalable algorithm to generate extraction rules based on string matching using the blog's web feed in conjunction with blog hypertext. ...
Blogs are one of the most prominent means of communication on the web. ...
Gkotsis from the University of Warwick for generously sharing his research material, time, and ideas with us. ...
doi:10.1142/s0218213015400059
fatcat:mggtuhzpzzfz5gsd3gydj4dpoi
Deriving marketing intelligence from online discussion
2005
Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05
Given the volume, format and content of the data, the appropriate approach to understand this data is to use large-scale web and text data mining technologies. ...
Weblogs and message boards provide online forums for discussion that record the voice of the public. ...
For such weblogs, we automatically use the feed to extract new posts with near 100% accuracy instead of crawling and segmenting the weblog. ...
doi:10.1145/1081870.1081919
dblp:conf/kdd/GlanceHNSST05
fatcat:3l3obqej4remtoxmbicba5txfa
Weblog Clustering in Multilinear Algebra Perspective
[article]
2009
arXiv
pre-print
Then, 3-way adjacency tensor is extracted from the network and the PARAFAC decomposition is applied to the tensor to get pairs of node lists and label lists with scores attached to each list as the indication ...
The proposed method first creates labeled-link network representation of the weblog datasets, where the nodes are the blogs and the labels are the shared words. ...
Our codes for download and data preparation steps are written in Python by using Universal Feed Parser module for parse() function (see Algorithm 2), and for decomposition step are written in MATLAB ...
arXiv:0909.2345v1
fatcat:72vjeerpxfdbtfx4agkhixufau
Survey Paper On Techniques Used In Opinion Mining
2017
International Journal of Recent Trends in Engineering and Research
In this, the various methods of mining in multiple ways such as web and data mining are used to retrieve data from web sites and to optimize we need to go through the queries of data mining. ...
Here, the data mining concepts are used which mainly deals with mining the UGC from different e-commerce websites which are being used in our daily routine life and after we extract the required UGC we ...
Nidhi Sharma for her guidance and support during project development. Also, we are thankful to BVCOE principal and all staff for their good support. ...
doi:10.23883/ijrter.2017.2997.xegno
fatcat:75xnihch6rcuda4ddjfxsalame
Improving Specimen Labelling and Data Collection in Bio-science Research using Mobile and Web Applications
2020
Open Computer Science
We present WebLog, an application that we prototyped to aid researchers generate specimen labels and collect data from experiment sites. ...
Once a specimen label is successfully scanned, the application automatically invokes the data entry form. The collected data is immediately sent to the server in electronic form for analysis. ...
Acknowledgement: We thank the anonymous referees for their insightful comments. We also thank the Kenya Education Network (KENET) for providing the hosting services for the prototype application. ...
doi:10.1515/comp-2020-0002
fatcat:zfhizqpj4vei3eh4lj3j7ftbjq
The Pulse of News in Social Media: Forecasting Popularity
[article]
2012
arXiv
pre-print
Prior research has dealt with predicting eventual online popularity based on early popularity. ...
Hence, the task of predicting the popularity of news items on the social web is both interesting and challenging. ...
Category Score: News feeds provided by Feedzilla are pre-tagged with category labels describing the content. ...
arXiv:1202.0332v1
fatcat:hljai7wkffck7jvmymsjjdtiqi
The Information Ecology of Social Media and Online Communities
2008
The AI Magazine
One thing that sets these "Web 2.0" sites apart from traditional Web pages and resources is that they are intertwined with other forms of networked data. ...
Social media systems such as weblogs, photo- and link-sharing sites, Wikis and on-line forums are currently thought to produce up to one third of new Web content. ...
James Mayfield, Justin Martineau and Sandeep Balijepalli for their contributions to the work on sentiment detection. ...
doi:10.1609/aimag.v29i3.2158
fatcat:qmuzv4t7gba5vmaic3nzzfldna
Facilitating SQL Query Composition and Analysis
[article]
2020
arXiv
pre-print
Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. ...
This is particularly important in settings with limited access to the database instance. ...
... cloud-based data warehouses like Google BigQuery [16], databases on the hidden web, sources located behind wrappers in data integration systems [6], and instances with limited access due to cost ...
arXiv:2002.09091v1
fatcat:bvvvrugpbvh2tmmohz32anf47m
Improving Schema Matching with Linked Data
[article]
2012
arXiv
pre-print
With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis ...
First experiments show that using Linked Data to map cell values with instances and column headers with types improves significantly the quality of the matching results and therefore should lead to more ...
These approaches are however restricting label candidates to Web content from which the data was extracted. ...
arXiv:1205.2691v2
fatcat:lqjtwtvngnd3rebaeixfl647ai
Text and Structural Data Mining of Influenza Mentions in Web and Social Media
2010
International Journal of Environmental Research and Public Health
Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure ...
Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like illness patient report data. ...
PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL01830. ...
doi:10.3390/ijerph7020596
pmid:20616993
pmcid:PMC2872292
fatcat:qsqq3g2htfdifizd4b7evmjdye
Improving Schema Matching with Linked Data
2012
With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis ...
First experiments show that using Linked Data to map cell values with instances and column headers with types improves significantly the quality of the matching results and therefore should lead to more ...
These approaches are however restricting label candidates to Web content from which the data was extracted. ...
doi:10.1145/2422604.2422607
dblp:conf/wod/AssafLSFTT12
fatcat:r4ezamwdsvhpfeqg5qreaolzam
Information diffusion through blogspace
2004
Proceedings of the 13th conference on World Wide Web - WWW '04
We study the dynamics of information propagation in environments of low-overhead personal publishing, using a large collection of weblogs over time as our example domain. ...
The set of nodes are connected in a directed graph with each edge $(u, v)$ labeled with a probability $p_{u,v}$. ...
All features extracted using any of these methods are then spotted wherever they occur in the corpus, and extracted with metadata indicating the date and blog of occurrence. ...
doi:10.1145/988672.988739
dblp:conf/www/GruhlGLT04
fatcat:pichvz4ntzggxm5wopjtiqgz2m
Information diffusion through blogspace
2004
SIGKDD Explorations
We study the dynamics of information propagation in environments of low-overhead personal publishing, using a large collection of weblogs over time as our example domain. ...
The set of nodes are connected in a directed graph with each edge $(u, v)$ labeled with a probability $p_{u,v}$. ...
All features extracted using any of these methods are then spotted wherever they occur in the corpus, and extracted with metadata indicating the date and blog of occurrence. ...
doi:10.1145/1046456.1046462
fatcat:yasoh2l2yngfvip5yj746bmjri
Computing Sentiment Polarity of Texts at Document and Aspect Levels
2014
ECTI Transactions on Computer and Information Technology
The results obtained for the aspect-level computation are also compared with the corresponding results obtained from the document-level approach. ...
Our performance evaluation results are on six different datasets of different kinds, including movie reviews, blog posts and twitter feeds. ...
Thus in total we work on three different kinds of data items, reviews, blog posts and twitter feeds. ...
doi:10.37936/ecti-cit.201481.54389
fatcat:cc46cxh27bhqpp43m7qjmmqe7y