143,063 Hits in 7.2 sec

Dexter

Disheng Qiu, Luciano Barbosa, Xin Luna Dong, Yanyan Shen, Divesh Srivastava
2015 Proceedings of the VLDB Endowment  
In this paper, we present DEXTER, a system to find product sites on the web, and detect and extract product specifications from them.  ...  The results show that our crawler strategy to locate product specification pages is effective: (1) it discovered 1.46M product specification pages from 3,005 sites and 9 different categories; (2) the  ...  Initially, from seed products chosen from large, popular product websites, DEXTER locates new product websites, which contain specifications, such as shopping and company websites (Website Discovery).  ... 
doi:10.14778/2831360.2831372 fatcat:jq5ve42rzfh45m4u3rfvxre6oi

The Online Sale of Antibiotics for Veterinary Use

Juan F. Garcia, M. Jose Diez, Ana M. Sahagun, Raquel Diez, Matilde Sierra, Juan J. Garcia, M. Nelida Fernandez
2020 Animals  
Complex searches used wildcards and specific syntax.  ...  An inappropriate use of antibiotics can impair animal health and enhance the risk of bacterial resistance, as well as its transfer from animals to humans.  ...  We configured both browsers to work in full private mode, and our software employed specific parameters to prevent search engines from using country-specific website-filtering. 3. Result extraction.  ... 
doi:10.3390/ani10030503 pmid:32192151 fatcat:yy5xcv3xibgibohrbqcvvlxrzq

Extraction and Credibility Evaluation of Web-based Competitive Intelligence

Jie Zhao, Peiquan Jin
2011 Journal of Software  
However, traditional approaches focus on collecting Web pages and fail to generate practical competitive intelligence from Web pages.  ...  Aiming at solving these problems, we propose a framework in this paper for the extraction and credibility evaluation of Web competitive intelligence.  ...  and the National Science Foundation of China under the grant no. 60776801 and 70803001.  ... 
doi:10.4304/jsw.6.8.1513-1520 fatcat:3mqfgxwa5rgdbmsfp2amx5mg7y

An analysis of duplicate on web extracted objects

Stefano Ortona
2014 Proceedings of the 23rd International Conference on World Wide Web - WWW '14 Companion  
The automatic extraction of structured data from web is a challenging problem that has been widely investigated.  ...  In this paper we present web object matching, the problem of identifying duplicates among records extracted from the web.  ...  In product matching the records to match are product offers and the sources of information are the e-commerce websites where the offers come from.  ... 
doi:10.1145/2567948.2579708 dblp:conf/www/Ortona14 fatcat:5av2xcaqvjf67mfwivyidegrgm

DIADEM

Tim Furche, Georg Gottlob, Giovanni Grasso, Xiaonan Guo, Giorgio Orsi, Christian Schallhart, Cheng Wang
2014 Proceedings of the VLDB Endowment  
DIADEM overcomes this challenge through a self-adaptive network of relational transducers that produces effective wrappers for a wide variety of websites.  ...  Our extensive and publicly available evaluation shows that, for more than 90% of sites from three domains, DIADEM obtains an effective wrapper that extracts all relevant data with 97% average precision  ...  PROBLEM STATEMENT DIADEM fully automatically extracts data from entire websites of a target domain at scale. Specifically, it automatically produces an "effective wrapper" for a full site.  ... 
doi:10.14778/2733085.2733091 fatcat:gyigdbhyajcqpoyekhajuc6c4i

Visualization of job availability based on text analytics localization approach

Nur Azmina Mohamad Zamani, Norhaslinda Kamaruddin, Abdul Wahab, Nur Shahana Saat
2019 Indonesian Journal of Electrical Engineering and Computer Science  
It relates to the number of volumes of produced products and services. If the unemployment rate is high, the amount of gross domestic product (GDP) of a country may be declined.  ...  In this paper we proposed a text analytics technique to extract users' comments from social media such as Twitter and Facebook on job advertisement.  ...  It is a strong indicator of economic stability of a country and relates strongly to the number of volumes of produced products and services.  ... 
doi:10.11591/ijeecs.v16.i2.pp744-751 fatcat:ety7wcac6jaj3cuu6tls2gt56u

Cheers in UK: How Visible Are Spanish Sparkling Wines on Google.co.uk? [chapter]

Carlos Gonzalo-Penela, Noelia Jiménez-Asenjo, Diana A. Filipescu
2019 Promotion and Marketing Communications [Working Title]  
Extraction and also cybermetric analysis of Search Engine Result Pages (SERPs) using SEO techniques were used to calculate the visibility of Spanish cava brands via their own websites and e-commerce websites  ...  , whereupon we were able to establish rankings of media, social networks, wine sites and e-commerce websites as well as recommendations for content optimization.  ...  It would therefore be advisable to perform the data extractions from different locations in the UK and compare the results.  ... 
doi:10.5772/intechopen.89541 fatcat:ur5wqajnujdudhu3xe5clrd6pu

Wiccap Data Model: Mapping Physical Websites to Logical Views [chapter]

Zehua Liu, Feifei Li, Wee Keong Ng
2002 Lecture Notes in Computer Science  
To do this, information from the Web sources need to be extracted automatically according to users' interests.  ...  To accelerate the creation of data models, we also define a formal process for creating such data model and have implemented a software tool to facilitate and automate the process of producing Wiccap Data  ...  As a result, the data model produced is usually not intuitive to other users and is very specific to the particular website.  ... 
doi:10.1007/3-540-45816-6_19 fatcat:qc47mefiszgwvbfb3v7oqqac6e

A Reproducible IT-Blog Corpus

Adrien Barbaresi, Jens Pohlmann
2021 Journal of Open Humanities Data  
The dataset comprises text and metadata extracted from several hundred IT-blogs and websites, along with a method to duplicate the data by updating its contents and downloading it to the user's local machine  ...  The targets have been hand-picked with the intention to represent the discourse on blogs and websites dedicated to questions at the intersection of technology and society from Germany and the United States  ...  ACKNOWLEDGEMENTS We would like to thank Lukas and Yannick Kozmus (BBAW), Victoria Kratel (University of Bremen), and Uma Phatak (Stanford University) for their work on quality assurance and the collection  ... 
doi:10.5334/johd.35 fatcat:juant26ygbepfew7oht47imbku

Lessons Learned and Research Agenda for Big Data Integration of Product Specifications

Luciano Barbosa, Valter Crescenzi, Xin Luna Dong, Paolo Merialdo, Federico Piai, Disheng Qiu, Yanyan Shen, Divesh Srivastava
2018 Sistemi Evoluti per Basi di Dati  
The product domain represents a challenging scenario for developing and evaluating big data integration solutions: the number of sources providing product specifications is very large, and ever increasing  ...  In this paper, we present ongoing efforts, challenges and our research agenda to address big data integration for product specifications.  ...  Dexter starts from a seed set of product pages, and iteratively discovers and crawls new sources, from which extracts products specifications.  ... 
dblp:conf/sebd/BarbosaCDMPQSS18 fatcat:4mbfuedg3veotppqq6olxxxyai

xCrawl: A High-Recall Crawling Method for Web Mining

Kostyantyn Shchekotykhin, Dietmar Jannach, Gerhard Friedrich
2008 2008 Eighth IEEE International Conference on Data Mining  
The proposed crawling technique was inspired by the requirements of a Web Mining System developed to extract product and service descriptions and was evaluated in different application scenarios.  ...  Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents.  ...  The research project is funded partly by grants from the Austrian Research Promotion Agency (Program Line FIT-IT Semantic Systems Project AllRight, Contract 809261) and by grants of the Austrian Science  ... 
doi:10.1109/icdm.2008.121 dblp:conf/icdm/ShchekotykhinJF08 fatcat:5ycai5hwonasrlpfnvnplz7rbe

xCrawl: a high-recall crawling method for Web mining

Kostyantyn Shchekotykhin, Dietmar Jannach, Gerhard Friedrich
2009 Knowledge and Information Systems  
The proposed crawling technique was inspired by the requirements of a Web Mining System developed to extract product and service descriptions and was evaluated in different application scenarios.  ...  Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents.  ...  The research project is funded partly by grants from the Austrian Research Promotion Agency (Program Line FIT-IT Semantic Systems Project AllRight, Contract 809261) and by grants of the Austrian Science  ... 
doi:10.1007/s10115-009-0266-3 fatcat:5pyaebbsifbbvbodmvknoeqx6e

Matching Physical Sites with Web Sites for Semantic Localization

Rufeng Meng, Sheng Shen, Romit Roy Choudhury, Srihari Nelakuditi
2015 Proceedings of the 2nd Workshop on Physical Analytics - WPA '15  
By correlating words inside the pictures, against words extracted from store websites, our proposed system can automatically label clusters of pictures, and the corresponding WiFi APs, with the store name  ...  Specifically, we assume a repository of crowdsourced WiFi-tagged pictures from different stores.  ...  AutoLabel approach is more scalable as it uses the correlation between in-store words and website words to produce a WiFiAP-StoreName table.  ... 
doi:10.1145/2753497.2753501 dblp:conf/mobisys/MengSCN15 fatcat:qzn3ywvxazdnpbgd2tcifngtwi

Assessing the quality of information provided on websites selling Kratom (Mitragyna speciosa) to consumers in Canada

Jeremy Y. Ng, Muhammad Ans, Amn Marwaha
2021 Substance Abuse Treatment, Prevention, and Policy  
Searches were conducted on March 27, 2020 and only websites presenting information in English were included.  ...  Results A total of 200 webpages were identified; after screening based on eligibility criteria and combining different webpages that belonged to the same website, 51 websites were found to be eligible.  ...  MA: collected the data, interpreted and analysed the data, provided contributions and critically revised the manuscript, and gave final approval of the version to be published.  ... 
doi:10.1186/s13011-021-00361-2 pmid:33741009 pmcid:PMC7977165 fatcat:gzfmt7spkrbstjtjyemzrvhcoi

Video Ads in Digital Marketing and Sales: A Big Data Analytics Using Scrapy Web Crawler Mining Technique

Addo Prince Clement, Dorgbefu Jnr. Maxwell, Kulbo Nora Bakabbey, Akpatsa Samuel Kofi, Ohemeng Asare Andy, Dagadu Joshua Caleb, Boansi Kufuor Oliver, Kofi Frimpong Adasa Nkrumah
2021 Asian Journal of Research in Computer Science  
A total of 23,589 datasets were drawn from three global B2C and C2C websites using the Scrapy web crawler to investigate a resilience model in the relationship between SV advertising adoption, quality  ...  The survival of the global economy is rooted in the production of goods, rendering of valuable services, and formulation and implementation of favorable trade policies.  ...  SV, moderated by price and after controlling for LSQ, location, and product category, produced a significant direct effect of 0.725 to support H1.  ... 
doi:10.9734/ajrcos/2021/v11i430270 fatcat:f74yuvm2vffpreojcqf7d423bu