Rule Learning for Feature Values Extraction from HTML Product Information Sheets [chapter]

Costin Bădică, Amelia Bădică
2004 Lecture Notes in Computer Science  
The paper presents a technique for learning extraction rules of product information from such product information sheets.  ...  In this paper we assume that product information is represented as a set of feature-value pairs contained in an HTML product information sheet that is usually formatted using HTML tables.  ...  Experimental Results We ran a series of experiments of rule learning for IE from the Hewlett Packard's Web site. The task was to extract the printers feature values from their information sheets.  ... 
doi:10.1007/978-3-540-30504-0_4 fatcat:pxkbj6v4ivdk3alhgnb7ty3rim

Tuples Extraction from HTML Using Logic Wrappers and Inductive Logic Programming [chapter]

Costin Bădică, Amelia Bădică, Elvira Popescu
2005 Lecture Notes in Computer Science  
This paper presents an approach for applying inductive logic programming to information extraction from HTML documents structured as unranked ordered trees.  ...  We consider information extraction from Web resources that are abstracted as providing sets of tuples.  ...  Tuples Extraction from Flat Information Resources We performed an experiment of learning to extract the tuples containing the feature name and feature value from HP printer information sheets.  ... 
doi:10.1007/11495772_8 fatcat:ydaystgponbpbi6cmyd5addvsq

L-wrappers: concepts, properties and construction

Costin Bădică, Amelia Bădică, Elvira Popescu, Ajith Abraham
2006 Soft Computing - A Fusion of Foundations, Methodologies and Applications  
We also define a convenient way for mapping L-wrappers to XSLT for efficient processing using available XSLT processing engines.  ...  The developed Logic wrappers (L-wrapper) have declarative semantics, and therefore: (i) their specification is decoupled from their implementation and (ii) they can be generated using inductive logic programming  ...  Consider the Hewlett Packard's site of electronic products and the task of IE from a product information sheet for printers.  ... 
doi:10.1007/s00500-006-0118-y fatcat:csllarzbsrbqbfj6jxwvf7dmru

Concept extraction for online shopping

Yongzheng Zhang, Rajyashree Mukherjee, Benny Soetarman
2012 Proceedings of the 14th Annual International Conference on Electronic Commerce - ICEC '12  
ACE is an unsupervised method that looks at both text and HTML tags. We upgrade ACE into Improved Concept Extractor (ICE) with significant improvements. KEA is a supervised learning system.  ...  Concept extraction is a nice solution for this purpose. In this paper, we investigate two concept extraction methods: Automatic Concept Extractor (ACE) and Automatic Keyphrase Extraction (KEA).  ...  For each candidate, two feature values, TFIDF and first occurrence, are calculated.  ... 
doi:10.1145/2346536.2346545 dblp:conf/ACMicec/ZhangMS12 fatcat:jsdobnzyqfcijmxkkjw25yywzm

Logic Wrappers and XSLT Transformations for Tuples Extraction from HTML [chapter]

Costin Bădică, Amelia Bădică
2005 Lecture Notes in Computer Science  
The mapping actually shows how the theory can be applied to obtain efficient wrappers for information extraction from HTML.  ...  Recently it was shown that existing general-purpose inductive logic programming systems are useful for learning wrappers (known as L-wrappers) to extract data from HTML documents.  ...  Rule bodies check various token features like: length, position in the text fragment, if they are numeric or capitalized, a.o. SRV has been adapted to learn information extraction rules from HTML.  ... 
doi:10.1007/11547273_13 fatcat:r6gwkn2fqvgjff5efyma2bovue

Extracting informative textual parts from web pages containing user-generated content

Nikolaos Pappas, Georgios Katsimpras, Efstathios Stamatatos
2012 Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies - i-KNOW '12  
Based on a human annotated corpus consisting of diverse topics, domains and templates, we demonstrate the learning abilities of our algorithm, we examine its effectiveness in extracting the informative  ...  textual parts and its usage as a rule-based classifier for web page type detection in a realistic web setting.  ...  The authors thank Andrei Popescu-Belis and Thomas Meyer for their helpful remarks.  ... 
doi:10.1145/2362456.2362462 dblp:conf/iknow/PappasKS12 fatcat:rl4rfjtfc5c3ddxp3mhnalntzi

Reliable Architecture for Power System operational Communications (Integration of Digital PLC & ATM)

Alessandro Garibbo, Laura Petrolino, Giancarlo Caroti, Francesco Rufino
2006 2006 International Multi-Conference on Computing in the Global Information Technology - (ICCGI'06)  
The approach was successfully applied in various application areas: collecting product features from product information sheets and mining travel resources as found on Web sites of online transaction brokers  ...  Logic wrappers are a new technology that was proposed to help automatizing the task of data extraction from the Web.  ...  Examples include: search engines result pages, product catalogues, news sites, product information sheets, travel resources, multimedia repositories, Web directories, a.o.  ... 
doi:10.1109/iccgi.2006.58 fatcat:glasdbcdkvemtav7d275hv6fru

Comparative Mining of B2C Web Sites by Discovering Web Database Schemas

C. I. Ezeife, Bindu Peravali
2016 Proceedings of the 20th International Database Engineering & Applications Symposium on - IDEAS '16  
Cascading style sheets: The cascading style sheets help web developers lay out the information on the web page.  ...  The features of Elements and Blocks in HCRFS are for each element, the algorithm extracts both vision and content features, all the information can be obtained from the vision tree features.  ...  Now this html code can be given as input to our system to extract product data. Some of the well-known embedded browser components are "web developer toolbar", "firebug" etc., 5.  ... 
doi:10.1145/2938503.2938522 dblp:conf/ideas/EzeifeP16 fatcat:muvjfqatuzf4pgu5hq3tueaima

FlexFashion: E-Commerce with Advance Features

Sanjograj Singh Ahuja
2021 International Journal for Research in Applied Science and Engineering Technology  
Abstract: The aim of this website is to enhance the shopping experience for customers using an advance feature of recommending matching outfits using the colorgram module.  ...  The e - commerce platform displays an order cut-off time and a delivery window for the products selected by the consumer.  ...  This module was tried with different colors for extracting colour information from images.  ... 
doi:10.22214/ijraset.2021.39113 fatcat:miesi2amhvg2tgaivzsb4zmdfq

A tree-based learning approach for document structure analysis and its application to web search

2014 Natural Language Engineering  
For comparison, a baseline rule-based approach was used that relies on heuristics and HTML document object model tree processing.  ...  The machine learning approach, which is a fully automatic approach, outperformed the rule-based approach.  ...  Table 9 shows the results for the feature set Γ 4 and a beam width of 100 for both rule-based and machine learning models.  ... 
doi:10.1017/s1351324914000023 fatcat:rgtaa2wyyzfcxfaxhn5w7bxjrq

Towards domain-independent information extraction from web tables

Wolfgang Gatterbauer, Paul Bohunsky, Marcus Herzog, Bernhard Krüpl, Bernhard Pollak
2007 Proceedings of the 16th international conference on World Wide Web - WWW '07  
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of tags.  ...  In this paper, we approach the problem of domain-independent information extraction from web tables by shifting our attention from the tree-based representation of web pages to a variation of the two-dimensional  ...  reviewers for helpful comments.  ... 
doi:10.1145/1242572.1242583 dblp:conf/www/GatterbauerBHKP07 fatcat:b5w5kruhtne6lhzlnx3xlsydf4

Cascading tree sheets and recombinant HTML

Edward O. Benson, David R. Karger
2013 Proceedings of the 22nd international conference on World Wide Web - WWW '13  
This paper presents Cascading Tree Sheets (CTS), a CSS-like language for separating this presentational HTML from real content.  ...  Cascading Style Sheets (CSS) took a valuable step towards separating web content from presentation.  ...  ACKNOWLEDGEMENTS The authors thank Sarah Scodel for her help creating the widgets used in this work.  ... 
doi:10.1145/2488388.2488399 dblp:conf/www/BensonK13 fatcat:ilqntsjbnzbaxiriquvlrzjpay

Uncertainty Issues and Algorithms in Automating Process Connecting Web and User [chapter]

Alan Eckhardt, Tomáš Horváth, Dušan Maruščák, Róbert Novotný, Peter Vojtáš
2008 Lecture Notes in Computer Science  
Based on an experimental system we identify uncertainty issues which make this process difficult for automated processing.  ...  We conclude with a discussion of possible future development heading to an extension of web modeling standards with uncertainty features.  ...  Attribute Values Extraction As we have mentioned before, we use an ontology to extract the actual attribute values of product in the page.  ... 
doi:10.1007/978-3-540-89765-1_13 fatcat:houmg2hzpzcwroq4khmyurhj2a

Web data extraction, applications and techniques: A survey

Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, Robert Baumgartner
2014 Knowledge-Based Systems  
Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction.  ...  Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains.  ...  allows to extract and store the data from a Web site as RDF.  ... 
doi:10.1016/j.knosys.2014.07.007 fatcat:cb6zazpx7nfgxkmkiuoxqx5zyq

Entropy-based automated wrapper generation for weblog data extraction

George Gkotsis, Karen Stepanyan, Alexandra I. Cristea, Mike Joy
2013 World wide web (Bussum)  
This paper proposes a fully automated information extraction methodology for weblogs.  ...  Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts.  ...  extracting information from a large number of weblogs.  ... 
doi:10.1007/s11280-013-0269-6 fatcat:njlfs2rgcvc5fks6inoy7uvpcu