8,083 Hits in 5.3 sec

An automated approach for retrieving hierarchical data from HTML tables

Seung-Jin Lim, Yiu-Kai Ng
1999 Proceedings of the eighth international conference on Information and knowledge management - CIKM '99  
This relaxation complicates the process of retrieving hierarchical data from HTML tables. In this paper, we propose an automated approach for retrieving hierarchical data from HTML tables.  ...  Our approach can be employed by (i) a query language written for retrieving hierarchically structured data, extracted from either the contents of HTML tables or other sources, (ii) a processor for converting  ...  Implementation of the approach The proposed approach for retrieving hierarchical data from HTML tables has been implemented as a Java class and tested on a Pentium-based workstation using the JDK 1.1.7.  ... 
doi:10.1145/319950.320052 dblp:conf/cikm/LimN99 fatcat:zvqyq4yd5valxnoguemnovqrou

An XML-enabled data extraction toolkit for web sources

Ling Liu, Calton Pu, Wei Han
2001 Information Systems  
Hence, the web users or applications need a smart way of extracting data from these web sources.  ...  The amount of useful semi-structured data on the web continues to grow at a stunning pace. Often interesting web data are not in database systems but in HTML pages, XML pages, or text files.  ...  Acknowledgements We would like to thank the XWRAP team at Georgia Tech for their implementation effort.  ... 
doi:10.1016/s0306-4379(01)00040-0 fatcat:5dxghttqgzgnjlu2dcy2bkprxa

A language independent web data extraction using vision based page segmentation algorithm [article]

P YesuRaju, P KiranSree
2013 arXiv   pre-print
This approach primary utilizes the visual features on the webpage to implement web data extraction.  ...  But this is tedious and time consuming as well as difficult when the data to be retrieved is plenty.  ...  For instance, consider this table, taken from an HTML document.  ... 
arXiv:1310.6637v1 fatcat:dkx6sypr7rgvboh3x3q2uacoha

A LANGUAGE INDEPENDENT WEB DATA EXTRACTION USING VISION BASED PAGE SEGMENTATION ALGORITHM

P Yesuraju
2013 International Journal of Research in Engineering and Technology  
This approach primary utilizes the visual features on the webpage to implement web data extraction.  ...  But this is tedious and time consuming as well as difficult when the data to be retrieved is plenty.  ...  For instance, consider this table, taken from an HTML document.  ... 
doi:10.15623/ijret.2013.0204040 fatcat:aiq2wxjklncdbmyiegsokilduu

Classified Ads Harvesting Agent and Notification System [article]

Razvi Doomun, Lollmahamod N., Auleear Nadeem, Mozafar Aukin
2010 arXiv   pre-print
Information extraction agents are used to explore and collect data available from Web, in order to effectively exploit such data for business purposes, such as automatic news filtering, advertisement or  ...  The shift from an information society to a knowledge society require rapid information harvesting, reliable search and instantaneous on demand delivery.  ...  OBJECT MODEL AND EXTRACTION An object model approach is used to extract information from HTML pages.  ... 
arXiv:1003.2677v1 fatcat:5t4wlclzmnhfjjumvpzmo7c44i

Effective Web data extraction with standard XML technologies

Jussi Myllymaki
2001 Proceedings of the tenth international conference on World Wide Web - WWW '01  
An ideal data extraction process is able to digest target Web databases that are visible only as HTML pages, and create a local, identical replica of those databases as a result.  ...  We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping."  ...  An automated crawler is used to retrieve target pages from a Web site.  ... 
doi:10.1145/371920.372183 dblp:conf/www/Myllymaki01 fatcat:rcdwcekjpze47cldjr23amznsy

Effective Web data extraction with standard XML technologies

Jussi Myllymaki
2002 Computer Networks  
An ideal data extraction process is able to digest target Web databases that are visible only as HTML pages, and create a local, identical replica of those databases as a result.  ...  We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping."  ...  would like to thank Jared Jackson and Stephen Dill of IBM Almaden Research Center, Yan Zhou of IBM China Development Laboratory, and Dorine Yelton, John Rees, and Douglas Griswold of IBM Global Services, for  ... 
doi:10.1016/s1389-1286(02)00214-1 fatcat:wb6x6erukbeqpkhbpsi6tsv6aq

An Automated Web Application Testing System

Moheb R. Girgis, Tarek M. Mahmoud, Bahgat A. Abdullatif, Alaa M. Zaki
2014 International Journal of Computer Applications  
This paper presents a proposed Web testing approach, in which hyperlinks of the website to be tested are automatically followed one by one to retrieve all HTML texts of its pages starting from the home  ...  The paper also describes an automated Web application testing system that has been developed to implement the proposed approach.  ...  [16] have proposed an activity oriented technique for automated test code generation.  ... 
doi:10.5120/17387-7926 fatcat:fq7p25jnsbgipc62mvb4qkka6y

Gathering meta‐data and instances from object referral lists on the web

Srinivas Vadrevu, Fatih Gelgi, Saravanakumar Nagarajan, Hasan Davulcu, Miguel‐Angel Sicilia
2006 Online information review (Print)  
hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data  ...  Design/methodology/approach -Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a  ...  In the following sections, we describe how to retrieve and organize the value types, and provide an algorithm that can infer meta-data and their instances from an ORL.  ... 
doi:10.1108/14684520610675807 fatcat:tldxs7lirnafxj53mwrk6wuqb4

An Indexing, Browsing, Search and Retrieval System for Audiovisual Libraries [chapter]

Jane Hunter, Jan Newmarch
1999 Lecture Notes in Computer Science  
It is a Java application which integrates a video replay window with vcr-type controls and metadata input forms generated from an hierarchical RDF schema.  ...  This paper describes an application which enables the computer-assisted generation of Dublin Core-based metadata descriptions and online digital visual summaries for videos.  ...  We also wish to thank the staff from the State Library of Queensland Audiovisual unit for their assistance.  ... 
doi:10.1007/3-540-48155-9_7 fatcat:kamilhgqdbajbesykru6freaxm

Detecting similar HTML documents using a fuzzy set information retrieval approach

Rajiv Yerra, Yiu-Kai Ng
2005 2005 IEEE International Conference on Granular Computing  
In this paper, we present a new approach for detecting similar Web documents, especially HTML documents.  ...  approach, and (iii) matching the corresponding hierarchical content of the two documents using a simple tree matching algorithm.  ...  To create the semantic hierarchy portion for HTML table data, if they exist, the hierarchical dependencies (e.g., row and column order) among the data content in the table are determined using various  ... 
doi:10.1109/grc.2005.1547380 dblp:conf/grc/YerraN05 fatcat:xlng4altbzgzdj3suvpdavx2ou

A Survey of Ontology Learning Approaches

Maryam Hazman, Samhaa R. El-Beltagy, Ahmed Rafea
2011 International Journal of Computer Applications  
In this paper, we present a survey for the different approaches in ontology learning from semi-structured and unstructured date General Terms Ontology learning approaches.  ...  So many research developed several ontology learning approaches and systems.  ...  The first approach utilizes the structure of phrases appearing in the documents' HTML headings while the second utilizes the hierarchical structure of Data Mining Approach Karoui et. al.  ... 
doi:10.5120/2610-3642 fatcat:fhtn24qoq5ewhiu6a5w5hyysee

Towards ontology-based semantic web from data-intensive web: A reverse engineering approach

S.M. Benslimane, M. Malki, A. Lehirech
2006 IEEE International Conference on Computer Systems and Applications, 2006.  
In this context we try to propose a novel and integrated approach for migrating data-intensive web into ontology-based semantic web and thus, make the web content machine-understandable.  ...  Our approach is based on the idea that semantics can be extracted from the structures and the instances of HTML forms which are the most convenient interface to communicate with relational databases on  ...  Figure 11 illustrates an example result of the data migration process from the Table 3.  ... 
doi:10.1109/aiccsa.2006.205177 dblp:conf/aiccsa/BenslimaneML06 fatcat:zugmnlzsavfcbg6iomnqtbn2re

OntoMiner: automated metadata and instance mining from news websites

Hasan Davulcu, Srinivas Vadrevu, Saravanakumar Nagarajan
2005 International Journal of Web and Grid Services  
., Vadrevu, S. and Nagarajan, S. (2005) 'OntoMiner: automated metadata and instance mining from news websites', Int.  ...  RDF/XML has been widely recognised as the standard for annotating online web documents and for transforming the HTML web into the so-called Semantic Web.  ...  In order to effectively retrieve the attribute labels and their values from such tables, an algorithm to accurately detect and extract data from tables is required.  ... 
doi:10.1504/ijwgs.2005.008320 fatcat:4ioxpkfl7bhbff2jors7xugoyu

A Model for Enhancing Internet Medical Document Retrieval with "Medical Core Metadata"

G. Malet, F. Munoz, R. Appleyard, W. Hersh
1999 JAMIA Journal of the American Medical Informatics Association  
The wealth of resources available on the Internet has stimulated information scientists to consider new models for knowledge retrieval.  ...  Authors could offer intuitive connections from their documents to remote sites and place their documents in the context of existing literature.  ...  Harvest (http://harvest.transarc.com/afs/ transarc.com/public/trg/Harvest/) is an example of a set of tools that allow data to be extracted in customized ways from remote resources to permit construction  ... 
doi:10.1136/jamia.1999.0060163 pmid:10094069 pmcid:PMC61355 fatcat:5evnninqvzgt3mw3mcsrfqdnii