38,009 Hits in 7.0 sec

Automatic Annotation of Data Extracted From Large Web Sites

Ramesh Eluri, Meda Srikanth
2020 Figshare  
An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database.  ...  An increasing number of databases have become web accessible through HTML form-based search interfaces.  ...  However, they suffer from poor scalability and are not suitable for applications that need to extract information from a large number of web sources.  ... 
doi:10.6084/m9.figshare.12236702 fatcat:npw6oioi75edziwuqtujbp6xk4

Automatic annotation of data extracted from large Web sites

Luigi Arlotta, Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo
2003 International Workshop on the Web and Databases  
Data extraction from web pages is performed by software modules called wrappers. Recently, some systems for the automatic generation of wrappers have been proposed in the literature.  ...  In the framework of our ongoing project RoadRunner, we have developed a prototype, called Labeller, that automatically annotates data extracted by automatically generated wrappers.  ...  This paper reports our recent research on automatically annotating data extracted from data-intensive web sites.  ... 
dblp:conf/webdb/ArlottaCMM03 fatcat:ft35urjupjdf5kxlt2qie74434

Survey of Web Database Clustering Techniques

2016 International Journal of Science and Research (IJSR)  
Extracting structured data from deep Web pages is a challenging problem due to the underlying intricate structures of such pages.  ...  For the encoded data units to be machine-processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted out and assigned  ...  , Vision-based Data Extractor (ViDE), to extract structured results from deep Web pages automatically.  ... 
doi:10.21275/v5i2.nov161214 fatcat:byuitsutfndi7enfo27tweiy4e

Annotation for Query Result Records Based on Domain-Specific Ontology

S. Lakshmana Pandian, R. Punitha
2014 International Journal on Natural Language Computing  
The World Wide Web is enriched with a large collection of data, scattered in deep web databases and web pages in unstructured or semi-structured formats.  ...  In the past few years, researchers have worked on automatic web data extraction methods based on similarity measures.  ...  Presently, semantic annotation is based on correct extraction of query results. Automatic web data extraction has now become relatively mature.  ... 
doi:10.5121/ijnlc.2014.3309 fatcat:zmkjbxu7gbfrfav55txpxjiqkq

WEB SCALE INFORMATION EXTRACTION USING WRAPPER INDUCTION APPROACH

RINA ZAMBAD, JAYANT GADGE
2014 International Journal of Electronics and Electrical Engineering  
The data will be collected from various post websites. The obtained post data pages are processed by page parsing, cleansing and data extraction to obtain new reference sets.  ...  Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply.  ...  uses them for extraction and annotation without training data.  An automatic method for constructing reference sets from the posts themselves.  An automatic method for web post record extraction using  ... 
doi:10.47893/ijeee.2014.1121 fatcat:jh5qa2w3offqrcnkonkke7mwcm

A Survey on Data Annotation for the Web Databases

Miss Priyanka P. Boraste
2014 IOSR Journal of Computer Engineering  
To reduce human effort, a multi-annotator approach is proposed to automatically extract data units and assign labels.  ...  to annotate new result records from the same web database.  ...  Acknowledgement It gives me great pleasure to submit this paper "A SURVEY ON DATA ANNOTATION FOR THE WEB DATABASES". I wish to thank IOSR Journals for giving us such a wonderful opportunity.  ... 
doi:10.9790/0661-162116870 fatcat:3zyoclrc2batzfnda6lesqp4tu

Learning to Harvest Information for the Semantic Web [chapter]

Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yorick Wilks
2004 Lecture Notes in Computer Science  
In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention.  ...  Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or a user-defined lexicon.  ...  AKT project (www.aktors.org), sponsored by the UK Engineering and Physical Sciences Research Council (grant GR/N15764/01), and the Dot.Kom project (www.dot-kom.org), sponsored by the EU IST as part of  ... 
doi:10.1007/978-3-540-25956-5_22 fatcat:c3vg7haylrhshmm2xmuubhhmkm

Integrating Information to Bootstrap Information Extraction from Web Sites

Fabio Ciravegna, Alexiei Dingli, David Guthrie, Yorick Wilks
2003 International Joint Conference on Artificial Intelligence  
We are currently applying this methodology to mining web sites of Computer Science departments.  ...  In this paper we propose a methodology to learn to extract domain-specific information from large repositories (e.g. the Web) with minimum user intervention.  ...  Objectives are to develop advanced technologies for knowledge management and the Semantic Web.  ... 
dblp:conf/ijcai/CiravegnaDGW03 fatcat:udyuo5jzhjbtni3thj6xgzzime

An Automatic Annotation Technique for Web Search Results

Rosamma KS, Jiby J Puthiyidam
2015 International Journal of Computer Applications  
The annotation wrapper generated for the search site is automatically constructed and can be used to annotate new result pages from the same web database.  ...  Manual methods for record extraction and labeling scale poorly. Thus an automatic annotation method is needed to improve both the accuracy and the scalability of web search engines.  ...  INTRODUCTION Web information extraction and annotation have been two important research areas in recent years. A large portion of the deep web is database-based.  ... 
doi:10.5120/21383-4375 fatcat:7hbigf53sndlddkotjgfyqxzba

Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review

Yogesh W. Wanjari, Dipali B. Gaikwad, Vivek D. Mohod, Sachin N. Deshmukh
2014 International Journal of Computer Applications  
The Web contains a huge amount of information on Web sites; the user can retrieve it by submitting a search query to Web databases and fetching the relevant information.  ...  Web databases return multiple search result records dynamically in the Web browser, and these records are contained in Deep Web pages in the form of HTML pages.  ...  Even mobile technology focuses on various trends in the Web. Various technologies and research efforts focus on extracting relevant information from large web data stores.  ... 
doi:10.5120/15454-3994 fatcat:5yzpgfuhdjg7pgr2dmrdnit2yi

FOCIH: Form-Based Ontology Creation and Information Harvesting [chapter]

Cui Tao, David W. Embley, Stephen W. Liddle
2009 Lecture Notes in Computer Science  
Keywords: ontology generation from forms, information harvesting from the web, automatic annotation of web pages, web of data, Web 3.0.  ...  our prototype system show that automatic harvesting, automatic annotation, and automatic superimposition of a web of data over a web of pages work well.  ...  the semi-automatic construction of a web of data.  ... 
doi:10.1007/978-3-642-04840-1_26 fatcat:zi35uea7h5hg3cqerzsohxzo6e

Automatic Acquisition and Semantic Annotation of Web Tourism Information

Hui PENG, Wen-qi QU
2019 DEStech Transactions on Computer Science and Engineering  
A method that collects data from tourism web sites and automatically annotates these data with semantic tags is proposed in this paper.  ...  The crawler, which collects data from web sites automatically, is introduced first. Then the Chinese word segmentation tool and a classic keyword extraction algorithm, TF/IDF, are introduced.  ...  Text Content Processing Tourism information obtained from web pages often consists of large blocks of text.  ... 
doi:10.12783/dtcse/cscbd2019/30026 fatcat:xtctwaugzbemfbegaugi5q5yny

Schema-guided wrapper maintenance for web-data extraction

Xiaofeng Meng, Dongdong Hu, Chen Li
2003 Proceedings of the fifth ACM international workshop on Web information and data management - WIDM '03  
Extracting data from Web pages using wrappers is a fundamental problem arising in a large variety of applications of vast practical interest.  ...  Our intensive experiments on real Web sites show that the proposed approach can effectively maintain wrappers to extract desired data with high accuracy.  ...  [19] propose an approach to automatic data extraction by automatically inducing the underlying template of some sample pages with the same structures from data-intensive Web sites.  ... 
doi:10.1145/956700.956701 fatcat:whoyakmngfefhcs6lpqohyh5tq

Schema-guided wrapper maintenance for web-data extraction

Xiaofeng Meng, Dongdong Hu, Chen Li
2003 Proceedings of the fifth ACM international workshop on Web information and data management - WIDM '03  
Extracting data from Web pages using wrappers is a fundamental problem arising in a large variety of applications of vast practical interest.  ...  Our intensive experiments on real Web sites show that the proposed approach can effectively maintain wrappers to extract desired data with high accuracy.  ...  [19] propose an approach to automatic data extraction by automatically inducing the underlying template of some sample pages with the same structures from data-intensive Web sites.  ... 
doi:10.1145/956699.956701 dblp:conf/widm/MengHL03 fatcat:maqjddsdebholitgj5lsq6wdyq

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web [article]

Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar
2018 arXiv pre-print
Our method can compete with annotation-based techniques in the literature in terms of extraction quality.  ...  Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available.  ...  automatic knowledge extraction from the Web [10, 9, 15, 14, 6] .  ... 
arXiv:1804.04635v1 fatcat:7g34nyfxvzea5e5zn3j7ejknjm
Showing results 1–15 of 38,009