Site-Wide Wrapper Induction for Life Science Deep Web Databases [chapter]

Saqib Mir, Steffen Staab, Isabel Rojas
2009 Lecture Notes in Computer Science  
We present a novel approach to automatic information extraction from Deep Web Life Science databases using wrapper induction.  ...  Our solution to this novel challenge of Site-Wide wrapper induction consists of a sequence of steps: 1. classification of similar Web pages into classes, 2. discovery of these classes and 3. wrapper induction  ...  Site-Wide Wrapper Induction As we noted in section 1, data-intensive sites, such as those in the Life Sciences domain, have their data scattered across multiple pages.  ... 
doi:10.1007/978-3-642-02879-3_9 fatcat:py26tz32pndprgiwsrtbvqvlhi

An Unsupervised Approach for Acquiring Ontologies and RDF Data from Online Life Science Databases [chapter]

Saqib Mir, Steffen Staab, Isabel Rojas
2010 Lecture Notes in Computer Science  
from complete Life Science Web sites.  ...  We propose an unsupervised method, based on transformation rules, for performing these two key tasks, which makes use of our previous work on unsupervised wrapper induction for extracting labelled data  ...  Site-Wide Wrapper Induction Data in Life Science Web sites are often scattered across many pages belonging to many different classes.  ... 
doi:10.1007/978-3-642-13489-0_22 fatcat:fkrqxefjlrfjregjqgkav7v42u

Finite-State Approaches to Web Information Extraction [chapter]

Nicholas Kushmerick
2003 Lecture Notes in Computer Science  
Wrapper induction Kushmerick first formalized adaptive Web information extraction with his work on wrapper induction [12, 8, 10]. Kushmerick identified a family of six wrapper classes,  ...  I thank Bernd Thomas for helpful discussions. This research was supported by grant N00014-00-1-0021 from the US Office of Naval Research, and grant SFI/01/F.1/C015 from Science Foundation Ireland.  ...  We survey several prominent examples, as well as some additional research that relates to the entire wrapper "life-cycle" beyond the core learning task: -Section 2 introduces wrapper induction, an approach  ... 
doi:10.1007/978-3-540-45092-4_4 fatcat:qtm6pn4osncmte7w7cjo3nilsi

Web data extraction, applications and techniques: A survey

Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, Robert Baumgartner
2014 Knowledge-Based Systems  
At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering.  ...  At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users  ...  allows to extract and store the data from a Web site as RDF.  ... 
doi:10.1016/j.knosys.2014.07.007 fatcat:cb6zazpx7nfgxkmkiuoxqx5zyq

Adaptive Information Extraction: Core Technologies for Information Agents [chapter]

Nicholas Kushmerick, Bernd Thomas
2003 Lecture Notes in Computer Science  
Before proceeding, we observe that neither XML nor the Semantic Web initiative will eliminate the need for automatic information extraction.  ...  This paper gives a state of the art overview about machine learning approaches for information extraction from documents based on finite state techniques and relational learning methods related to inductive  ...  Wrapper induction Kushmerick first formalized adaptive Web information extraction with his work on wrapper induction [Kushmerick et al., 1997; Kushmerick, 2000a].  ... 
doi:10.1007/3-540-36561-3_4 fatcat:peutiprqsnd2re3nuycsvwquxu

Extracting Textual Information from Google Using Wrapper Class

A. Muthusamy
2017 Advances in Networks  
A wrapper class is proposed to extract the relevant text information and focus on finding useful facts of knowledge from unstructured web documents using Google.  ...  With the rapid development of Internet, amount of data available on the web regularly increased, which makes it difficult for humans to distinguish relevant information.  ...  web with the help of Google and save it as text document. It is possible to infer such wrappers by induction.  ... 
doi:10.11648/ fatcat:f4dca22qvfdjliaevdbmq572se

Discovering interesting information with advances in web technology

Richi Nayak, Pierre Senellart, Fabian M. Suchanek, Aparna S. Varde
2013 SIGKDD Explorations  
In this article, we shed light on some interesting phenomena of the Web: the deep Web, which surfaces database records as Web pages; the Semantic Web, which defines meaningful data exchange formats; XML  ...  We detail these four developments in Web technology, and explain how they can be used for data mining.  ...  Labeled unsupervised wrapper induction is even harder.  ... 
doi:10.1145/2481244.2481255 fatcat:lvr2d5k3cre6lpnwnd2udp22pe

An Algebraic Language for Semantic Data Integration on the Hidden Web

Shazzad Hosain, Hasan Jamil
2009 2009 IEEE International Conference on Semantic Computing  
In this paper, we present an algebraic language, called Integra, as a foundation for another SQL-like query language called BioFlow, for the integration of Life Sciences data on the hidden Web.  ...  These assumptions allow us to extend the traditional relational algebra to include integration primitives such as schema matching, wrappers, form submission, and object identification as a family of database  ...  Such functions are known as wrapper induction [13] tools.  ... 
doi:10.1109/icsc.2009.94 dblp:conf/semco/HosainJ09 fatcat:tomfk67gafbkvmgdfahsqucx4a

Extending traditional query-based integration approaches for functional characterization of post-genomic data

B. A. Eckman, A. S. Kosky, L. A. Laroco
2001 Bioinformatics  
, flat file, web site, results of runtime analysis).  ...  Wide-ranging multi-source queries often return unmanageably large result sets, requiring non-traditional approaches to exclude extraneous data.  ...  Special thanks to Jim Fickett for his unwavering support and faith in the project.  ... 
doi:10.1093/bioinformatics/17.7.587 pmid:11448877 fatcat:cyhgfx7juzf6hk5hpyh2ulnavi

Spam, Opinions, and Other Relationships: Towards a Comprehensive View of the Web Knowledge Discovery [chapter]

Bettina Berendt
2011 Advanced Topics in Information Retrieval  
An understanding of this fast-moving field is therefore a key component of digital information literacy for everyone and a useful and fascinating extension of knowledge and skills for Information Retrieval  ...  "Web mining" or "Web Knowledge Discovery" is the analysis of Web resources with data-mining techniques such as classification, clustering, association-rule or graph-structure methods.  ...  Acknowledgements I thank my students and colleagues from various Web Mining classes for many valuable discussions and ideas. In particular, I thank the members of the  ... 
doi:10.1007/978-3-642-20946-8_3 fatcat:dzzvsoiizbb3terfovfzojtqbi

Personalized information delivery

Peter W. Foltz, Susan T. Dumais
1992 Posters and short talks of the 1992 SIGCHI conference on Human factors in computing systems - CHI '92  
While the second paper addresses the producer of a subscription system by reviewing web site scraping technologies and proposes a new iterative mechanism called XWeb, the third article in this part gives  ...  Either the end user or an appropriate application on his/her side is responsible for filtering and further processing.  ...  As there is no easy to learn and widely used query tool for HTML like there is SQL for databases, tapping the Web needs some work. In principle it is possible to write wrappers for web pages by hand.  ... 
doi:10.1145/1125021.1125024 dblp:conf/chi/FoltzD92 fatcat:ddxdxdx35zazhcwut6tgnb53ru

An approach for pipelining nested collections in scientific workflows

Timothy M. McPhillips, Shawn Bowers
2005 SIGMOD record  
This work was supported by the National Science Foundation GriPhyN Project, grant ITR-800864, the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific  ...  Moreover, he thanks Gianluigi Greco for his recent contribution to the weighted extension, and Alfredo Mazzitelli for his valuable work in designing and implementing the tools for experiments.  ...  A critical issue for the development of wrappers is that legacy data systems vary widely in their support for data manipulation and description.  ... 
doi:10.1145/1084805.1084809 fatcat:sgtpcat7vzc3veb4dx2jgskpte

GAWA – A Feature Selection Method for Hybrid Sentiment Classification

A. Rasool, R. Tao, M. Kamyab, H. Shoaib
2020 IEEE Access  
The Wrapper feature selection approach has been widely used in numerous applications, e.g., in the medical field for the calculation of optimum features from coronary artery disease [43].  ...  Sentiment or opinion classification has an immense impact on multiple fields of life.  ... 
doi:10.1109/access.2020.3030642 fatcat:f5if4b4c35dx7f7a5lm4uuffwy

From information to knowledge

Gerhard Weikum, Martin Theobald
2010 Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '10  
This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web  ...  The latter is also known as wrapper induction.  ...  Rule/Query-based Methods Wrappers and wrapper induction. From a DB perspective, the obvious idea is to exploit regularities in the structure of Web sources.  ... 
doi:10.1145/1807085.1807097 dblp:conf/pods/WeikumT10 fatcat:vtgbi6sjafgsrhmnlztf6q5mxu

Foundational Challenges in Automated Semantic Web Data and Ontology Cleaning

J.A. Alonso-Jimenez, J. Borrego-Diaz, A.M. Chavez-Gonzalez, F.J. Martin-Mateos
2006 IEEE Intelligent Systems  
We can build trust in Semantic Web logic only if it's based on certified reasoning.  ...  Applying automated reasoning systems to Semantic Web data cleaning and to cleaning-agent design raises many challenges.  ...  Acknowledgments This work is partially supported by the Ministry of Education and Science project TIN2004-03884, which is cofinanced by FEDER funds (European Union funds for regional development).  ... 
doi:10.1109/mis.2006.7 fatcat:y5x567uyl5ak3cflh27e35epay