Interactive Tuples Extraction from Semi-Structured Data

Remi Gilleron, Patrick Marty, Marc Tommasi, Fabien Torre
2006 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06)  
This paper studies from a machine learning viewpoint the problem of extracting tuples of a target n-ary relation from tree structured data like XML or XHTML documents.  ...  It is incremental: partial tuples are extracted by increasing length.  ...  Conclusion We have presented a machine learning approach for n-ary relation extraction from semi-structured documents.  ... 
doi:10.1109/wi.2006.102 dblp:conf/webi/GilleronMTT06 fatcat:wuvba6orarbdjbqzb45q6biri4

Extracting Medical Information Using Linked Data

Jakub Kozák, Martin Necaský, Jaroslav Pokorný
2012 Workshop on Semantic Web Applications and Tools for Life Sciences  
In this contribution we introduce a system which performs information extraction from partially structured texts such as discharge summaries.  ...  We hope this is a right direction for medical information publication and enrichment with other data e.g., from Linked Open Data cloud.  ...  Information Extraction System We have developed a system that allows IE from partially structured texts using the advantages of ontologies and Linked Data.  ... 
dblp:conf/swat4ls/KozakNP12 fatcat:khd4pipzyjapbe3eaiiukbgvvy

WIKE: A Web Information/Knowledge Extraction System for Web Service Generation

Hao Han, Takehiro Tokuda
2008 2008 Eighth International Conference on Web Engineering  
We present WIKE, a system for partial information extraction from Web pages without programming. We also give its applications to Web service generation.  ...  Examples are collection of capital city names and population data from country profile sites or collection of company names and their industrial fields from finance sites.  ...  Extraction Result WIKE uses the extraction pattern to extract the partial information from the country pages. The extraction result is an XML-based document.  ... 
doi:10.1109/icwe.2008.30 dblp:conf/icwe/HanT08a fatcat:zw7oc4bp4nfqbnel5if6vknneq

Novel Web Data Extraction Using Template Extraction and Filtering Non Information

2015 International Journal of Science and Research (IJSR)  
Web data extractors are used to automatically extract the data from web documents.  ...  Web is huge repository of information which contains different types of data in various forms. As we need to extract only the relevant data from web.  ...  As the data cannot be directly extracted into a new structured form. [6]Studies the problem of extracting data from a Webpage that contains several structured data records as well as propose a novel partial  ... 
doi:10.21275/v4i12.nov152454 fatcat:t6vgthd245cnhj3wwzy4sb3qla

Efficient Structure Oriented Storage of XML Documents Using ORDBMS [chapter]

Alexander Kuckelberg, Ralph Krieger
2003 Lecture Notes in Computer Science  
In this paper we will present different storage approaches for XML documents, the document centered, the data and the structure centered approach.  ...  Moreover we will shortly introduce the partial mapping extension, which helps to optimize the generic structure based storage approach for specific documents whose structure is known in advance.  ...  Parts of the document can only be returned after the document has been parsed and the requested data have been extracted.  ... 
doi:10.1007/3-540-36556-7_9 fatcat:j74fltl5m5g7nbcaqr4z5jeqcu

Web data extraction based on structural similarity

Zhao Li, Wee Keong Ng, Aixin Sun
2005 Knowledge and Information Systems  
Document schemata are patterns of structures embedded in documents.  ...  Web data-extraction systems in use today mainly focus on the generation of extraction rules, i.e., wrapper induction.  ...  In our framework, a data instance may be a document, a fragment of a document or a piece of extracted structured data.  ... 
doi:10.1007/s10115-004-0188-z fatcat:vyqafjj67re57bxiciehhxv2cq

Event.Locky: System of Event-Data Extraction from Webpages based on Web Mining

Chenyi Liao, Kei Hiroi, Katsuhiko Kaji, Ken Sakurada, Nobuo Kawaguchi
2017 Journal of Information Processing  
The former is used to convert a semi-structural HTML document into processable structured data. The latter filters out non-event data from extracted data records using machine learning.  ...  To address this issue, we use web mining that extracts event data from webpages.  ...  By matching partial tree structures, data records can be extracted; thus, it is suitable for converting from semi-structure data to structured data.  ... 
doi:10.2197/ipsjjip.25.321 fatcat:k4zew2b5nzekbguuhlioxsjple

Pilot Testing of an Information Extraction (IE) Prototype for Legal Research

Brenda Scholtz, Thashen Padayachy, Oluwande Adewoyin
2020 The African journal of information and communication  
The prototype that was piloted seeks to extract, from legal case documents, relevant and accurate information on cases referred to (CRTs) in the source cases.  ...  Testing of CRT extraction from 50 source cases resulted in only 38% (n = 19) of the extractions providing an accurate number of CRTs.  ...  NER is an IE technique that processes extracted information from unstructured and structured text (Abdelmagid et al., 2015).  ... 
doi:10.23962/10539/29192 fatcat:7cpprnaphndvrpmm2gtuvlt374

Extraction of Functional Structure Graph from System Design Documents

Eiichi Sunagawa, Shinichi Nagano
2018 Joint International Conference of Semantic Technology  
To address this problem, we aim to establish a technology that can extract, from design documents, a knowledge graph that represents the functional structure of a system and use it in development of other  ...  In this paper, we introduce our knowledge-graph extraction framework and describe an experiment show a partial extraction.  ...  Conclusion In this paper, we introduced a prospective approach for extracting a functional model, which represents the functional structure of a target system, from system design documents.  ... 
dblp:conf/jist/SunagawaN18 fatcat:zr7f244xtnhxtgpsy25kl455kq

Natural Language Processing (NLP) – A Solution for Knowledge Extraction from Patent Unstructured Data

Achille Souili, Denis Cavallucci, François Rousselot
2015 Procedia Engineering  
This paper describes a new approach of automatic extraction of IDM (Inventive Design Method) related knowledge from patent documents.  ...  The purpose of this paper is to investigate on the contribution of NLP techniques to effective knowledge extraction from patent documents.  ...  from unstructured and implicit textual data.  ... 
doi:10.1016/j.proeng.2015.12.457 fatcat:vyvm2f432jebjnwxohhbsjm4ay

DEXTER: An end-to-end system to extract table contents from electronic medical health documents [article]

Nandhinee PR, Harinath Krishnamoorthy, Koushik Srivatsan, Anil Goyal, Sudarsun Santhiappan
2022 arXiv   pre-print
table structures, such as bordered, partially bordered, borderless, or coloured tables.  ...  In this paper, we propose DEXTER, an end to end system to extract information from tables present in medical health documents, such as electronic health records (EHR) and explanation of benefits (EOB).  ...  In this paper, we propose an end-to-end system named as DEXTER (Document Extractor) which automatically extracts the data from tables present in medical documents. Related Work.  ... 
arXiv:2207.06823v2 fatcat:whf5piptofeepg3kurtor5sefm

Abstractive Multi-document Summarization by Partial Tree Extraction, Recombination and Linearization

Litton J. Kurisinkel, Yue Zhang, Vasudeva Varma
2017 International Joint Conference on Natural Language Processing  
Existing work for abstractive multidocument summarization utilise existing phrase structures directly extracted from input documents to generate summary sentences.  ...  We introduce a novel approach for abstractive multidocument summarization through partial dependency tree extraction, recombination and linearization.  ...  Relevant and noise pruned partial tree structures are extracted from the set of dependency trees and different subsets of maximally relevant partial dependency structures are identified.  ... 
dblp:conf/ijcnlp/KurisinkelZV17 fatcat:m63kuinacrhlhpvf7rb2sbe6rq

The Use of Ontologies in Wrapper Induction

Marek Nekvasil
2007 Databases, Texts, Specifications, Objects  
The purpose of this entry is to bring in an extension of ontologies so that they can be utilized in the process of automated information extraction from the web documents.  ...  The limitation of the proposed wrapper induction method is the fact that it relies on the tabular structure of extracted data but the extraction is completely automatic and with proper setting of the attributes  ...  Besides that this method can extract only properties with cardinality 1 (the tabular structure) it is also limited in its tolerance to the irregularities in the structure of the document, on the other  ... 
dblp:conf/dateso/Nekvasil07 fatcat:fftn2exndbfnfgavhsjjz3u7mi

Extracting Business Rule from existing COBOL programs for Redevelopment

Emmanuel Nwabueze Ekwonwune, Egwuonwu Deborah I
2019 International Journal Of Engineering And Computer Science  
describing the data, decision and procedural flow of each program slice.  ...  The aim of this work was to extract out the information required to re-implement the Legacy programs in a new client/server environment. The progress solution is in four step.  ...  The reduced data structure is placed in the Linkage Section to be passed as a parameter from the calling program.  ... 
doi:10.18535/ijecs/v8i05.4317 fatcat:nl7tz343zjfzxgxz7cuc52ncti

Do we mean the same?

Elena Demidova, Irina Oelze, Peter Fankhauser
2009 Proceedings of the First International Workshop on Keyword Search on Structured Data - KEYS '09  
In case a user's informational need is expressed in terms of a document, we need algorithms that map keyword queries automatically extracted from this document to the database content.  ...  Our evaluation is performed using a set of user queries from the AOL query log and a set of queries automatically extracted from Wikipedia articles both executed against the Internet Movie Database (IMDB  ...  Like [2], we considered a specific case of data integration across structured and unstructured data.  ... 
doi:10.1145/1557670.1557682 dblp:conf/sigmod/DemidovaOF09 fatcat:kbstuhqjoree5monts6uswroz4