A Logic-Based Tool for Semantic Information Extraction [chapter]

Massimo Ruffolo, Marco Manna, Lorenzo Gallucci, Nicola Leone, Domenico Saccà
2006 Lecture Notes in Computer Science  
The paper describes HıLεX, a new ASP-based system for extracting information from unstructured documents. Unlike previous systems, which are mainly syntactic, HıLεX combines semantic and syntactic knowledge to make extraction more powerful. In particular, exploiting background knowledge stored in a domain ontology significantly strengthens the information extraction mechanisms. HıLεX is founded on a new two-dimensional representation of documents and heavily exploits DLP+, an extension of disjunctive logic programming for ontology representation and reasoning that has recently been implemented on top of DLV. The domain ontology is represented in DLP+, and the extraction patterns are encoded by DLP+ reasoning modules, whose execution yields the actual extraction of information from the input document. HıLεX can extract information from both HTML and flat text documents.
doi:10.1007/11853886_48 fatcat:gap2yz66qzapjjujq2hz4t4ivm