Mining Semantic Structures from Syntactic Structures in Free Text Documents

Hamid Mousavi, Deirdre Kerr, Markus Iseli, Carlo Zaniolo
2014 IEEE International Conference on Semantic Computing
The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques that parse the text and use patterns to mine and analyze the parse trees, which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called TextGraphs, which captures the grammatical and semantic relations between words and terms in the text. TextGraphs are generated using a new text-mining framework, which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract candidate terms and their grammatical relations from the parse trees. Moreover, SemScape resolves coreferences with a novel technique, generates domain-specific TextGraphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining TextGraphs.
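
To make the idea of a weighted graph of terms and relations concrete, the following is a minimal illustrative sketch in Python. It is not SemScape's data model or its query language: the `TextGraph` class, the `add` and `match` methods, the relation labels, the weights, and the rule of summing weights for repeated evidence are all assumptions made for this example only.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Edge(NamedTuple):
    source: str
    relation: str
    target: str
    weight: float


class TextGraph:
    """Toy weighted graph: nodes are terms, edges are labeled relations.

    Hypothetical structure for illustration; not the paper's implementation.
    """

    def __init__(self) -> None:
        # Maps (source, relation, target) to an accumulated weight.
        self._weights = defaultdict(float)

    def add(self, source: str, relation: str, target: str, weight: float) -> None:
        # Assumption: evidence for the same relation (for example, from
        # several candidate parse trees) is combined by summing weights.
        self._weights[(source, relation, target)] += weight

    def match(
        self,
        source: Optional[str] = None,
        relation: Optional[str] = None,
        target: Optional[str] = None,
        min_weight: float = 0.0,
    ) -> List[Edge]:
        """Return edges matching a triple pattern; None acts as a wildcard."""
        results = [
            Edge(s, r, t, w)
            for (s, r, t), w in self._weights.items()
            if (source is None or s == source)
            and (relation is None or r == relation)
            and (target is None or t == target)
            and w >= min_weight
        ]
        # Highest-weight edges first.
        return sorted(results, key=lambda e: e.weight, reverse=True)


graph = TextGraph()
graph.add("SemScape", "uses", "statistical parser", 0.5)
graph.add("SemScape", "uses", "statistical parser", 0.25)  # repeated evidence
graph.add("SemScape", "generates", "TextGraphs", 0.5)

for edge in graph.match(source="SemScape", min_weight=0.6):
    print(edge.source, edge.relation, edge.target, edge.weight)
```

The triple pattern with wildcards loosely mirrors how a SPARQL-style query selects edges by subject, predicate, and object; the weight threshold shows one way edge weights could filter out low-confidence relations.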
doi:10.1109/icsc.2014.31 dblp:conf/semco/MousaviKIZ14 fatcat:5omm44j6gzg7jbtssxhmeyp4mm