Bridging Concept Identification for Constructing Information Networks from Text Documents [chapter]

Matjaž Juršič, Borut Sluban, Bojan Cestnik, Miha Grčar, Nada Lavrač
2012 Lecture Notes in Computer Science  
A major challenge for next-generation data mining systems is creative knowledge discovery from diverse and distributed data sources. An important challenge in this task is the fusion of diverse, mainly unstructured representations into a unique knowledge format. This chapter focuses on merging information available in text documents into an information network, a graph representation of knowledge. The problem addressed is how to efficiently and effectively produce an information network from large text corpora from at least two diverse, seemingly unrelated domains. The goal is to produce a network that has the highest potential for providing as yet unexplored cross-domain links which could lead to new scientific discoveries. The focus of this work is better identification of important domain-bridging concepts that are promoted as core nodes around which the rest of the network is formed. The evaluation is performed by repeating a discovery made on medical articles in the migraine-magnesium domain.
doi:10.1007/978-3-642-31830-6_6 fatcat:cefvnsx4lnhcxaclrfav7bgkue