Parallel Markov-based Clustering Strategy for Large-scale Ontology Partitioning
Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Nowadays, huge amounts of data are generated at distributed, heterogeneous sources to create and share information across several domains. Data scientists therefore need to develop appropriate and efficient management strategies to cope with the heterogeneity and interoperability issues of data sources. To this end, ontologies, as a schema-less graph model, and ontology matching, as an enabler of dynamic, real-time, large-scale data integration, are used to design and develop advanced management mechanisms.
... However, given the large-scale context, we adopt ontology partitioning strategies, which split ontologies into sets of disjoint partitions, as a crucial step to reduce the computational complexity and improve the performance of the ontology matching process. To this end, this paper proposes a novel approach to large-scale ontology partitioning through a parallel Markov-based clustering strategy using the Spark framework. Spark's ability to run in-memory computations provides faster and more expressive partitioning and increases the speed of the matching system. The results obtained by our strategy on real-world ontologies demonstrate significant performance, which makes it suitable for incorporation into our large-scale ontology matching system.
Mountasser I., Ouhbi B. and Frikh B. Parallel Markov-based Clustering Strategy for Large-scale Ontology Partitioning.
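To make the idea of Markov-based clustering into disjoint partitions concrete, the following is a minimal single-machine sketch. It assumes the strategy resembles the standard Markov Cluster (MCL) procedure of alternating expansion and inflation on a column-stochastic matrix; it is not the parallel Spark implementation proposed in this paper. The function names (`normalize_columns`, `matmul`, `mcl`), the parameter values, and the toy graph are illustrative assumptions only.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-6):
    """Generic MCL sketch (assumed variant, not the authors' Spark version).

    adj is a symmetric adjacency matrix given as a list of lists.
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adj)
    # Add self-loops so every column has a positive sum, then normalize.
    m = normalize_columns(
        [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along two-step walks
        inflated = normalize_columns(  # inflation: strong flows are reinforced
            [[v ** inflation for v in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows with a non-zero diagonal entry are attractors; the columns with
    # non-zero flow into an attractor row form its cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-9:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-9))
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical toy "ontology graph": two triangles joined by one edge.
    toy = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(toy))
```

The dense list-of-lists representation is chosen only for readability. A large-scale setting such as the one described here would require sparse, distributed matrices, which is where an in-memory framework like Spark becomes relevant.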