Data Link Discovery Tools for Big Linked Data: A comprehensive study
International Conference on Big Data and Cyber-Security Intelligence
Data discovery, linking, and integration techniques are of great importance for addressing the variety challenge of big data. Linked Open Data (LOD) and Semantic Web technologies have been drivers in addressing this challenge. However, by 2015 the linkage of LOD triples had increased to 40%, yet only 3% of all triples were links between different datasets. Today, with the growing number of available LOD datasets (9671 datasets now compose the LOD), the need to link them together is becoming vital.
... Links are usually generated, or discovered, by dedicated frameworks such as SILK and LIMES, two of the most effective tools in this domain. Both apply instance matching rather than ontology matching, and both support active learning. Each has its own advantages and drawbacks, which makes it hard to rule out either one. This paper evaluates whether SILK and LIMES are viable options for interlinking large-scale biomedical datasets. It compares the two frameworks at several levels: general features, comparison measures, resulting files, performance, and the effectiveness of the links produced. The conclusions drawn from this work are intended as a reference for assessing the core differences between SILK and LIMES, and therefore for choosing the most suitable tool in a biomedical context. The work can also serve as a starting point for future research on, and enhancement of, such frameworks.
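To make the notion of instance matching concrete, the following is a minimal sketch of link discovery between two tiny datasets. It is not the SILK or LIMES API: the function names (`label_similarity`, `discover_links`, `to_ntriples`), the example URIs, the labels, and the 0.9 threshold are all hypothetical, and the string measure comes from Python's standard `difflib` module rather than from either framework.

```python
from difflib import SequenceMatcher

# Standard OWL predicate commonly used to state that two URIs denote the same entity.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def label_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] between two labels (case-insensitive)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Compare every source instance with every target instance.

    `source` and `target` map instance URIs to labels (hypothetical data layout).
    A pair is kept as a candidate link when its similarity reaches `threshold`.
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = label_similarity(s_label, t_label)
            if score >= threshold:
                links.append((s_uri, t_uri, score))
    return links


def to_ntriples(links: list) -> list:
    """Serialize accepted links as N-Triples lines using owl:sameAs."""
    return [f"<{s}> <{OWL_SAME_AS}> <{t}> ." for s, t, _ in links]


# Hypothetical instances from two datasets.
source = {
    "http://example.org/a/aspirin": "Aspirin",
    "http://example.org/a/ibuprofen": "Ibuprofen",
}
target = {
    "http://example.org/b/1": "aspirin",
    "http://example.org/b/2": "Paracetamol",
}

links = discover_links(source, target)
for line in to_ntriples(links):
    print(line)
```

The nested loop compares all pairs of instances, which is quadratic in the dataset sizes. That exhaustive comparison is adequate for an illustration only; it would not scale to large datasets.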