Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '12
A critical step in bridging the knowledge base with the huge corpus of semi-structured Web list data is to link the entity mentions that appear in the Web lists with the corresponding real world entities in the knowledge base, which we call list linking task. This task can facilitate many different tasks such as knowledge base population, entity search and table annotation. However, the list linking task is challenging because a Web list has almost no textual context, and the only input for
... only input for this task is a list of entity mentions extracted from the Web pages. In this paper, we propose LIEGE, the first general framework to Link the entIties in wEb lists with the knowledGe basE to the best of our knowledge. Our assumption is that entities mentioned in a Web list can be any collection of entities that have the same conceptual type that people have in mind. To annotate the list items in a Web list with entities that they likely mention, we leverage the prior probability of an entity being mentioned and the global coherence between the types of entities in the Web list. The interdependence between different entity assignments in a Web list makes the optimization of this list linking problem NP-hard. Accordingly, we propose a practical solution based on the iterative substitution to jointly optimize the identification of the mapping entities for the Web list items. We extensively evaluated the performance of our proposed framework over both manually annotated real Web lists extracted from the Web pages and two public data sets, and the experimental results show that our framework significantly outperforms the baseline method in terms of accuracy.