A Novel Large-scale Chinese Encyclopedia Knowledge Parallel Refining Method Based on MapReduce

Ting Wang, Jie Li, Jiale Guo, Jingyao Xie
2019 IEEE Access  
The open collaborative characteristics of online encyclopedia and the large number of ambiguity phenomena in the encyclopedia entry lead to inappropriate classification of plenty of Infobox knowledge triples of entries, which requires for refining and denoising of large-scale knowledge to improve the precision of Knowledge Base (KB). The enormous amount of triples in the KBs will cause excessive serial computing time expenditure by knowledge denoising and disambiguation processing. Existing
more » ... ledge refinement and disambiguation techniques have limitations in terms of scalability and time-efficient. There is still few typical research on the parallel processing of knowledge refinement in distributed environment. Therefore, this paper proposes a novel parallel algorithm for Chinese large-scale knowledge refinement based on MapReduce to further improve the overall system computing speed through parallel optimization for serial algorithm. Based on the original serial refining algorithm which can enhance the precision of encyclopedia-oriented KBs, results show that the novel parallel denoising algorithm proposed in this paper can further provide the system with good scalability and high speedup. INDEX TERMS Knowledge base, knowledge refinement, similarity computing, parallel computing, MapReduce. 111840 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ VOLUME 7, 2019
doi:10.1109/access.2019.2934747 fatcat:trx4gu2elfga3g6y2uvoiukvoa