Cybersecurity Named Entity Recognition using Multi-modal Ensemble Learning

Feng Yi, Bo Jiang, Lu Wang, Jianjun Wu
2020 IEEE Access  
Cybersecurity named entity recognition is an important part of extracting threat information from large-scale unstructured text collections in many cybersecurity applications. Most existing security entity recognition studies and systems use regular-expression matching strategies or machine learning algorithms. Given the peculiarity and complexity of security named entities, these models ignore the characteristics of security data and the correlation between entities. Therefore, through an in-depth study of ... y entity characteristic, we propose a novel security named entity recognition model based on regular expressions and a known-entity dictionary as well as conditional random fields (CRF) combined with four feature templates. This model is named RDF-CRF. The rule-based expressions can match security entities with good accuracy in simpler situations, the known-entity dictionary can extract common and specific security entities, and the CRF-based extractor leverages the entities identified by the rule-based and dictionary-based extractors to further improve recognition performance. To demonstrate the effectiveness of the proposed model, extensive experiments are performed on a security text dataset collected from public security websites. The experimental results show that RDF-CRF can achieve better performance than state-of-the-art methods.

INDEX TERMS Cybersecurity, named entity recognition, regular expression, known-entity dictionary, conditional random fields.

This work is licensed under a Creative Commons Attribution 4.0 License. VOLUME 8, 2020
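The abstract describes a layered design in which the output of rule-based and dictionary-based extractors is passed on to a CRF. The following is a minimal sketch of that idea only, not the authors' implementation: the patterns, the dictionary entries, the function names, and the feature set are all hypothetical, and the features shown are not the paper's four feature templates, which the abstract does not specify.

```python
import re

# Hypothetical patterns and dictionary entries, chosen only for illustration.
PATTERNS = {
    "CVE": re.compile(r"^CVE-\d{4}-\d{4,}$"),
    "IP": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
}
KNOWN_ENTITIES = {"wannacry": "MALWARE", "microsoft": "ORG"}


def rule_tag(token):
    """Return the label of the first regular expression that matches, else 'O'."""
    for label, pattern in PATTERNS.items():
        if pattern.match(token):
            return label
    return "O"


def dict_tag(token):
    """Look the lower-cased token up in the known-entity dictionary."""
    return KNOWN_ENTITIES.get(token.lower(), "O")


def token_features(tokens, i):
    """Build one feature dict for token i, including the two extractor tags."""
    token = tokens[i]
    return {
        "lower": token.lower(),
        "has_digit": any(c.isdigit() for c in token),
        "rule_tag": rule_tag(token),   # output of the rule-based extractor
        "dict_tag": dict_tag(token),   # output of the dictionary-based extractor
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Feature dicts for a whole sentence, one per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]


tokens = "WannaCry exploits CVE-2017-0144 on 10.0.0.5".split()
features = sentence_features(tokens)
```

A list of per-token feature dictionaries like `features` is the kind of input that CRF toolkits commonly accept for training, so the rule and dictionary tags become evidence the sequence model can weigh alongside lexical context.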
doi:10.1109/access.2020.2984582 fatcat:vi4jkk5p6zfvdii5dmeor7bfue