A composite kernel for named entity recognition

Sujan Kumar Saha, Shashi Narayan, Sudeshna Sarkar, Pabitra Mitra
2010 Pattern Recognition Letters  
In this paper, we propose a novel kernel function for support vector machines (SVM) that can be used for sequential labeling tasks like named entity recognition (NER). Machine learning methods like support vector machines, maximum entropy, hidden Markov models and conditional random fields are the most widely used methods for implementing NER systems. The features used in machine learning algorithms for NER are mostly string-based features. The proposed kernel is based on calculating a novel distance function between the string-based features. In tasks like NER, the similarity between the contexts as well as the semantic similarity between the words plays an important role. The goal is to capture the context and semantic information in NER-like tasks. The proposed distance function makes use of certain statistics primarily derived from the training data and hierarchical clustering information. The kernel function is applied to the Hindi and biomedical NER tasks and the results are promising.
doi:10.1016/j.patrec.2010.05.004 fatcat:lcct5thwkvgx5polrka4jfpvii