Hallym: Named Entity Recognition on Twitter with Word Representation

Eun-Suk Yang, Yu-Seop Kim
2015 Proceedings of the Workshop on Noisy User-generated Text  
Twitter is a type of social media that contains diverse user-generated texts. Traditional models are not applicable to tweet data because the text style is not as grammaticalized as that of newswire. In this paper, we construct word embeddings via canonical correlation analysis (CCA) on a considerable amount of tweet data and show the efficacy of word representation. Besides word embeddings, we use part-of-speech (POS) tags, chunks, and Brown clusters induced from Wikipedia as features. Here, we describe our system and present the final results along with their analysis. Our model achieves an F1 score of 37.21% with entity types and distinguishes 53.01% of the entity boundaries.
doi:10.18653/v1/w15-4310 dblp:conf/aclnut/YangK15 fatcat:p4esrps7pbhz7ecjqyvio5vcvi