Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text

Alexandre Pinto, Hugo Gonçalo Oliveira, Ana Oliveira Alves, Marc Herbstritt
2016 Symposium on Languages, Applications and Technologies  
Nowadays, there are many toolkits available for performing common natural language processing tasks, which enable the development of more powerful applications without having to start from scratch. In fact, for English, there is no need to develop tools such as tokenizers, part-of-speech (POS) taggers, chunkers or named entity recognizers (NER). The current challenge is to select which one to use, out of the range of available tools. This choice may depend on several aspects, including the kind and source of text, where the level, formal or informal, may influence the performance of such tools. In this paper, we assess a range of natural language processing toolkits with their default configuration, while performing a set of standard tasks (e.g. tokenization, POS tagging, chunking and NER), in popular datasets that cover newspaper and social network text. The obtained results are analyzed and, while we could not decide on a single toolkit, this exercise was very helpful to narrow our choice.
doi:10.4230/oasics.slate.2016.3 dblp:conf/slate/PintoOA16 fatcat:mn2pco57ubc5hop2dsasvz3rai