Tweet Segmentation and Named Entity Recognition

Mr Chavan, Suryawanshi
2016 unpublished
Twitter has attracted a large number of users who share and distribute the most recent information, resulting in large volumes of data produced every day. However, a variety of applications in Natural Language Processing (NLP) and Information Retrieval (IR) suffer severely from the noisy and short nature of tweets. Here, we propose a framework for tweet segmentation in batch mode, called HybridSeg. By dividing tweets into meaningful segments, the semantic or contextual information is well preserved and easily retrieved by downstream applications. HybridSeg finds the best segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English, drawing on both global context and local context. For the latter, we propose and evaluate two models that derive local context from the linguistic structures and the term-dependency in a batch of tweets, respectively. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts, compared with using global context only. Through analysis and comparison, we show that local linguistic structures are more reliable for learning local context than term-dependency.
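
To make the optimization in the abstract concrete, the following is a minimal sketch of choosing the segmentation that maximizes the sum of per-segment stickiness scores, using dynamic programming over word positions. It is an illustration only, not the authors' implementation: the function `segment`, the `max_len` cap on segment length, and the toy scoring table `TOY_SCORES` are all assumptions introduced here, and the real stickiness score would be estimated from global and local context rather than looked up in a hand-written table.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]


def segment(
    words: Sequence[str],
    stickiness: Callable[[Segment], float],
    max_len: int = 3,  # assumed cap on segment length, for illustration
) -> Tuple[List[Segment], float]:
    """Return the segmentation with the highest total stickiness."""
    n = len(words)
    # best[i] is the best total score for the first i words.
    best = [float("-inf")] * (n + 1)
    best[0] = 0.0
    # back[i] is the start index of the last segment in that best split.
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(words[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Walk the back-pointers to recover the segments in order.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(words[start:end]))
        end = start
    segments.reverse()
    return segments, best[n]


# Hypothetical scores standing in for a learned stickiness function.
TOY_SCORES = {("new", "york"): 0.9, ("times", "square"): 0.8}


def toy_stickiness(seg: Segment) -> float:
    if seg in TOY_SCORES:
        return TOY_SCORES[seg]
    # Single words get a small score; unknown multi-word spans are penalized.
    return 0.1 if len(seg) == 1 else -1.0


if __name__ == "__main__":
    segs, total = segment("i love new york".split(), toy_stickiness)
    print(segs)  # [('i',), ('love',), ('new', 'york')]
```

With the toy table, "new york" is kept as a single segment because its score outweighs the sum of the scores of its individual words. Replacing `toy_stickiness` with a function that combines global and local evidence would leave the search itself unchanged.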