Exploring the Power of Supervised Learning Methods for Company Name Disambiguation in Microblog Posts
Turkish Journal of Electrical Engineering and Computer Sciences
Twitter is an online social networking website where people can post short messages on any subject, and these messages become visible to other users. Users intentionally express their opinions about companies or products via micro-blogging texts. Analyzing such messages can help reveal what customers think about a company's products and what their overall sentiment is. Identifying tweets that refer to products and companies has therefore recently become an important task. However, company names are often ambiguous. Hence, the first step is to locate the messages that are relevant to a company. In this paper, we present a number of supervised learning techniques to decide whether a given tweet is about a company, e.g., whether a message containing the term 'amazon' is related to the company Amazon Inc. or not. This task is more challenging than a classical classification problem. The main difficulty is that tweets and company names carry limited information. To make the task tractable, external resources are used to obtain richer data about a company. More specifically, we generate several profiles for each organization, which contain richer information. Then, we perform feature extraction to obtain both numerical and categorical features, and we perform feature selection to identify the attributes most relevant to our task. Finally, we train several supervised classifiers. Our classifiers and carefully selected features provide high accuracy on the WePS-3 dataset. Our results show an 11% improvement in accuracy over baseline approaches.
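The pipeline described above (company profiles, feature extraction, a supervised classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the profile words, the training tweets, the two overlap features, and the nearest-centroid rule are hypothetical stand-ins, not the profiles, features, or classifiers used in the paper, and the feature selection step is omitted.

```python
import re
from statistics import mean


def tokenize(text):
    """Lowercase a text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_features(tweet, related_profile, unrelated_profile):
    """Return two numerical features: the fraction of tweet tokens
    that appear in the related and in the unrelated profile."""
    tokens = tokenize(tweet)
    if not tokens:
        return (0.0, 0.0)
    return (
        len(tokens & related_profile) / len(tokens),
        len(tokens & unrelated_profile) / len(tokens),
    )


def train_centroids(features, labels):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, feature):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical profiles for the term 'amazon' (illustrative word lists only).
related_profile = tokenize("kindle prime order shipping aws cloud seller bezos")
unrelated_profile = tokenize("river rainforest jungle brazil tribe basin")

# Hypothetical labeled tweets: True means "about the company".
train_tweets = [
    ("my amazon prime order arrived with free shipping", True),
    ("amazon aws cloud outage again", True),
    ("the amazon river basin in brazil is huge", False),
    ("deforestation in the amazon rainforest", False),
]
X = [extract_features(t, related_profile, unrelated_profile) for t, _ in train_tweets]
y = [label for _, label in train_tweets]
centroids = train_centroids(X, y)


def is_about_company(tweet):
    """Classify a new tweet with the trained nearest-centroid model."""
    return predict(centroids, extract_features(tweet, related_profile, unrelated_profile))
```

Calling `is_about_company("bought a kindle on amazon")` classifies the tweet from its overlap with the two profiles. A real system would presumably build the profiles from external resources and use a richer feature set, as the abstract indicates.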