Coping with Missing and Incomplete Information in Natural Language Processing with Applications in Sentiment Analysis and Entity Matching [article]

(:Unkn) Unknown, University, My, Eduard C. Dragut
Much work in Natural Language Processing (NLP) is broadly concerned with extracting useful information from unstructured text passages. In recent years there has been an increased focus on informal writing as is found in online venues such as Twitter and Yelp. Processing this text introduces additional difficulties for NLP techniques, for example, many of the terms may be unknown due to rapidly changing vocabulary usage. A straightforward NLP approach will not have any capability of using the
more » ... formation these terms provide. In such \emph{information poor} environments of missing and incomplete information, it is necessary to develop novel, clever methods for leveraging the information we have explicitly available to unlock key nuggets of implicitly available information. In this work we explore several such methods and how they can collectively help to improve NLP techniques in general, with a focus on Sentiment Analysis (SA) and Entity Matching (EM). The problem of SA is that of identifying the polarity (positive, negative, neutral) of a speaker or author towards the topic of a given piece of text. SA can focus on various levels of granularity. These include finding the overall sentiment of a long text document, finding the sentiment of individual sentences or phrases, or finding the sentiment directed toward specific entities and their aspects (attributes). The problem of EM, also known as Record Linkage, is the problem of determining records from independent and uncooperative data sources that refer to the same real-world entities. Traditional approaches to EM have used the record representation of entities to accomplish this task. With the nascence of social media, entities on the Web are now accompanied by user generated content, which allows us to apply NLP solutions to the problem. We investigate specifically the following aspects of NLP for missing and incomplete information: (1) Inferring a sentiment polarity (i.e., the positive, negative, and neutral composition) of new terms. (2) Inferring a represent [...]
doi:10.34944/dspace/3517 fatcat:3mufwxxozvctpobwi3o2bse7za