Aspect and Entity Extraction for Opinion Mining [chapter]

Lei Zhang, Bing Liu
2014 Studies in Big Data  
Opinion mining, or sentiment analysis, is the computational study of people's opinions, appraisals, attitudes, and emotions toward entities such as products, services, organizations, individuals, events, and their different aspects. It has been an active research area in natural language processing and Web mining in recent years. Researchers have studied opinion mining at the document, sentence, and aspect levels. Aspect-level analysis (called aspect-based opinion mining) is often desired in practical applications because it provides detailed opinions or sentiments about the different aspects of entities and about the entities themselves, which are usually required for action. Aspect extraction and entity extraction are thus two core tasks of aspect-based opinion mining. In this chapter, we provide a broad overview of the tasks and the current state-of-the-art extraction techniques.

Definition (entity): An entity e is a product, service, person, event, organization, or topic. It is associated with a pair, e: (T, W), where T is a hierarchy of components (or parts), sub-components, and so on, and W is a set of attributes of e. Each component or sub-component also has its own set of attributes.

Example: A particular brand of cellular phone is an entity, e.g., iPhone. It has a set of components, e.g., battery and screen, and also a set of attributes, e.g., voice quality, size, and weight. The battery component also has its own set of attributes, e.g., battery life and battery size.

Based on this definition, an entity can be represented as a tree or hierarchy. The root of the tree is the name of the entity. Each non-root node is a component or sub-component of the entity. Each link is a part-of relation. Each node is associated with a set of attributes. An opinion can be expressed on any node and any attribute of the node.

Example: One can express an opinion about the iPhone itself (the root node), e.g., "I do not like iPhone", or on any one of its attributes, e.g., "The voice quality of iPhone is lousy". Likewise, one can also express an opinion on any one of the iPhone's components or any attribute of the component.

In practice, it is often useful to simplify this definition for two reasons. First, natural language processing is difficult, and studying text effectively at the arbitrary level of detail described in the definition is very hard. Second, a hierarchical representation is too complex for an ordinary user.
Thus, we simplify and flatten the tree to two levels and use the term aspects to denote both components and attributes. In the simplified tree, the root-level node is still the entity itself, while the second-level nodes are the different aspects of the entity.

Definition (aspect and aspect expression): The aspects of an entity e are the components and attributes of e. An aspect expression is an actual word or phrase that has appeared in text indicating an aspect.

Example: In the cellular phone domain, an aspect could be named voice quality. There are many expressions that can indicate the aspect, e.g., "sound," "voice," and "voice quality."

Aspect expressions are usually nouns and noun phrases, but can also be verbs, verb phrases, adjectives, and adverbs. We call aspect expressions in a sentence that are nouns and noun phrases explicit aspect expressions. For example, "sound" in "The sound of this phone is clear" is an explicit aspect expression. We call aspect expressions of the other types implicit aspect expressions, as they often imply some aspects. For example, "large" is an implicit aspect expression in "This phone is too large". It implies the aspect size. Many implicit aspect expressions are adjectives and adverbs that imply specific aspects, e.g., expensive (price) and reliably (reliability). Implicit aspect expressions are not limited to adjectives and adverbs, however. They can be quite complex, for example, "This phone will not easily fit in pockets". Here, "fit in pockets" indicates the aspect size (and/or shape).
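The definitions above can be illustrated with a small Python sketch. It is not the chapter's extraction method: the `Node` class, the `flatten_aspects` and `find_aspect_expressions` functions, and the hand-written `ASPECT_EXPRESSIONS` lexicon are all hypothetical names introduced here for illustration. The sketch assumes an entity is stored as a tree of components with attributes, flattens it to the two-level entity/aspect form, and looks up aspect expressions by simple phrase matching rather than by any learned model.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the entity tree: the entity itself or one of its components."""
    name: str
    attributes: list[str] = field(default_factory=list)
    components: list[Node] = field(default_factory=list)  # part-of links


def flatten_aspects(entity: Node) -> list[str]:
    """Collapse the hierarchy to two levels: every component and every
    attribute below the root becomes an aspect of the entity."""
    aspects = list(entity.attributes)
    stack = list(reversed(entity.components))
    while stack:
        node = stack.pop()
        aspects.append(node.name)
        aspects.extend(node.attributes)
        stack.extend(reversed(node.components))
    return aspects


# The iPhone example from the text.
iphone = Node(
    "iPhone",
    attributes=["voice quality", "size", "weight"],
    components=[
        Node("battery", attributes=["battery life", "battery size"]),
        Node("screen"),
    ],
)

# Hypothetical hand-built lexicon: expression -> (aspect, explicit/implicit).
# The entries are the examples given in the text, not an extracted resource.
ASPECT_EXPRESSIONS = {
    "sound": ("voice quality", "explicit"),
    "voice": ("voice quality", "explicit"),
    "voice quality": ("voice quality", "explicit"),
    "large": ("size", "implicit"),
    "fit in pockets": ("size", "implicit"),
    "expensive": ("price", "implicit"),
    "reliably": ("reliability", "implicit"),
}


def find_aspect_expressions(sentence: str) -> list[tuple[str, str, str]]:
    """Return (expression, aspect, kind) for each lexicon phrase in the sentence."""
    text = sentence.lower()
    hits = []
    for expr, (aspect, kind) in ASPECT_EXPRESSIONS.items():
        if re.search(r"\b" + re.escape(expr) + r"\b", text):
            hits.append((expr, aspect, kind))
    return hits


if __name__ == "__main__":
    print(flatten_aspects(iphone))
    print(find_aspect_expressions("The sound of this phone is clear"))
    print(find_aspect_expressions("This phone will not easily fit in pockets"))
```

A fixed lexicon like this only covers expressions listed in advance. It is used here to make the explicit/implicit distinction concrete, not to suggest that lookup is sufficient for aspect extraction.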
doi:10.1007/978-3-642-40837-3_1 fatcat:wp3wgs32nzdlrjc7d4gm7wskja