Opinion Mining from Web Documents: Extraction and Structurization
Transactions of the Japanese Society for Artificial Intelligence
This dissertation addresses the task of extracting customer opinions from Web documents. This task is the key component of opinion mining, which allows Web users to retrieve and summarize people's opinions scattered across Web documents. Our aim is to develop a method for extracting, in a structured form, opinions that represent evaluations of consumer products. We approach opinion extraction by addressing two previously unexplored issues: how to define the task of
... on extraction and how to extract structured opinions. For the first issue, based on a corpus study, we define an opinion unit as a quadruple consisting of the opinion holder, the subject being evaluated (Subject), the part or attribute of the subject on which it is evaluated (Aspect), and the evaluation expressing a positive or negative assessment (Evaluation), and we use this definition as the basis of our opinion extraction task. For the second issue, we divide the task into two subtasks: (a) extracting relations between subjects/aspects and evaluations, and (b) extracting relations between subjects/aspects and aspects. First, we consider extracting these relations using lists of expressions that may describe subjects, aspects, or evaluations. We propose a semi-automatic method for collecting aspect and evaluation expressions that exploits particular co-occurrence patterns of subjects, aspects, and evaluations; this method collects such expressions much more efficiently than manual collection. Second, we discuss a method for extracting aspect-evaluation relations using aspect and evaluation dictionaries. We point out that finding the aspect of an evaluation is similar to finding the missing antecedent of an ellipsis, and we apply a machine learning-based anaphora resolution method to this task. Using anaphora resolution techniques, we achieve an improvement of nearly 20 points in F-measure over a baseline model. Third, we address the extraction of aspect-evaluation and aspect-aspect relations without relying on an aspect dictionary, using methods that combine contextual clues with context-independent statistical clues. We show that models using contextual clues achieve an improvement of nearly 10% in both recall and precision, and that contextual clues learned in one domain remain effective in other domains, which indicates the portability of our proposed model.
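As an illustration only, the sketch below shows the opinion-unit quadruple as a record type and one way a co-occurrence pattern anchored on known subjects and evaluations could propose new aspect expressions (and, symmetrically, new evaluation expressions from known aspects). The abstract does not state the actual patterns, so the English pattern "the <aspect> of <subject> is <evaluation>", the field and function names, and the sample sentences are all hypothetical. Candidates are ranked by frequency so that a person can accept or reject them, which is one possible reading of "semi-automatic"; the abstract does not describe the verification step.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class OpinionUnit:
    """One opinion as the quadruple defined in the abstract (field names are hypothetical)."""
    holder: Optional[str]   # who holds the opinion (None if not stated)
    subject: str            # the product being evaluated (Subject)
    aspect: Optional[str]   # the part or attribute evaluated (Aspect); None if absent
    evaluation: str         # the positive or negative assessment (Evaluation)


def _alternation(words: Iterable[str]) -> str:
    # Longest entries first so multi-word entries are tried before their prefixes.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


def _ranked(pattern: re.Pattern, sentences: Iterable[str]) -> List[Tuple[str, int]]:
    # Count group(1) of every match; most frequent first, ties broken alphabetically.
    counts = Counter(m.group(1).lower() for s in sentences for m in pattern.finditer(s))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def aspect_candidates(sentences, subjects, evaluations):
    """Propose aspects that co-occur with known subjects and evaluations (hypothetical pattern)."""
    pattern = re.compile(
        rf"\bthe (\w+) of (?:the )?(?:{_alternation(subjects)}) is (?:{_alternation(evaluations)})\b",
        re.IGNORECASE,
    )
    return _ranked(pattern, sentences)


def evaluation_candidates(sentences, subjects, aspects):
    """Propose evaluations that co-occur with known subjects and aspects (same hypothetical pattern)."""
    pattern = re.compile(
        rf"\bthe (?:{_alternation(aspects)}) of (?:the )?(?:{_alternation(subjects)}) is (\w+)\b",
        re.IGNORECASE,
    )
    return _ranked(pattern, sentences)


if __name__ == "__main__":
    reviews = [
        "The battery of Camera X is good.",
        "I think the lens of the Camera X is sharp.",
        "The battery of Camera X is great.",
        "The price of Camera Y is high.",
    ]
    print(aspect_candidates(reviews, ["Camera X"], ["good", "great", "sharp"]))
    # [('battery', 2), ('lens', 1)]
    print(evaluation_candidates(reviews, ["Camera X"], ["battery", "lens"]))
    # [('good', 1), ('great', 1), ('sharp', 1)]
    # Once a person confirms "battery", the first review yields one opinion unit:
    print(OpinionUnit(holder=None, subject="Camera X", aspect="battery", evaluation="good"))
```

Alternating between the two functions, feeding confirmed aspects back in to find new evaluations and vice versa, gives a simple bootstrapping loop; whether the dissertation iterates in this way is not stated in the abstract.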
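To make the ellipsis analogy concrete, here is a minimal sketch that frames aspect identification as antecedent selection with a mention-ranking perceptron, a common anaphora-resolution setup. The abstract does not specify the learner or its features, so the candidate representation (aspect expression, token distance to the evaluation, same-sentence flag), the two features, and the perceptron itself are assumptions chosen for illustration. Candidates are assumed to be aspect-dictionary entries found before an evaluation expression, and the model learns to pick the one that fills the evaluation's missing aspect slot.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical candidate representation: (aspect expression,
# token distance to the evaluation, candidate in the same sentence?)
Candidate = Tuple[str, int, bool]


def features(candidate: Candidate) -> Dict[str, float]:
    """Illustrative contextual features, in the spirit of antecedent features."""
    _aspect, distance, same_sentence = candidate
    return {
        "near": 1.0 if distance <= 3 else 0.0,
        "same_sentence": 1.0 if same_sentence else 0.0,
    }


def score(weights: Dict[str, float], candidate: Candidate) -> float:
    # Linear score: dot product of weights and feature values.
    return sum(weights.get(name, 0.0) * value for name, value in features(candidate).items())


def predict(weights: Dict[str, float], candidates: List[Candidate]) -> int:
    # Index of the highest-scoring candidate; max() keeps the earliest one on ties.
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def train(instances: List[Tuple[List[Candidate], int]], epochs: int = 5) -> Dict[str, float]:
    """Mention-ranking perceptron; each instance is (candidates, index of the gold aspect)."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in instances:
            guess = predict(weights, candidates)
            if guess != gold:
                # Reward the gold aspect's features and penalize the wrongly chosen one's.
                for name, value in features(candidates[gold]).items():
                    weights[name] += value
                for name, value in features(candidates[guess]).items():
                    weights[name] -= value
    return dict(weights)


if __name__ == "__main__":
    training = [
        ([("lens", 1, True), ("battery", 6, False)], 0),
        ([("price", 7, False), ("battery", 2, True)], 1),
    ]
    weights = train(training)
    test = [("zoom", 8, False), ("screen", 1, True)]
    print(test[predict(weights, test)][0])  # screen
```

The perceptron keeps the example dependency-free; a fuller system could add lexical and syntactic features, a stronger classifier, and a "no explicit aspect" option for evaluations that apply to the subject as a whole.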