Using Negation and Phrases in Inducing Rules for Text Classification [chapter]

Stephanie Chua, Frans Coenen, Grant Malcolm, Matías Fernando García Constantino
2011 Research and Development in Intelligent Systems XXVIII  
An investigation into the use of negation in Inductive Rule Learning (IRL) for text classification is described. The use of negated features in the IRL process has been shown to improve the effectiveness of classification. However, although for small datasets it is perfectly feasible to include the potential negation of every feature as part of the feature space, this is not possible for datasets with large numbers of features, such as those used in text mining. Instead, a process is required whereby the features to be negated are identified dynamically. Such a process is described in the paper and compared with established techniques (JRip, NaiveBayes, Sequential Minimal Optimization (SMO), OlexGreedy). The work also addresses an approach to text classification based on a "bag of phrases" representation, the motivation being that a phrase carries semantic information that is not present in a single keyword. In addition, a given text corpus typically contains many more key-phrase features than keyword features, and therefore provides more potential features to be negated.
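The abstract does not say how the features to be negated are chosen. The following is a minimal Python sketch of the general idea under one assumed, hypothetical heuristic that is not necessarily the paper's method: a rule is a set of required features plus a set of negated (forbidden) features. While the rule still covers documents of the wrong class, the sketch negates the feature that occurs most often in those wrongly covered documents and in none of the correctly covered ones. The names `Rule`, `candidate_negations` and `refine` are illustrative only. Features may be single keywords or phrases, so a bag-of-phrases representation plugs in unchanged.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Rule:
    """IF all `positive` features present AND no `negative` feature present THEN `label`."""
    label: str
    positive: set[str] = field(default_factory=set)  # features that must occur
    negative: set[str] = field(default_factory=set)  # negated features: must NOT occur

    def covers(self, doc: frozenset[str]) -> bool:
        return self.positive <= doc and not (self.negative & doc)


def candidate_negations(rule, docs, labels):
    """Hypothetical heuristic: rank features that occur in wrongly covered documents
    but in no correctly covered document, most frequent first."""
    covered = [(d, y) for d, y in zip(docs, labels) if rule.covers(d)]
    in_pos = set().union(*(d for d, y in covered if y == rule.label))
    counts = Counter(
        f for d, y in covered if y != rule.label for f in d if f not in in_pos
    )
    return [f for f, _ in counts.most_common()]


def refine(rule, docs, labels):
    """Add negated features one at a time, chosen dynamically from the data.
    Each added negation excludes at least one wrongly covered document and
    no correctly covered one, so the loop always terminates."""
    while True:
        candidates = candidate_negations(rule, docs, labels)
        if not candidates:
            return rule
        rule.negative.add(candidates[0])


# Toy corpus: each document is a set of features (keywords or phrases).
docs = [
    frozenset({"bank", "interest rate"}),
    frozenset({"bank", "loan"}),
    frozenset({"bank", "river", "flood"}),
    frozenset({"bank", "river", "erosion"}),
]
labels = ["finance", "finance", "geography", "geography"]

rule = refine(Rule("finance", positive={"bank"}), docs, labels)
print(rule.positive, rule.negative)
```

On this toy corpus the starting rule "IF bank THEN finance" also covers the two geography documents. Only the negation of "river" is added, which yields "IF bank AND NOT river THEN finance". Because candidates are derived from the rule's current coverage rather than enumerated up front, the feature space never has to contain the negation of every feature.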
doi:10.1007/978-1-4471-2318-7_11 dblp:conf/sgai/ChuaCMG11 fatcat:uziw3nqwkjcaxpxwzzkhgferj4