Text classification using ESC-based stochastic decision lists

Hang Li, Kenji Yamanishi
2002 Information Processing & Management  
We propose a new method of text classification using stochastic decision lists. A stochastic decision list is an ordered sequence of IF-THEN rules, and our method can be viewed as a rule-based method for text classification having advantages of readability and refinability of acquired knowledge. Our method is unique in that decision lists are automatically constructed on the basis of the principle of minimizing extended stochastic complexity (ESC), and with it we are able to construct decision lists that have fewer errors in classification. The accuracy of classification achieved with our method appears better than or comparable to that of existing rule-based methods. We have empirically demonstrated that rule-based methods like ours result in high classification accuracy when the categories to which texts are to be assigned are relatively specific ones and when the texts tend to be short. We have also empirically verified the advantages of rule-based methods over non-rule-based ones.

In this paper, we propose a new rule-based method for text classification that represents rules in the form of stochastic decision lists. A stochastic decision list is an ordered sequence of IF-THEN rules, each containing a condition, a classification decision, and a probability value. When used in text classification, a condition generally refers to the presence or absence of certain words in a text, a classification decision denotes a category to which a text is to be assigned, and a probability value refers to the likelihood of such an assignment.

We should note that the performance of a stochastic decision list (or a set of rules) heavily depends on the criterion (or principle) for model selection in learning. Our method is unique in that decision lists are constructed on the basis of the principle of minimizing a quantity called extended stochastic complexity (ESC) (Yamanishi, 1998). A decision list is basically constructed in two steps: growing and pruning. In growing, the algorithm sequentially adds new rules to the decision list under construction, on the basis of the principle of minimizing ESC. In pruning, it recursively deletes rules, starting from the last rule of the list, until it reaches a rule at which, on the basis of the principle of minimizing ESC, pruning should be stopped.

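The rule format and the two-step construction described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the ESC formula is not given in this text, so the sketch substitutes a simple stand-in criterion (training errors plus a per-rule penalty). The names `Rule`, `classify`, `score`, `grow`, `prune`, the `penalty` parameter, the unsmoothed probability estimates, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    word: Optional[str]  # None marks the default (always-true) rule
    present: bool        # condition: the word is present (True) or absent (False)
    category: str        # classification decision
    prob: float          # estimated probability of that decision

    def matches(self, text):
        return self.word is None or ((self.word in text) == self.present)

def classify(rules, text):
    """Return (category, probability) from the first rule whose condition holds."""
    for r in rules:
        if r.matches(text):
            return r.category, r.prob
    raise ValueError("decision list needs a default rule")

def majority(examples):
    label, n = Counter(lab for _, lab in examples).most_common(1)[0]
    return label, n / len(examples)

def uncovered(rules, data):
    return [ex for ex in data if not any(r.matches(ex[0]) for r in rules)]

def with_default(rules, data):
    """Append a default rule predicting the majority label of uncovered examples."""
    label, p = majority(uncovered(rules, data) or data)
    return rules + [Rule(None, True, label, p)]

def score(rules, data, penalty):
    # Stand-in criterion (NOT ESC): training errors plus a penalty per rule.
    errors = sum(classify(rules, text)[0] != lab for text, lab in data)
    return errors + penalty * len(rules)

def grow(data, penalty=0.5):
    """Growing: repeatedly add the rule that minimizes the criterion."""
    rules = []
    vocab = sorted({w for text, _ in data for w in text})
    while True:
        rest = uncovered(rules, data)
        best = None
        for w in vocab:
            covered = [ex for ex in rest if w in ex[0]]
            if not covered:
                continue
            label, p = majority(covered)
            cand = rules + [Rule(w, True, label, p)]
            s = score(with_default(cand, data), data, penalty)
            if best is None or s < best[0]:
                best = (s, cand)
        if best is None:  # every example is covered, nothing left to add
            return rules
        rules = best[1]

def prune(rules, data, penalty=0.5):
    """Pruning: delete rules from the end while the criterion does not get worse."""
    while rules:
        keep = score(with_default(rules, data), data, penalty)
        drop = score(with_default(rules[:-1], data), data, penalty)
        if drop > keep:
            break
        rules = rules[:-1]
    return with_default(rules, data)

# Hypothetical toy data: each text is a set of words with a category label.
data = [
    ({"wheat", "export"}, "grain"), ({"wheat", "price"}, "grain"),
    ({"corn", "wheat"}, "grain"), ({"bank", "rate"}, "other"),
    ({"bank", "loan"}, "other"), ({"oil", "price"}, "other"),
]
decision_list = prune(grow(data), data)
for rule in decision_list:
    print(rule)
```

On this toy data, growing covers every example with three rules and pruning then removes the last two, leaving one rule on the word "wheat" followed by a default rule. A faithful implementation would replace `score` with the ESC of the list as defined in Yamanishi (1998).
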
Our method has certain advantages which are not shared by other rule-based methods (we will later discuss the relationship between our method and existing methods like Ripper). First, the learning algorithm is much simpler than those employed in other rule-based methods. Second, it is theoretically guaranteed that our method will achieve high classification accuracy. In fact, the principle of minimizing ESC has the lowest expected classification error of any model selection criterion yet proposed (cf. Yamanishi, 1998). Experimental results indicate that our method is quite effective. It achieves 82.0% classification accuracy in terms of break-even point for Reuters-21578 data, which is better than or comparable to the accuracies of existing rule-based methods.
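
The break-even point is the value at which precision and recall coincide. As a minimal sketch of one simple way to locate such a point, assume the documents can be ranked by the probability attached to a positive decision for a category. Cutting the ranking off at exactly as many documents as there are true members of the category makes precision and recall equal by construction. The function name `break_even_point` and the scores below are hypothetical, and the paper's own evaluation procedure may differ.

```python
from typing import List, Tuple

def break_even_point(scored: List[Tuple[float, bool]]) -> float:
    """scored: (score for the category, whether the document truly belongs to it)."""
    n_pos = sum(1 for _, relevant in scored if relevant)
    if n_pos == 0:
        raise ValueError("no positive documents for this category")
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Assign the category to the top n_pos documents; then
    # precision = tp / n_pos and recall = tp / n_pos, so the two are equal.
    tp = sum(1 for _, relevant in ranked[:n_pos] if relevant)
    return tp / n_pos

# Hypothetical scores for five documents, three of which are true members.
scored = [(0.9, True), (0.8, False), (0.7, True), (0.4, True), (0.2, False)]
bep = break_even_point(scored)
```
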
doi:10.1016/s0306-4573(01)00038-3 fatcat:un7qlgaz4zdwvaunb3ryyhx5rm