The stability of long action chains in XCS

A. M. Barry
2002 Soft Computing - A Fusion of Foundations, Methodologies and Applications  
Alwyn.Barry@uwe.ac.uk (++44) [0]117 344 3135

XCS [1][2] represents a new form of Learning Classifier System [3] that uses accuracy as a means of guiding fitness for selection within a Genetic Algorithm. The combination of accuracy-based selection and a dynamic niche-based deletion mechanism achieves a long sought-after goal: the reliable production, maintenance, and proliferation of the sub-population of optimally general accurate classifiers that map the problem domain [4]. Wilson [2] and Lanzi [5][6] have demonstrated the applicability of XCS to the identification of the optimal action-chain leading to the optimum trade-off between reward distance and magnitude. However, Lanzi [6] also demonstrated that XCS has difficulty in finding an optimal solution to the long action-chain environment Woods-14 [7]. Whilst these findings have shed some light on the ability of XCS to form long action-chains, they have not provided a systematic and, above all, controlled investigation of the limits of XCS learning within multiple-step environments. In this investigation a set of confounding variables in such problems is identified. These are controlled using carefully constructed FSW environments [8][9] of increasing length. Whilst investigations demonstrate that XCS is able to establish the optimal sub-population [O] [4] when generalisation is not used, it is shown that the introduction of generalisation imposes low bounds on the length of action-chains that can be identified and chosen between to find the optimal pathway. Where these bounds are reached, a form of over-generalisation caused by the formation of dominant classifiers can occur. This form is further investigated and the Domination Hypothesis is introduced to explain its formation and preservation.
doi:10.1007/s005000100115 fatcat:akplsycohzbehkuoq45jhu4hiq