A hybrid architecture for function approximation

Hassab Elgawi
2008 6th IEEE International Conference on Industrial Informatics
This paper proposes a new approach to value function estimation that combines temporal difference (TD) learning with an on-line variant of Random Forest (RF). We call this implementation Random-TD. First, RF is adapted to on-line mode in order to deal with large state spaces and memory constraints, while the state-action mapping is based on the Bellman error, or on the TD error. We evaluate the potential of the proposed procedure in terms of a reduction in the Bellman error with extended
... cal studies on high-dimensional control problems (Ailerons, Elevator, Kinematics, and Friedman), standard reinforcement learning benchmarks on which several linear function approximators have previously performed poorly. The results demonstrate that a hybrid function approximation (Random-TD) can significantly improve the performance of TD methods.
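
As a rough illustration of the idea summarized in the abstract, the sketch below computes the one-step TD error, delta = r + gamma * V(s') - V(s), and uses it to update an averaged ensemble on-line, one transition at a time. It is not the paper's Random-TD implementation. The names `td_error`, `LinearMember`, and `EnsembleValue`, the learning rate, and the discount factor are all assumptions made for this example, and simple linear models on random feature subsets stand in for the on-line trees, which the abstract does not specify.

```python
import random


def td_error(reward, gamma, v_s, v_next):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_s


class LinearMember:
    """Hypothetical stand-in for one on-line tree:
    a linear model restricted to a random subset of the features."""

    def __init__(self, n_features, subset_size, rng):
        self.idx = rng.sample(range(n_features), subset_size)
        self.w = [0.0] * subset_size

    def predict(self, x):
        return sum(w * x[i] for w, i in zip(self.w, self.idx))

    def update(self, x, delta, lr):
        # Semi-gradient TD(0) step: move each weight along delta * feature.
        for k, i in enumerate(self.idx):
            self.w[k] += lr * delta * x[i]


class EnsembleValue:
    """Value estimate formed by averaging the members' predictions."""

    def __init__(self, n_features, n_members=10, subset_size=2, seed=0):
        rng = random.Random(seed)
        self.members = [
            LinearMember(n_features, subset_size, rng) for _ in range(n_members)
        ]

    def predict(self, x):
        return sum(m.predict(x) for m in self.members) / len(self.members)

    def update(self, x, reward, x_next, gamma=0.9, lr=0.1, terminal=False):
        # A terminal successor state contributes no bootstrapped value.
        v_next = 0.0 if terminal else self.predict(x_next)
        delta = td_error(reward, gamma, self.predict(x), v_next)
        # Every member is corrected with the same ensemble-level TD error.
        for m in self.members:
            m.update(x, delta, lr)
        return delta
```

Calling `EnsembleValue(n_features=3).update(x, reward, x_next)` once per observed transition returns the TD error for that step. Tracking its magnitude over time gives a simple proxy for the reduction in Bellman error that the abstract uses as its evaluation criterion.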
doi:10.1109/indin.2008.4618267 fatcat:mevmnn5ktngkzo3m6c3riyx6qu