Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery [chapter]

Scott Proper, Prasad Tadepalli
2006 Lecture Notes in Computer Science  
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce "tabular linear functions" that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery, an optimization problem that combines inventory control and vehicle routing.
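The abstract does not spell out the form of hill climbing used, so the following is only a minimal sketch of one common variant: coordinate-wise hill climbing over a joint action, where one agent's action is changed at a time and a change is kept only if it improves the value estimate. The function name `hill_climb_joint_action`, the `value` callback, and the toy objective are all hypothetical illustrations, not the authors' procedure. The point is that each sweep evaluates roughly `num_agents * len(actions)` candidates rather than all `len(actions) ** num_agents` joint actions.

```python
from typing import Callable, List, Sequence, Tuple


def hill_climb_joint_action(
    num_agents: int,
    actions: Sequence[int],
    value: Callable[[Tuple[int, ...]], float],
) -> Tuple[int, ...]:
    """Greedy coordinate-wise search over joint actions (illustrative sketch).

    `value` is an assumed callback that scores a joint action, for example
    an estimated value of the state that results from taking it.
    """
    # Start from an arbitrary joint action: every agent takes the first action.
    joint: List[int] = [actions[0]] * num_agents
    best = value(tuple(joint))

    improved = True
    while improved:
        improved = False
        for i in range(num_agents):
            for a in actions:
                if a == joint[i]:
                    continue
                # Change only agent i's action and re-evaluate.
                candidate = joint[:i] + [a] + joint[i + 1:]
                v = value(tuple(candidate))
                if v > best:  # strict improvement guarantees termination
                    joint, best, improved = candidate, v, True
    return tuple(joint)


# Toy objective (hypothetical): each agent has a preferred action.
targets = (2, 0, 1)


def toy_value(joint: Tuple[int, ...]) -> float:
    return -float(sum((a - t) ** 2 for a, t in zip(joint, targets)))


result = hill_climb_joint_action(3, [0, 1, 2], toy_value)
```

On this separable toy objective the search recovers the preferred joint action. In general, hill climbing of this kind can stop at a local optimum, which is the price paid for avoiding exhaustive joint search.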
doi:10.1007/11871842_74 fatcat:u4v7pizu2rabpfht2vgypgm2he