Learning agents for uncertain environments (extended abstract)

Stuart Russell
Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT '98), 1998
This talk proposes a very simple "baseline architecture" for a learning agent that can handle stochastic, partially observable environments. The architecture combines reinforcement learning with a method for representing temporal processes as graphical models. I will discuss methods for learning the parameters and structure of such representations from sensory inputs, and for computing posterior probabilities. Some open problems remain before we can try out the complete agent; more arise when we consider scaling up. A second theme of the talk is whether reinforcement learning can provide a good model of animal and human learning. To answer this question, we must do inverse reinforcement learning: given the observed behaviour, what reward signal, if any, is being optimized? This seems to be a very interesting problem for the COLT, UAI, and ML communities, and has been addressed in econometrics under the heading of structural estimation of Markov decision processes.
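Computing posterior probabilities in a temporal model, as mentioned in the abstract, can be illustrated with exact filtering in a two-state hidden Markov model (a minimal instance of a dynamic graphical model). This is a sketch only; the "rain/umbrella" numbers below are illustrative assumptions, not taken from the talk.

```python
# Toy temporal model: hidden weather state, observed umbrella evidence.
# All probabilities here are illustrative, not from the source.
T = {"rain": {"rain": 0.7, "dry": 0.3},      # transition model P(X_t | X_{t-1})
     "dry":  {"rain": 0.3, "dry": 0.7}}
O = {"rain": {"umbrella": 0.9, "none": 0.1}, # sensor model P(E_t | X_t)
     "dry":  {"umbrella": 0.2, "none": 0.8}}

def forward(prior, evidence):
    """Return the posterior P(X_t | e_1..t) via the forward algorithm."""
    belief = dict(prior)
    for e in evidence:
        # Predict: push the current belief through the transition model.
        predicted = {x: sum(belief[y] * T[y][x] for y in T) for x in T}
        # Update: weight by the observation likelihood, then normalise.
        unnorm = {x: O[x][e] * predicted[x] for x in T}
        z = sum(unnorm.values())
        belief = {x: p / z for x, p in unnorm.items()}
    return belief
```

For example, starting from a uniform prior and observing an umbrella on two consecutive steps, `forward({"rain": 0.5, "dry": 0.5}, ["umbrella", "umbrella"])` yields a posterior of roughly 0.88 for rain.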
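The inverse reinforcement learning question posed above can be made concrete with a brute-force sketch: given an observed policy in a tiny MDP, search a grid of candidate reward functions and keep those under which the policy is greedy with respect to its Q-values. The two-state MDP, the grid, and all names here are illustrative assumptions, not the talk's method.

```python
# Deterministic toy MDP: P[state][action] -> next state (illustrative, not
# from the source).
P = {0: {"stay": 0, "go": 1},
     1: {"stay": 1, "go": 0}}

def q_values(reward, gamma=0.9, iters=200):
    """Q(s, a) under a state-based reward r(s), computed by value iteration."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {s: max(reward[s] + gamma * V[P[s][a]] for a in P[s]) for s in P}
    return {s: {a: reward[s] + gamma * V[P[s][a]] for a in P[s]} for s in P}

def consistent_rewards(policy, candidates):
    """Return the candidate rewards under which the observed policy is greedy."""
    out = []
    for reward in candidates:
        Q = q_values(reward)
        if all(Q[s][policy[s]] >= max(Q[s].values()) - 1e-9 for s in P):
            out.append(reward)
    return out
```

Running this with the observed policy `{0: "go", 1: "stay"}` over rewards drawn from `{-1, 0, 1}` per state keeps, for instance, `{0: -1, 1: 1}` but rejects `{0: 1, 1: -1}`. Note that the all-zero reward is always among the survivors: every policy is optimal for it, which is exactly the "what reward, if any" ambiguity that makes the inverse problem interesting.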
doi:10.1145/279943.279964 dblp:conf/colt/Russell98