An Environment Model for Nonstationary Reinforcement Learning

Samuel P. M. Choi, Dit-Yan Yeung, Nevin Lianwen Zhang
1999 Neural Information Processing Systems  
Reinforcement learning in nonstationary environments is generally regarded as an important yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basically indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. While HM-MDP is a special case of partially observable Markov decision processes (POMDPs), modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. A variant of the Baum-Welch algorithm is developed for model learning, requiring less data and computation time.
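The HM-MDP structure described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the mode count, state space, and all transition probabilities here are made up for the example. The key property it demonstrates is that each hidden mode indexes its own MDP, while the mode itself drifts according to a separate Markov chain that the agent never observes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy HM-MDP: 2 hidden modes, 3 states, 2 actions.
n_modes, n_states, n_actions = 2, 3, 2

# Mode transition matrix X[m, m'] = P(next mode m' | current mode m).
# The mode evolves on its own Markov chain, independent of the agent.
X = np.array([[0.95, 0.05],
              [0.10, 0.90]])

# Per-mode MDP dynamics: T[m, a, s, s'] = P(s' | s, a) under mode m.
# Dirichlet draws just give valid (row-stochastic) transition rows.
T = rng.dirichlet(np.ones(n_states), size=(n_modes, n_actions, n_states))

# Per-mode rewards R[m, s, a]: the reward function also switches with the mode.
R = rng.normal(size=(n_modes, n_states, n_actions))


def step(mode, state, action):
    """One HM-MDP transition: the agent sees the state and reward,
    but the mode is hidden and drifts via its own chain."""
    reward = R[mode, state, action]
    next_state = rng.choice(n_states, p=T[mode, action, state])
    next_mode = rng.choice(n_modes, p=X[mode])
    return next_mode, next_state, reward


# Simulate a short trajectory under a random policy.
mode, state = 0, 0
for _ in range(5):
    action = int(rng.integers(n_actions))
    mode, state, r = step(mode, state, action)
```

Viewed as a POMDP, the hidden part of the state is just the mode, so the belief state is a distribution over a handful of modes rather than over the full cross-product state space, which is why treating it as a generic POMDP is unnecessarily expensive.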
dblp:conf/nips/ChoiYZ99