Risk-sensitive optimal control of hidden Markov models: structural results

E. Fernandez-Gaucherand, S.I. Marcus
1997 IEEE Transactions on Automatic Control  
The authors consider a risk-sensitive optimal control problem for (finite state and action spaces) hidden Markov models (HMM). They present results of an investigation on the nature and structure of risk-sensitive controllers for HMM. Several general structural results are presented, as well as a particular case study of a popular benchmark problem. For the latter, they obtain structural results for the optimal risk-sensitive controller and compare it to that of the risk-neutral controller.
Furthermore, they show that indeed the risk-sensitive controller and its corresponding information state converge to the known solutions for the risk-neutral situation as the risk factor goes to zero. They also study the infinite and general risk aversion cases.

Index Terms: Hidden Markov models, information states, optimal stochastic control, risk-sensitive optimality criterion.

I. INTRODUCTION

We consider a risk-sensitive optimal control problem for hidden Markov models (HMM), i.e., controlled Markov chains where state information is only available to the decision-maker (DM) or controller via an output (message) process. The optimal control of HMM under standard, risk-neutral performance criteria, e.g., discounted and average costs, has received much attention in the past. Many basic results and numerous applications have been reported in the literature in this subject; see [1], [2], [14], and references therein. Controlled Markov chains with full state information and a risk-sensitive performance criterion have also received some attention [4], [6], [12]. On the other hand, quite the opposite is the situation for HMM under risk-sensitive criteria, e.g., expected value of the exponential of additive costs. Whittle and others (see [19], [20], and references therein) have extensively studied the risk-sensitive optimal control of partially-observable linear exponential quadratic Gaussian (LEQG) systems; see also [5]. More recently, James et al. [13], [3] have treated the risk-sensitive partially observable optimal control problem of discrete-time nonlinear systems. The paucity of results in this subject area can be mostly attributed to the lack in the past of appropriate sufficient statistics or information states.
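The contrast between the risk-neutral and risk-sensitive criteria mentioned above can be illustrated with a small numerical sketch. It is not taken from the paper: the cost values, probabilities, and function names are hypothetical, and the normalization (1/gamma) * log E[exp(gamma * C)] is one common certainty-equivalent form of the exponential criterion, assumed here for illustration. The sketch shows the risk-sensitive value approaching the expected cost as the risk factor gamma goes to zero.

```python
import math

def risk_neutral_cost(costs, probs):
    """Expected total cost E[C] for a discrete cost distribution."""
    return sum(p * c for c, p in zip(costs, probs))

def risk_sensitive_cost(costs, probs, gamma):
    """Certainty equivalent (1/gamma) * log E[exp(gamma * C)], gamma != 0.

    Assumed normalization, not necessarily the one used in the paper.
    A log-sum-exp shift keeps the exponentials from overflowing.
    """
    shift = max(gamma * c for c in costs)
    total = sum(p * math.exp(gamma * c - shift) for c, p in zip(costs, probs))
    return (shift + math.log(total)) / gamma

# Hypothetical distribution of the accumulated cost over a finite horizon.
costs = [0.0, 1.0, 10.0]
probs = [0.5, 0.4, 0.1]

for gamma in (1.0, 0.1, 1e-6):
    print(gamma, risk_sensitive_cost(costs, probs, gamma))
print("risk-neutral:", risk_neutral_cost(costs, probs))
```

For positive gamma the risk-sensitive value exceeds the mean (by Jensen's inequality), which penalizes the rare high-cost outcome more heavily; the gap shrinks as gamma decreases.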
As is well known, if the cost criterion being considered is of the type "expected value of additive costs," then the posterior probability density, given all available information up to the present, constitutes a sufficient statistic for control (or information state); see ...

Manuscript received March 4, 1996. E. Fernández-Gaucherand was sup-. Publisher Item Identifier S 0018-9286(97)07627-7.

Proof: From Lemma 3.4 we see that J ( 0 ; M 0 k) = J (; M 0 k). Hence, the result follows from Lemma 3.2.

Definition 3.1: From (12), for u ∈ U and k = 1, 2, ..., M, let ...; and in which x(t) is the state vector, and u(t) and y(t) represent the input and output,
doi:10.1109/9.633830 fatcat:gy5iikqysvesvlqladwjpm4yfy