Discretized Maximum Likelihood and Almost Optimal Adaptive Control of Ergodic Markov Models

T. E. Duncan, B. Pasik-Duncan, L. Stettner
1998 SIAM Journal on Control and Optimization
Three distinct controlled ergodic Markov models are considered here: a discrete-time controlled Markov process with complete observations, a controlled diffusion process with complete observations, and a discrete-time controlled Markov process with partial observations. The partial observations in the third model have the special form of complete observations in a fixed recurrent set and noisy observations in its complement. For each model an almost self-optimizing adaptive control is given. These adaptive controls are constructed by a randomized certainty equivalence method from a family of estimates that use a finite discretization of the parameter set and from a finite family of almost optimal ergodic controls. A continuity property of the information of a model for one parameter value with respect to another is used to establish this almost-optimality property.

AMS subject classifications. 93E35, 93C40, 60J05, 62M05

PII. S0363012996298369

1. Introduction. In many control problems the models are not completely specified, and perturbations or unmodeled dynamics are described by noise, so the models are stochastic. If some distributions or parameters in the models are unknown, then these control problems can be regarded as problems of stochastic adaptive control. In this paper, three ergodic Markov models with unknown parameters are considered: a discrete-time controlled Markov process with complete observations, a controlled diffusion process with complete observations, and a discrete-time controlled Markov process with partial observations. The discrete-time Markov processes evolve in a compact state space, and their transition densities depend on an unknown parameter. The partial observations of the discrete-time Markov process in the third model have the special form of complete observations in a fixed recurrent set and noisy observations in its complement. The controlled diffusion is described by a stochastic differential equation in which the unknown parameter appears in the drift vector; the solution of this equation is understood in the weak sense. Since there are some basic differences among the three models, it is convenient to treat them separately, and the results are typically stated for each of the three models.
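For the controlled diffusion, where the unknown parameter enters only the drift, the likelihood of an observed path can be written through the Girsanov density, and a finite discretization of the parameter set turns maximum likelihood into a search over finitely many values. The sketch below is illustrative only and is not the paper's construction: it assumes a hypothetical scalar model dX = (-alpha X + u) dt + dW with unit diffusion coefficient, a hypothetical linear feedback u = -0.5 X, and a hypothetical grid of alpha values. The function names `simulate`, `log_likelihood`, and `discretized_mle` are invented for this example.

```python
import math
import random


def simulate(alpha_true, control, x0=0.0, dt=0.01, n_steps=50_000, seed=0):
    """Euler-Maruyama path of the hypothetical model dX = (-alpha X + u) dt + dW."""
    rng = random.Random(seed)
    xs, us = [x0], []
    x = x0
    for _ in range(n_steps):
        u = control(x)  # feedback control evaluated at the current state
        x = x + (-alpha_true * x + u) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        us.append(u)
        xs.append(x)
    return xs, us


def log_likelihood(alpha, xs, us, dt):
    """Discretized Girsanov log-likelihood: sum of b*dX - 0.5*b^2*dt,
    where b = -alpha*x + u is the drift under parameter alpha."""
    ll = 0.0
    for k, u in enumerate(us):
        b = -alpha * xs[k] + u
        ll += b * (xs[k + 1] - xs[k]) - 0.5 * b * b * dt
    return ll


def discretized_mle(grid, xs, us, dt):
    """Maximum likelihood restricted to a finite grid of parameter values."""
    return max(grid, key=lambda a: log_likelihood(a, xs, us, dt))


if __name__ == "__main__":
    dt = 0.01
    grid = [0.5, 1.5, 2.5, 3.5]  # hypothetical finite discretization
    xs, us = simulate(1.5, lambda x: -0.5 * x, dt=dt)
    print(discretized_mle(grid, xs, us, dt))
```

Because the path here is itself generated by the Euler-Maruyama scheme, the discretized log-likelihood coincides, up to a term that does not depend on alpha, with the exact Gaussian likelihood of the simulated chain; with a horizon of 500 time units the grid point 1.5 should be selected with high probability.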
Since the true value of the parameter is unknown, it is estimated by maximum likelihood, with the times between successive updates of the estimate chosen large enough that an ergodic property of the information and the cost can be used. Since only almost self-optimality is desired, the maximum likelihood procedure is restricted to choosing from a finite set of possible values for …
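The estimation scheme just described, maximum likelihood over a finite parameter set with widely spaced updates feeding a certainty-equivalence control, can be sketched for a completely observed discrete-time model. Everything concrete below is a hypothetical assumption made for illustration: a two-state chain (a simple instance of a compact state space), the transition function `p_next`, the grid `GRID`, the placeholder feedback laws in `POLICIES` (which only stand in for a finite family of almost optimal ergodic controls and are not computed), and the doubling update schedule. The randomization in the paper's certainty equivalence step is not reproduced; this is plain certainty equivalence.

```python
import math
import random

GRID = (0.2, 0.5, 0.8)  # hypothetical finite discretization of the parameter set


def p_next(x_next, x, u, theta):
    """Hypothetical transition probability on the two-state space {0, 1}."""
    p_one = theta * (0.5 + 0.5 * u) if x == 0 else theta * (0.25 + 0.5 * u)
    return p_one if x_next == 1 else 1.0 - p_one


# Placeholder stationary feedback laws, one per grid point (hypothetical).
POLICIES = {0.2: lambda x: 1, 0.5: lambda x: x, 0.8: lambda x: 1 - x}


def ml_estimate(history):
    """Maximize the log-likelihood of the observed transitions over GRID."""
    def loglik(theta):
        return sum(math.log(p_next(x1, x0, u, theta)) for x0, u, x1 in history)
    return max(GRID, key=loglik)


def run(theta_true, n_steps=20_000, first_update=10, seed=1):
    """Adaptive loop: re-estimate only at sparse times, act on the estimate."""
    rng = random.Random(seed)
    x, theta_hat, next_update = 0, GRID[0], first_update
    history = []
    for t in range(n_steps):
        if t == next_update:
            theta_hat = ml_estimate(history)
            next_update *= 2  # hypothetical schedule: gaps between updates double
        u = POLICIES[theta_hat](x)  # certainty equivalence: control for the estimate
        x_next = 1 if rng.random() < p_next(1, x, u, theta_true) else 0
        history.append((x, u, x_next))
        x = x_next
    return theta_hat


if __name__ == "__main__":
    print(run(0.5))
```

The doubling schedule is one simple way to make the gaps between updates grow, so that each new estimate is based on a long stretch of closed-loop data; the full log-likelihood is recomputed from the whole history at each update, which is affordable here because updates are rare.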
doi:10.1137/s0363012996298369