Considered are semi-Markov decision processes (SMDPs) with finite state and action spaces. We study two criteria: the expected average reward per unit time subject to a sample-path constraint on the average cost per unit time, and the expected time-average variability. Under a certain condition, for communicating SMDPs, we construct (randomized) stationary policies that are ε-optimal for each criterion; the policy is optimal for the first criterion under the unichain assumption, and the policy is

doi:10.1017/s026996480700037x