Large-Scale Markov Decision Problems with KL Control Cost and its Application to Crowdsourcing

Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek
2015 International Conference on Machine Learning  
We study average and total cost Markov decision problems with large state spaces. Since the computational and statistical cost of finding the optimal policy scales with the size of the state space, we focus on searching for near-optimality in a low-dimensional family of policies. In particular, we show that for problems with a Kullback-Leibler divergence cost function, we can recast policy optimization as a convex optimization and solve it approximately using a stochastic subgradient algorithm.
more » ... This method scales in complexity with the family of policies but not the state space. We show that the performance of the resulting policy is close to the best in the lowdimensional family. We demonstrate the efficacy of our approach by optimizing a policy for budget allocation in crowd labeling, an important crowdsourcing application.
dblp:conf/icml/Abbasi-YadkoriB15 fatcat:7fb6h7blifeepauhapd6k77omi