Beyond Markov Decision Process with Scalar Markovian Rewards

Shuwa Miura
2022 Proceedings of the International Symposium on Combinatorial Search  
Real-world decision problems often involve multiple competing objectives or complex reward structures that violate the Markov assumption. However, existing research on sequential decision making under uncertainty has primarily focused on Markov Decision Processes (MDPs) with scalar Markovian reward signals. My thesis considers settings where scalar Markovian rewards are not sufficient to produce desired behaviors. The first part of my thesis develops algorithms to optimize lexicographically
... objectives. The second part considers autonomous agents that incorporate the perspective of their observer. As the perspective of the observer can depend on how the agents have behaved so far, rewards in this setting can depend on histories (non-Markovian). In the final part of my thesis, I hope to characterize when rewards beyond scalar Markovian signals are needed from a decision-theoretic perspective.
doi:10.1609/socs.v15i1.21805 fatcat:i236hwkmsva2dhhksbjbmpo6mu