Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
[article]
2021
arXiv
pre-print
We study the stochastic shortest path problem with adversarial costs and known transition, and show that the minimax regret is O(√(DT^⋆ K)) in the full-information setting and O(√(DT^⋆ SA K)) in the bandit feedback setting, where D is the diameter, T^⋆ is the expected hitting time of the optimal policy, S is the number of states, A is the number of actions, and K is the number of episodes. Our results significantly improve upon the existing work of Rosenberg and Mansour (2020).
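To make the two rates in the abstract concrete, the following minimal Python sketch evaluates the leading terms √(DT^⋆ K) and √(DT^⋆ SA K). It ignores all constants and logarithmic factors, so the numbers indicate scaling only, not actual regret. The function names and the parameter values are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
import math


def full_info_rate(D, T_star, K):
    """Leading term sqrt(D * T_star * K) of the full-information rate.

    Constants and logarithmic factors are deliberately omitted (assumption).
    """
    return math.sqrt(D * T_star * K)


def bandit_rate(D, T_star, S, A, K):
    """Leading term sqrt(D * T_star * S * A * K) of the bandit-feedback rate.

    Constants and logarithmic factors are deliberately omitted (assumption).
    """
    return math.sqrt(D * T_star * S * A * K)


if __name__ == "__main__":
    # Illustrative values only: diameter, hitting time, states, actions, episodes.
    D, T_star, S, A, K = 2, 8, 4, 4, 100
    full = full_info_rate(D, T_star, K)
    bandit = bandit_rate(D, T_star, S, A, K)
    print(f"full-information leading term: {full:.1f}")
    print(f"bandit feedback leading term:  {bandit:.1f}")
    print(f"ratio (equals sqrt(S * A)):    {bandit / full:.1f}")
```

Under these simplifications, the bandit leading term exceeds the full-information one by exactly a factor of √(SA), and both grow as √K in the number of episodes.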
arXiv:2012.04053v3
fatcat:2kvjqo6ehvh75hcubaw7m3hrde