Gambler's Ruin Bandit Problem
2016
2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state, and a terminal action that moves the learner directly into one of the two terminal states (the goal state or the dead-end state). The current round ends when a
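To make the round structure in the abstract concrete, here is a minimal Python sketch of a single round. Everything beyond the two actions and the two terminal states is an assumption made for illustration: the integer state space `0..goal` (with `0` as the dead-end state), the one-step left/right move for the continuation action, the parameters `p_right` and `p_terminal`, and the threshold rule `stop_state` that decides when to take the terminal action. None of these names or choices are taken from the record above.

```python
import random


def play_round(start, goal, p_right, p_terminal, stop_state, rng):
    """Simulate one round; return True if the round ends in the goal state.

    Assumed model (hypothetical, for illustration only):
      - states are integers 0..goal; 0 is the dead-end, `goal` is the goal
      - continuation action: move +1 with prob. p_right, else -1
      - terminal action: jump to the goal with prob. p_terminal, else dead-end
      - policy: take the terminal action whenever state <= stop_state
    """
    state = start
    while 0 < state < goal:
        if state <= stop_state:
            # Terminal action: the round ends immediately in one terminal state.
            return rng.random() < p_terminal
        # Continuation action: random step around the current state.
        state += 1 if rng.random() < p_right else -1
    return state == goal


def estimate_success(n_rounds, seed=0, **kwargs):
    """Fraction of simulated rounds that end in the goal state."""
    rng = random.Random(seed)
    wins = sum(play_round(rng=rng, **kwargs) for _ in range(n_rounds))
    return wins / n_rounds
```

For example, `estimate_success(10_000, start=3, goal=6, p_right=0.45, p_terminal=0.3, stop_state=1)` gives a Monte Carlo estimate of the probability of reaching the goal under this assumed threshold policy; varying `stop_state` shows how the choice between the two actions changes that probability.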
doi:10.1109/allerton.2016.7852376
dblp:conf/allerton/AkbarzadehT16
fatcat:loqffnpyq5g5hp4byvah6xgj7m