Denumerable-Armed Bandits

Jeffrey S. Banks, Rangarajan K. Sundaram
1992 Econometrica  
This paper studies the class of denumerable-armed (i.e., finite- or countably infinite-armed) bandit problems with independent arms and geometric discounting over an infinite horizon, in which each arm generates rewards according to one of a finite number of distributions, or "types." The number of types in the support of an arm, as well as the types themselves, may vary across arms. We derive certain continuity and curvature properties of the dynamic allocation (or Gittins) index of Gittins and Jones (1974), and provide necessary and sufficient conditions under which the Gittins-Jones result identifying all optimal strategies for finite-armed bandits may be extended to infinite-armed bandits. We then establish our central result: at each point in time, the arm selected by an optimal strategy will, with strictly positive probability, remain an optimal selection forever. More specifically, for every such arm, there exists at least one type of that arm such that, conditional on that type being the arm's "true" type, the arm will survive forever and continuously with nonzero probability. When the reward distributions of an arm satisfy the monotone likelihood ratio property (MLRP), the survival prospects of an arm improve when conditioned on types generating higher expected rewards; however, we show that this need not be the case in the absence of MLRP. Implications of these results are derived for the theories of job search and matching, as well as other applications of the bandit paradigm.
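As an illustration of the setup described in the abstract, the following minimal sketch shows how beliefs about a single arm's type might be updated after each observed reward. It is not taken from the paper: the two Bernoulli types, their success probabilities, the uniform prior, and the names `posterior`, `success_prob`, and `belief` are all assumptions chosen for the example.

```python
def posterior(prior, likelihoods):
    """Bayes update of a belief over finitely many types.

    prior[i] is the current probability of type i; likelihoods[i] is the
    probability of the observed reward under type i.
    """
    weights = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(weights)
    if total == 0:
        raise ValueError("observation has zero probability under every type")
    return [w / total for w in weights]


# Hypothetical arm with two types: Bernoulli rewards with success
# probability 0.8 (type 0) or 0.3 (type 1). These numbers are assumptions.
success_prob = [0.8, 0.3]
belief = [0.5, 0.5]  # assumed uniform prior over the two types

# Update the belief after observing rewards 1, 1, 0 from this arm.
for reward in [1, 1, 0]:
    lik = [q if reward == 1 else 1 - q for q in success_prob]
    belief = posterior(belief, lik)

# Expected next-period reward under the updated belief.
expected_reward = sum(b * q for b, q in zip(belief, success_prob))
```

The belief vector is the state of the arm in this formulation; an index such as the Gittins index would be a function of that state, which this sketch does not attempt to compute.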
doi:10.2307/2951539 fatcat:c7tkved7b5cs7dda7cjkrhj43m