Indexability and whittle index for restless bandit problems involving reset processes

Keqin Liu, Richard Weber, Qing Zhao
2011 IEEE Conference on Decision and Control and European Control Conference  
Restless MAB Restless Multi-Armed Bandits (RMAB) (Whittle'88) ◮ Passive arms also change states and offer reward. ◮ Activate K (K < N) arms simultaneously at each time.  ...  K/N → 0. ◮ For Markov processes (non-interactive application), Whittle index policy is optimal for any K and N while both K and p i1 (1) can be time varying (inhomogeneous Markov chains). ◮ Dynamic Access  ...  Conclusion ◮ An RMAB formulation of reset processes. ◮ Indexability and closed-form Whittle index. ◮ Asymptotic optimality for statistically identical arms. ◮ Optimality for general inhomogeneous Markov  ... 
doi:10.1109/cdc.2011.6160533 dblp:conf/cdc/LiuWZ11 fatcat:k4sdw4gccncuxpkgcndzqjvrvm

Collapsing Bandits and Their Application to Public Health Interventions [article]

Aditya Mate, Jackson A. Killian, Haifeng Xu, Andrew Perrault, Milind Tambe
2020 arXiv   pre-print
Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable.  ...  We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the  ...  Akbarzadeh and Mahajan [2] define a class of bandits with "controlled restarts," giving indexability results and a method for computing the Whittle index.  ... 
arXiv:2007.04432v1 fatcat:wljwgadqsjbt7hbu5d25bijnfm

Whittle Index Policy for Crawling Ephemeral Content

Konstantin E. Avrachenkov, Vivek S. Borkar
2018 IEEE Transactions on Control of Network Systems  
Fortunately, this problem admits a Whittle index, which leads to problem decomposition and to a very simple and efficient crawling policy.  ...  We derive the Whittle index and provide its theoretical justification.  ...  Prokhorenkova and Egor Samosvat from Yandex during the preparation of the manuscript.  ... 
doi:10.1109/tcns.2016.2619066 fatcat:s54e47oombbhlbvyj5olpqkbea

Whittle Index Policy for Crawling Ephemeral Content [article]

Konstantin Avrachenkov
2015 arXiv   pre-print
Fortunately, this problem admits a Whittle index, which leads to problem decomposition and to a very simple and efficient crawling policy.  ...  We derive the Whittle index and provide its theoretical justification.  ...  Prokhorenkova and Egor Samosvat from Yandex during the preparation of the manuscript.  ... 
arXiv:1503.08558v1 fatcat:xg5icmdrtzc3jhxijagrff7w6y

Thompson Sampling in Non-Episodic Restless Bandits [article]

Young Hun Jung, Marc Abeille, Ambuj Tewari
2019 arXiv   pre-print
Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging.  ...  Our algorithm adapts the TSDE algorithm of Ouyang et al. (2017) in a non-trivial manner to account for the special structure of restless bandits.  ...  Acknowledgements AT and YJ acknowledge the support of NSF CAREER grant IIS-1452099. AT was also supported by a Sloan Research Fellowship.  ... 
arXiv:1910.05654v1 fatcat:4qayr3bp7jeuvezw4scv3a3c5i

Optimality of myopic policy for a class of monotone affine restless multi-armed bandits

Parisa Mansourifard, Tara Javidi, Bhaskar Krishnamachari
2012 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)  
We formulate a general class of restless multiarmed bandits with n independent and stochastically identical arms. Each arm is in a real-valued state s ∈ [s0, smax].  ...  We prove that if τ(s), p(s), and R(s) are all monotonically increasing affine functions, and τ(s) is a contraction mapping, the simple myopic policy, which selects at each time the arm with the highest  ...  Whittle showed that for RMAB, an index policy is not in general optimal.  ... 
doi:10.1109/cdc.2012.6425858 dblp:conf/cdc/MansourifardJK12 fatcat:4ymuulygp5h6vblmb5k3oirqyu

Scalable Delay-Sensitive Polling of Sensors

Hootan Rashtian, Bader Naim Alahmad, Sathish Gopalakrishnan
2020 IEEE Access  
Our problem formulation and its solution relate to the restless bandit model for sequential decision making.  ...  Whereas existing methods for the restless bandit model are not directly applicable because the state space is continuous and not discrete, we prove that similar techniques can be used because of particular  ...  Our problem model is related to the restless bandit model for sequential decision making [26] due to Whittle.  ... 
doi:10.1109/access.2020.3026237 fatcat:6x5fsg6lozh6vmbncr4crevfam

An Online Learning Approach to Optimizing Time-Varying Costs of AoI [article]

Vishrant Tripathi, Eytan Modiano
2021 arXiv   pre-print
The algorithm and its regret analysis are novel and of independent interest to the study of online restless multi-armed bandit problems.  ...  For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader and show that it has low regret compared to the best fixed scheduling policy  ...  Here, analyzing regret is especially challenging due to the combinatorial nature of the scheduling problem and since the Whittle index is only an approximately optimal solution for the offline problem.  ... 
arXiv:2105.13383v1 fatcat:5izsh7s5ibf7zbjn2viah6bcoe

Index Policy for A Class of Partially Observable Markov Decision Processes [article]

Keqin Liu
2021 arXiv   pre-print
This paper addresses an important class of restless multi-armed bandit (RMAB) problems that finds a broad application area in operations research, stochastic optimization, and reinforcement learning.  ...  There are N independent Markov processes that may be operated, observed and offer rewards.  ...  RMAB Formulation and Whittle Index In this section, we will formulate the multi-armed bandit problem as a partially observable Markov decision process and introduce the concept of Whittle Index.  ... 
arXiv:2107.11939v2 fatcat:ow6ucb3eljfodfwpq7t36t3dyu

Computation and Communication Co-Design for Real-Time Monitoring and Control in Multi-Agent Systems [article]

Vishrant Tripathi, Luca Ballotta, Luca Carlone, Eytan Modiano
2021 arXiv   pre-print
We develop efficient resource allocation algorithms using the Whittle index approach and demonstrate our proposed algorithms in two practical applications: multi-agent occupancy grid mapping in time-varying  ...  We investigate the problem of co-designing computation and communication in a multi-agent system (e.g. a sensor network or a multi-robot team).  ...  The Whittle index approach consists of four steps: 1) converting the problem into an equivalent restless multi-armed bandit (RMAB) formulation, 2) decoupling the problem via a Lagrange relaxation, 3) establishing  ... 
arXiv:2108.03122v2 fatcat:m4o6yk6r7jgddaakiwrtfhaylm

Flow Sampling: Network Monitoring in Large-Scale Software-Defined IoT Networks [article]

Yulin Shao, Soung Chang Liew, He Chen, Yuyang Du
2021 arXiv   pre-print
This paper formulates the flow sampling problem in large-scale SDIoT networks by means of a Markov decision process and devises policies that strike a good balance between these two goals.  ...  Three classes of policies are investigated: the optimal policy, the state-independent policies, and the index policies (including the Whittle index and a second-order index policy).  ...  The Whittle index [20] refers to an index policy proposed by Whittle to solve restless multi-armed bandit (RMAB) problems [21].  ... 
arXiv:2007.10660v3 fatcat:y2vnj2nrgvhwdmews6dflbn5r4

Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning [article]

Jackson A. Killian, Lily Xu, Arpita Biswas, Milind Tambe
2021 arXiv   pre-print
We introduce Robust Restless Bandits, a challenging generalization of restless multi-arm bandits (RMAB). RMABs have been widely studied for intervention planning with limited resources.  ...  To make RMABs more useful in settings with uncertain dynamics: (i) We introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties  ...  Acknowledgments and Disclosure of Funding  ... 
arXiv:2107.01689v1 fatcat:zkenanwvcvdj3a2xiljx6s7gy4

A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits [article]

Pierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi, Ran Gilad-Bachrach
2022 arXiv   pre-print
Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit  ...  Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute).  ...  research and innovation action under grant agreement 951847, project ELISE (European Learning and Intelligent Systems Excellence).  ... 
arXiv:2110.11819v5 fatcat:s4p3sm3455e4ticve53e24maku

Non-Stationary Bandits with Habituation and Recovery Dynamics [article]

Yonatan Mintz, Anil Aswani, Philip Kaminsky, Elena Flowers, Yoshimi Fukuoka
2019 arXiv   pre-print
Many settings involve sequential decision-making where a set of actions can be chosen at each time step, each action provides a stochastic reward, and the distribution for the reward of each action is  ...  Though finding an optimal policy for general models with non-stationarity is PSPACE-complete, we propose and analyze a new class of models called ROGUE (Reducing or Gaining Unknown Efficacy) bandits, which  ...  Acknowledgments The authors gratefully acknowledge the support of NSF Award CMMI-1450963, UCSF Diabetes Family Fund for Innovative Patient Care-Education and Scientific Discovery Award, K23 Award (NR011454  ... 
arXiv:1707.08423v3 fatcat:amyaudusd5c6fevwi2vo7neyny

Age of Information: An Introduction and Survey [article]

Roy D. Yates, Yin Sun, D. Richard Brown III, Sanjit K. Kaul, Eytan Modiano, Sennur Ulukus
2020 arXiv   pre-print
We also explore how update age is related to MMSE methods of sampling, estimation and control of stochastic processes.  ...  various single-hop and multi-hop wireless networks.  ...  Whittle's Index policy is the optimal solution to a relaxation of the Restless Multi-Armed Bandit (RMAB) problem.  ... 
arXiv:2007.08564v1 fatcat:l7ctda3ukfge5hqtbk3ez7pjia