Sequential Decision Making with Limited Observation Capability: Application to Wireless Networks
[article]
2019
arXiv
pre-print
This work studies a generalized class of restless multi-armed bandits with hidden states that allows cumulative feedback, as opposed to the conventional instantaneous feedback. We call them lazy restless bandits (LRB) because decision-making events are sparser than state-transition events. Hence, the feedback after each decision event is the cumulative effect of the state-transition events that follow it. The states of the arms are hidden from the decision-maker and rewards for actions are state
arXiv:1801.01301v2
fatcat:f3dl4tl2gfh4bbmrp54eprvbru