637 Hits in 4.1 sec

Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments [chapter]

Shie Mannor, Nahum Shimkin
<span title="">2001</span> <i title="Springer Berlin Heidelberg"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/3-540-44581-1_9">doi:10.1007/3-540-44581-1_9</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/eg67nyigbrdnxba2gocvbmubyu">fatcat:eg67nyigbrdnxba2gocvbmubyu</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190303141504/http://pdfs.semanticscholar.org/b6cf/15de0b415a071ab93d9702ff104e2c812938.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/b6/cf/b6cf15de0b415a071ab93d9702ff104e2c812938.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/3-540-44581-1_9"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

Online Markov Decision Processes With Kullback–Leibler Control Cost

Peng Guan, Maxim Raginsky, Rebecca M. Willett
<span title="">2014</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/tiaci7xy45hczhz755zlpu5h7q" style="color: black;">IEEE Transactions on Automatic Control</a> </i> &nbsp;
We give an explicit construction of an efficient strategy that has small regret (i.e., the difference between the total state-action cost incurred causally and the smallest cost attainable using noncausal  ...  The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after having selected the corresponding  ...  The time-varying cost functions may represent unmodeled aspects of the environment or collective (and possibly irrational) behavior of any other agents that may be present; the regret minimization viewpoint  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tac.2014.2301558">doi:10.1109/tac.2014.2301558</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/xspyzlcaubfzhlqyn3e34wjbaq">fatcat:xspyzlcaubfzhlqyn3e34wjbaq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170705075125/http://maxim.ece.illinois.edu/pubs/guan_raginsky_willett_ACC12.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/2c/51/2c5120a92893291f9b62731d50bd84b047df6bb9.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tac.2014.2301558"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>

Online Markov decision processes with Kullback-Leibler control cost

Peng Guan, M. Raginsky, R. Willett
<span title="">2012</span> <i title="IEEE"> 2012 American Control Conference (ACC) </i> &nbsp;
We give an explicit construction of an efficient strategy that has small regret (i.e., the difference between the total state-action cost incurred causally and the smallest cost attainable using noncausal  ...  The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after having selected the corresponding  ...  The time-varying cost functions may represent unmodeled aspects of the environment or collective (and possibly irrational) behavior of any other agents that may be present; the regret minimization viewpoint  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/acc.2012.6314926">doi:10.1109/acc.2012.6314926</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6ioaqlpirnhzrovgzgniths5rm">fatcat:6ioaqlpirnhzrovgzgniths5rm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170705075125/http://maxim.ece.illinois.edu/pubs/guan_raginsky_willett_ACC12.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/2c/51/2c5120a92893291f9b62731d50bd84b047df6bb9.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/acc.2012.6314926"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>

Joint iterative beamforming and power adaptation for MIMO ad hoc networks

Engin Zeydan, Didem Kivanc, Ufuk Tureli, Cristina Comaniciu
<span title="2011-08-26">2011</span> <i title="Springer Nature"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/a5gjzjadkzbirdua4xdqipotp4" style="color: black;">EURASIP Journal on Wireless Communications and Networking</a> </i> &nbsp;
Notice that in the regret-matching game, each user m chooses a strategy t m Δ m at any step with a probability proportional to the average regret for not choosing that strategy t m Δ m in the past steps  ...  The steady-state solution of the regret-matchingbased learning algorithm exhibits "no regret" and the probability of choosing a strategy is proportional to the player's "regret" for not having chosen other  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1186/1687-1499-2011-79">doi:10.1186/1687-1499-2011-79</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/os4zxeyrczfsnd2qwrtto7auq4">fatcat:os4zxeyrczfsnd2qwrtto7auq4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201104162326/https://researchrepository.wvu.edu/cgi/viewcontent.cgi?article=3689&amp;context=faculty_publications" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ab/5e/ab5ee8e1efb2dd743df750f151cf3b5a28ed169a.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1186/1687-1499-2011-79"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> springer.com </button> </a>

Online Markov decision processes with Kullback-Leibler control cost [article]

Peng Guan, Maxim Raginsky, Rebecca Willett
<span title="2014-01-14">2014</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
An explicit construction of a computationally efficient strategy with small regret (i.e., expected difference between its actual total cost and the smallest cost attainable using noncausal knowledge of  ...  The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after selecting an action.  ...  The regret of our adaptive strategy is thus given by C T − C T (P baseline ).  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1401.3198v1">arXiv:1401.3198v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/bbct44wmeba3dmz3fxdmfl2aua">fatcat:bbct44wmeba3dmz3fxdmfl2aua</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200829233301/https://arxiv.org/pdf/1401.3198v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/4f/d5/4fd5f80494329e84d3c96093e13a3e24712d08d9.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1401.3198v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Relax but stay in control: from value to algorithms for online Markov decision processes [article]

Peng Guan, Maxim Raginsky, Rebecca Willett
<span title="2015-08-31">2015</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods.  ...  State-based models are common in stochastic control settings, but commonly used frameworks such as Markov Decision Processes (MDPs) assume a known stationary environment.  ...  Alexander Rakhlin and Karthik Sridharan for helping us construct the relaxation presented in Section 4.2.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1310.7300v2">arXiv:1310.7300v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wgbuuibktnfr3jqblrv6mc6e4i">fatcat:wgbuuibktnfr3jqblrv6mc6e4i</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200831000812/https://arxiv.org/pdf/1310.7300v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/7d/6c/7d6caa0ddda15a1d04cf9071d37637cc5d446fd5.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1310.7300v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Decentralized dynamic spectrum access for cognitive radios: cooperative design of a non-cooperative game

Michael Maskery, Vikram Krishnamurthy, Qing Zhao
<span title="">2009</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/dha3bnxvkvhslj74k332qi2ewa" style="color: black;">IEEE Transactions on Communications</a> </i> &nbsp;
For both slowly varying primary user activity and slowly varying statistics of "fast" primary user activity, we apply an adaptive regret based learning procedure which tracks the set of correlated equilibria  ...  Spectrum-agile cognitive radios compete for channels temporarily vacated by licensed primary users in order to satisfy their own demands while minimizing interference.  ...  Performance of Regret Tracking in a Dynamic Environment Here we analyze the performance of Algorithm 5.1 with constant stepsize ε = 0.1 when the primary user activity and radio demands vary in time.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tcomm.2009.02.070158">doi:10.1109/tcomm.2009.02.070158</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/p3gobctj4zelblr3hmdtqn5uri">fatcat:p3gobctj4zelblr3hmdtqn5uri</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20100614193849/http://www.ece.ucdavis.edu/~qzhao/MaskeryEtal09COM.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/6e/26/6e26ce5d3ab56ea7ad29cc898e43809d08da1073.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tcomm.2009.02.070158"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>

Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection [article]

Charles E. Thornton, R. Michael Buehrer, Anthony F. Martone
<span title="2021-06-14">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Stochastic and adversarial linear contextual bandit models are introduced, allowing the radar to achieve effective performance in broad classes of physical environments.  ...  A sequential decision process in which an adaptive radar system repeatedly interacts with a finite-state target channel is studied.  ...  Further, while RL techniques generally perform well in cases where the environment obeys the Markov property [38] , interference in wireless networks is often a time-varying stochastic process, where  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2103.05541v2">arXiv:2103.05541v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/bvjexjasbvfbfbdobr4olshequ">fatcat:bvjexjasbvfbfbdobr4olshequ</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210617014935/https://arxiv.org/pdf/2103.05541v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/67/58/6758e4f5f394094ee57615e9b49c871c07b5b127.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2103.05541v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Page 8146 of Mathematical Reviews Vol. , Issue 2004j [page]

<span title="">2004</span> <i title="American Mathematical Society"> <a target="_blank" rel="noopener" href="https://archive.org/details/pub_mathematical-reviews" style="color: black;">Mathematical Reviews </a> </i> &nbsp;
in arbitrarily varying Markov environments (128 142); John Case, Sanjay Jain [Sanjay Jain'], Frank Stephan and Rolf Wiehagen, Robust learning—rich and poor (143-159).  ...  in Markov environments (616-629).  ... 
<span class="external-identifiers"> </span>
<a target="_blank" rel="noopener" href="https://archive.org/details/sim_mathematical-reviews_2004-10_2004j/page/8146" title="read fulltext microfilm" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Archive [Microfilm] <div class="menu fulltext-thumbnail"> <img src="https://archive.org/serve/sim_mathematical-reviews_2004-10_2004j/__ia_thumb.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a>

Decentralized Online Convex Programming with local information

Maxim Raginsky, Nooshin Kiarashi, Rebecca Willett
<span title="">2011</span> <i title="IEEE"> Proceedings of the 2011 American Control Conference </i> &nbsp;
The proposed algorithm yields small regret (i.e., the difference between the total cost incurred using causally available information and the total cost that would have been incurred in hindsight had all  ...  At each stage, the agents face a new objective function that reflects the effects of a changing environment, and each agent can share information pertaining to past decisions and cost functions only with  ...  This problem has several salient features: • Time-varying objective functionsthe quantity to be optimized varies with time due to uncertain environment dynamics, and the agents must adapt to these variations  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/acc.2011.5991212">doi:10.1109/acc.2011.5991212</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/3a3tf5rrzzgzfbvrqqyabh7vam">fatcat:3a3tf5rrzzgzfbvrqqyabh7vam</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170808072803/http://maxim.ece.illinois.edu/pubs/raginsky_kiarashi_willett_ACC2011.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/67/65/6765fb79c67f5008ae1721e773ea43f9eca4dedc.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/acc.2011.5991212"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>

Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning [article]

Jackson A. Killian, Lily Xu, Arpita Biswas, Milind Tambe
<span title="2021-07-04">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
To make RMABs more useful in settings with uncertain dynamics: (i) We introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties  ...  RMABPPO hinges on learning an auxiliary "λ-network" that allows each arm's learning to decouple, greatly reducing sample complexity required for training; (iv) Under minimax regret, the adversary in the  ...  Acknowledgments and Disclosure of Funding  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2107.01689v1">arXiv:2107.01689v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zkenanwvcvdj3a2xiljx6s7gy4">fatcat:zkenanwvcvdj3a2xiljx6s7gy4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210710001136/https://arxiv.org/pdf/2107.01689v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ea/45/ea457940341470b30b6a5e4c60289bc8f8da071e.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2107.01689v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Online discrete optimization in social networks in the presence of Knightian uncertainty [article]

Maxim Raginsky, Angelia Nedić
<span title="2015-01-29">2015</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Moreover, agents have inertia: each agent has a default mixed strategy that stays fixed regardless of the state of the environment, and must expend effort to deviate from this strategy in order to respond  ...  We show that our strategy achieves the regret that scales polylogarithmically with the time horizon and polynomially with the number of agents and the maximum number of neighbors of any agent in the social  ...  The amount of traffic on each edge e ∈ E varies with time arbitrarily.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1307.0473v2">arXiv:1307.0473v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rujj7vmezjh73g6u6kd3jb4yne">fatcat:rujj7vmezjh73g6u6kd3jb4yne</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200825222145/https://arxiv.org/pdf/1307.0473v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/de/51/de51ed0b882824907e277f495a391e020cc9a4d6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1307.0473v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments [article]

Sindhu Padakandla
<span title="2020-05-19">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
RL agents in these applications often need to react and adapt to changing operating conditions.  ...  This paper provides a survey of RL methods developed for handling dynamically varying environment models.  ...  ., poor learning efficiency in non-stationary environments. Online-learning based variant of QL for arbitrarily varying reward and transition probability functions in MDPs is proposed by [31] .  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2005.10619v1">arXiv:2005.10619v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/35rikwhrwvcf7pvvn7rcokgzxq">fatcat:35rikwhrwvcf7pvvn7rcokgzxq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200528222627/https://arxiv.org/pdf/2005.10619v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2005.10619v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Toward Optimal Adaptive Wireless Communications in Unknown Environments

Pan Zhou, Tao Jiang
<span title="">2016</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ra4txhbrwnbwzk3lwinyjovpbe" style="color: black;">IEEE Transactions on Wireless Communications</a> </i> &nbsp;
In this paper, we propose an adaptive channel access algorithm for wireless communications in unknown environments based on the theory of multi-armed bandits (MAB) problems.  ...  The quantitative performance studies indicate the superior throughput gain and the flexibility of our algorithm in practice, which is resilient to both oblivious and adaptive jamming attacks with different  ...  The goal of the algorithm is to minimize the regret.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/twc.2016.2524638">doi:10.1109/twc.2016.2524638</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zxnjj7jveval5i4foytryb2l3a">fatcat:zxnjj7jveval5i4foytryb2l3a</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191021084633/https://arxiv.org/pdf/1505.06608v7.pdf" title="fulltext PDF download [not primary version]" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <span style="color: #f43e3e;">&#10033;</span> <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/5f/2b/5f2b45f78f6508ee63fb8e0c78d0fd46aa780cf9.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/twc.2016.2524638"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>

Provable Self-Play Algorithms for Competitive Reinforcement Learning [article]

Yu Bai, Chi Jin
<span title="2020-07-09">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We introduce a self-play algorithm—Value Iteration with Upper/Lower Confidence Bound (VI-ULCB)—and show that it achieves regret Õ(√(T)) after playing T steps of the game, where the regret is measured by  ...  We study self-play in competitive reinforcement learning under the setting of Markov games, a generalization of Markov decision processes to the two-player case.  ...  We also thank the Simons Institute at Berkeley and its Foundations of Deep Learning program in Summer 2019 for hosting the authors and incubating our initial discussions.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2002.04017v3">arXiv:2002.04017v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/r36v54jlfrho7jsgdfrsjqx3zq">fatcat:r36v54jlfrho7jsgdfrsjqx3zq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200910052231/https://arxiv.org/pdf/2002.04017v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/53/2a/532a8221a8a2d38941e80f0661a479c4fd3671e9.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2002.04017v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>
Showing results 1–15 of 637