Dynamic Spectrum Access in Time-varying Environment: Distributed Learning Beyond Expectation Optimization
[article]
2017
arXiv
pre-print
Therefore, we formulate the interactions among the users in the time-varying environment as a non-cooperative game, in which the utility function is defined as the achieved effective capacity. ...
This article investigates the problem of dynamic spectrum access for canonical wireless networks, in which the channel states are time-varying. ...
access game in time-varying environment. ...
arXiv:1502.06672v4
fatcat:ehiz6u5ep5ghnlhxvs7h7imozi
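For context on the utility named in the entry above, effective capacity is commonly defined (following Wu and Negi's standard formulation; the paper's exact utility may add user- and channel-specific terms) for a service-rate process r(t) and delay-QoS exponent \theta as

E_C(\theta) = -\lim_{T \to \infty} \frac{1}{\theta T} \log \mathbb{E}\left[ \exp\left( -\theta \sum_{t=1}^{T} r(t) \right) \right],

a risk-sensitive throughput measure that penalizes rate variability more heavily as the delay exponent \theta grows, which is what makes it a natural objective for time-varying channels.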
MDP Playground: A Design and Debug Testbed for Reinforcement Learning
[article]
2021
arXiv
pre-print
We present MDP Playground, an efficient testbed for Reinforcement Learning (RL) agents with orthogonal dimensions that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in generated environments. ...
The underlying assumptions in many of these environments are that of a Markov Decision Process (MDP) [see, e.g., Puterman, 1994, Sutton and Barto, 2018] or a Partially Observable MDP (POMDP) [see, e.g ...
arXiv:1909.07750v4
fatcat:wcj7j7cxqzhdzb5t2hicms7yzy
Active Reinforcement Learning over MDPs
[article]
2021
arXiv
pre-print
This paper proposes a framework of Active Reinforcement Learning (ARL) over MDPs to improve generalization efficiency in a limited resource by instance selection. ...
However, one of the greatest challenges in RL is generalization efficiency (i.e., generalization performance in a unit time). ...
In our context, active reinforcement learning (ARL) over MDPs decides which instances (i.e., MDPs) to train on, in order to save training cost. ...
arXiv:2108.02323v3
fatcat:cmcn36kiyvffdkprzbdb252tyy
Learning Robust State Abstractions for Hidden-Parameter Block MDPs
[article]
2021
arXiv
pre-print
In this work, we leverage ideas of common structure from the HiP-MDP setting, and extend it to enable robust state abstractions inspired by Block MDPs. ...
Hidden-Parameter Markov Decision Processes (HiP-MDPs) explicitly model this structure to improve sample efficiency in multi-task settings. ...
Figure 1: Visualizations of the typical MTRL setting and the HiP-MDP setting.
1. Cartpole-Swingup-V0: the mass of the pole varies; 2. Cheetah-Run-V0: the size of the torso varies; 3. ...
arXiv:2007.07206v4
fatcat:mdp3x6s6ovf5znhb2oiv56y5fy
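For reference, a Hidden-Parameter MDP (HiP-MDP) is conventionally written as a family of MDPs indexed by a task parameter \theta that is drawn from some distribution and held fixed but unobserved within a task:

M_\theta = (S, A, T_\theta, R_\theta, \gamma), \quad T_\theta(s' \mid s, a), \quad R_\theta(s, a), \quad \theta \sim P_\Theta.

The captioned examples (varying pole mass, varying torso size) correspond to different draws of \theta; the block-MDP extension in this entry additionally assumes the agent sees rich observations generated from a latent state rather than the latent state itself. This summary follows the textbook formulation and paraphrases the entry rather than quoting the paper.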
Block Contextual MDPs for Continual Learning
[article]
2021
arXiv
pre-print
In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics are implicitly assumed to be stationary. ...
In this work, we propose to examine this continual reinforcement learning setting through the block contextual MDP (BC-MDP) framework, which enables us to relax the assumption of stationarity. ...
Finally, we discuss additional related works in multi-task RL, transfer learning, and MDP metrics in Appendix A. ...
arXiv:2110.06972v1
fatcat:cqmjbgeynbacjkeirikksjimtq
Optimizing for the Future in Non-Stationary MDPs
[article]
2020
arXiv
pre-print
Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. ...
However, in many real-world applications, this assumption is violated, and using existing algorithms may result in a performance lag. ...
Learning and planning for time-varying MDPs using maximum likelihood estimation. arXiv preprint arXiv:1911.12976, 2019.
Padakandla, S. ...
arXiv:2005.08158v4
fatcat:er42kn4d2bbsni6xqvkni37jli
Online Reinforcement Learning for Periodic MDP
[article]
2022
arXiv
pre-print
We study learning in periodic Markov Decision Process(MDP), a special type of non-stationary MDP where both the state transition probabilities and reward functions vary periodically, under the average ...
We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. ...
This was first analysed in [7] in a solely reward-varying environment. ...
arXiv:2207.12045v1
fatcat:tk5tgz2ylvaz3csvba63b72ggu
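The augmentation step this entry describes, folding the period index into the state so that a stationary learner applies, can be sketched as below; the wrapper class, attribute names, and the underlying env interface are illustrative placeholders, not the paper's code.

# Sketch: expose the phase t mod N as part of the state of a periodic MDP,
# turning it into a stationary MDP on the product space S x {0, ..., N-1}.
class PhaseAugmentedEnv:
    def __init__(self, env, period):
        self.env = env          # underlying periodic MDP (hypothetical interface)
        self.period = period    # known period N
        self.t = 0              # global time step

    def reset(self):
        self.t = 0
        state = self.env.reset()
        return (state, self.t % self.period)    # augmented state (s, phase)

    def step(self, action):
        next_state, reward, done = self.env.step(action)
        self.t += 1
        return (next_state, self.t % self.period), reward, done

On the augmented state, transitions and rewards no longer depend on absolute time, so stationary average-reward algorithms (the entry's PUCRL2 builds on UCRL2 in this spirit) can be run directly on the enlarged state space.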
Decentralized MDPs with sparse interactions
2011
Artificial Intelligence
Finally, we show a reinforcement learning algorithm in which independent agents learn both individual policies and when and how to coordinate. ...
We relate our new model to other existing models such as MMDPs and Dec-MDPs. ...
We run our learning algorithm in each of the test environments. Table 7 summarizes the number of learning steps allowed in each environment. ...
doi:10.1016/j.artint.2011.05.001
fatcat:k2theoe5qvf3pbrij37jmjafky
Invariant Causal Prediction for Block MDPs
[article]
2020
arXiv
pre-print
In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. ...
The authors would also like to thank Marlos Machado for helpful feedback in the writing process. ...
arXiv:2003.06016v2
fatcat:hnqf7cfkergp3fsaoi6lbsxa6u
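As a reminder of the underlying formalism, a block MDP is typically given by a tuple (S, A, X, p, q, R): latent states S with dynamics p(s' \mid s, a) and reward R(s, a), plus an emission distribution q(x \mid s) producing rich observations x \in X, under the block assumption that each observation is generated by at most one latent state, so the latent state is in principle decodable from the observation. In the family of environments described in this entry, (S, A, p, R) are shared while the emission q varies across environments; this summary follows the standard definition and paraphrases the entry rather than quoting the paper.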
Learning and Planning for Time-Varying MDPs Using Maximum Likelihood Estimation
[article]
2021
arXiv
pre-print
This paper proposes a formal approach to online learning and planning for agents operating in a priori unknown, time-varying environments. ...
Based on the proposed method, we generalize the exploration bonuses used in learning for time-invariant Markov decision processes by introducing a notion of uncertainty in a learned time-varying model, ...
In contrast to assuming time-invariance, accounting for time-varying changes in the environment presents a major challenge to learning and planning. ...
arXiv:1911.12976v2
fatcat:kqwcpk7iejctlegk5o4cds5sve
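One simple way to make a count-based exploration bonus sensitive to time variation, in the spirit of the generalized bonuses this entry mentions (an illustrative sketch only, not the paper's estimator; beta and lam are hypothetical parameters), is to decay past visit counts so that state-action pairs not seen recently regain uncertainty:

from collections import defaultdict
import math

class DecayedCountBonus:
    def __init__(self, beta=1.0, lam=0.99):
        self.beta = beta                    # bonus scale
        self.lam = lam                      # per-step forgetting factor
        self.counts = defaultdict(float)    # effective visit counts

    def update(self, state, action):
        # Forget a little of every past visit, then credit the current one.
        for key in self.counts:
            self.counts[key] *= self.lam
        self.counts[(state, action)] += 1.0

    def bonus(self, state, action):
        n = self.counts[(state, action)]
        return self.beta / math.sqrt(n + 1.0)   # grows as the effective count shrinks

Under such forgetting, the bonus for a stale state-action pair drifts back up over time, which mirrors the idea of attaching uncertainty to a learned time-varying model.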
Inverse Reinforcement Learning in Contextual MDPs
[article]
2020
arXiv
pre-print
We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). ...
Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them ...
Apprenticeship Learning and Inverse Reinforcement Learning In Apprenticeship Learning (AL), the reward function is unknown, and we denote the MDP without the reward function (also commonly called a controlled ...
arXiv:1905.09710v5
fatcat:tluul5ast5dedk4nxsbpevr27a
Safety-Constrained Reinforcement Learning for MDPs
[article]
2015
arXiv
pre-print
We consider controller synthesis for stochastic and partially unknown environments in which safety is essential. ...
Exploiting an iterative learning procedure, the resulting policy is safety-constrained and optimal. ...
Learning: In the learning phase, the main goal is the exploration of this MDP, as we thereby learn the cost function. ...
arXiv:1510.05880v1
fatcat:bqjtjzv7kngkxm2z4c7zgxzji4
Bayesian regularization of empirical MDPs
[article]
2022
arXiv
pre-print
When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. ...
Our results demonstrate the robustness of regularized MDP policies against the noise present in the models. ...
When applying π directly to the environment modeled by the underlying MDP M , one often experiences suboptimal performance. ...
arXiv:2208.02362v1
fatcat:5g3fnekgbvgabatwxmjrufd3ta
Denoised MDPs: Learning World Models Better Than the World Itself
[article]
2022
arXiv
pre-print
This framework clarifies the kinds of information removed by various prior work on representation learning in reinforcement learning (RL), and leads to our proposed approach of learning a Denoised MDP that ...
In this work, we categorize information out in the wild into four types based on controllability and relation with reward, and formulate useful information as that which is both controllable and reward-relevant ...
We are very thankful to Alex Lamb for suggestions and catching our typo in the conditioning of Equation (1). ...
arXiv:2206.15477v4
fatcat:he66y45mgfcjvp6l6d253hzkn4
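Laid out as a grid, the four-way split quoted above (labels paraphrased from the snippet; the paper's own names for the quadrants may differ) is:

                    reward-relevant             reward-irrelevant
  controllable      useful signal (retained)    to be factored out
  uncontrollable    to be factored out          to be factored out

i.e., only information that is both controllable and reward-relevant is kept in the denoised model.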
Bounded Optimal Exploration in MDP
[article]
2016
arXiv
pre-print
In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. ...
Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning ...
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our sponsors. ...
arXiv:1604.01350v1
fatcat:h2szqltoknb2tbgpusrd7bdthq
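For readers unfamiliar with the framework named in this entry, the standard PAC-MDP criterion (Strehl, Li, and Littman's formulation; the relaxation proposed here modifies these conditions) requires that, with probability at least 1 - \delta, the number of time steps t at which the algorithm's policy is more than \epsilon suboptimal, i.e. V^{A_t}(s_t) < V^{*}(s_t) - \epsilon, is bounded by a polynomial in (|S|, |A|, 1/\epsilon, 1/\delta, 1/(1-\gamma)).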
Showing results 1 — 15 out of 12,366 results