
Policy Smoothing for Provably Robust Reinforcement Learning [article]

Aounon Kumar, Alexander Levine, Soheil Feizi
2022 arXiv   pre-print
The study of provable adversarial robustness for deep neural networks (DNNs) has mainly focused on static supervised learning tasks such as image classification.  ...  Prior works in provable robustness in RL seek to certify the behaviour of the victim policy at every time-step against a non-adaptive adversary using methods developed for the static setting.  ...  CONCLUSION In this work, we extend randomized smoothing to design a procedure that can make any reinforcement learning agent provably robust against adversarial attacks without significantly increasing  ... 
arXiv:2106.11420v3 fatcat:toalxmperncqbi4sswsrmkkpqu
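The smoothing procedure described in this entry can be illustrated with a minimal sketch: perturb each observation with Gaussian noise and take a majority vote over the base policy's actions. This shows the general randomized-smoothing idea, not the paper's exact certification procedure; `base_policy`, `sigma`, and `n_samples` are placeholder names.

```python
import numpy as np

def smoothed_action(base_policy, obs, sigma=0.2, n_samples=100, rng=None):
    """Pick the majority-vote action of a base policy evaluated on
    Gaussian-perturbed copies of the observation (randomized smoothing)."""
    rng = rng or np.random.default_rng(0)
    noisy = obs + sigma * rng.standard_normal((n_samples, obs.shape[0]))
    votes = np.array([base_policy(o) for o in noisy])  # discrete actions
    return np.bincount(votes).argmax()

# Toy base policy on a 2-D observation: act 1 iff the first coordinate is positive.
policy = lambda o: int(o[0] > 0.0)
print(smoothed_action(policy, np.array([0.05, -1.0])))
```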

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [article]

Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han
2022 arXiv   pre-print
To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.  ...  However, such conservatism impairs the robustness of learned policies, leading to significant behavioral changes even under small perturbations of the observations.  ...  Conclusion We propose Robust Offline Reinforcement Learning (RORL) to trade off conservatism and robustness for offline RL.  ... 
arXiv:2206.02829v1 fatcat:rqphd4nb5fcx3c2fklhef2mkdi
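One ingredient of conservative smoothing is a penalty that keeps the Q-function from changing sharply under small perturbations of the observed state. A toy numpy sketch of such a penalty follows; the loss shape and the names `smoothing_penalty` and `eps` are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def smoothing_penalty(q_fn, states, actions, eps=0.01, n_noise=8, rng=None):
    """Penalize how much Q changes when states are perturbed within an
    eps-ball -- a stand-in for a conservative-smoothing regularizer."""
    rng = rng or np.random.default_rng(0)
    q0 = q_fn(states, actions)
    penalty = 0.0
    for _ in range(n_noise):
        delta = rng.uniform(-eps, eps, size=states.shape)
        penalty += np.mean((q_fn(states + delta, actions) - q0) ** 2)
    return penalty / n_noise

# Toy linear Q-function: Q(s, a) = w . s + a
q = lambda s, a: s @ np.array([1.0, -2.0]) + a
s = np.zeros((4, 2)); a = np.ones(4)
print(smoothing_penalty(q, s, a))
```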

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Kaiqing Zhang, Bin Hu, Tamer Basar
2020 Neural Information Processing Systems  
Reinforcement learning (RL) algorithms can fail to generalize due to the gap between the simulation and the real world.  ...  One standard remedy is to use robust adversarial RL (RARL) that accounts for this gap during the policy training, by modeling the gap as an adversary against the training agent.  ...  Introduction Reinforcement learning (RL) can fail to generalize due to the gap between the simulation and the real world.  ... 
dblp:conf/nips/ZhangHB20 fatcat:secji7szmndhvbkbc4rcjmd3qq
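RARL models the sim-to-real gap as a two-player zero-sum game: the agent minimizes cost while an adversary injects disturbances to maximize it. A toy sketch on a scalar linear-quadratic system, using finite-difference gradient descent-ascent; all constants, the clipping, and the learning rates are illustrative, not the paper's setup.

```python
import numpy as np

A, B, C = 0.8, 1.0, 0.5          # scalar dynamics x' = A x + B u + C w

def cost(K, L, T=30, x0=1.0):
    """Agent plays u = -K x, adversary plays w = L x (zero-sum LQ game)."""
    x, c = x0, 0.0
    for _ in range(T):
        u, w = -K * x, L * x
        c += x**2 + 0.1 * u**2 - 0.2 * w**2   # adversary pays for effort
        x = A * x + B * u + C * w
    return c

# Alternating finite-difference gradient descent (agent) / ascent (adversary).
K, L, h, lr = 0.0, 0.0, 1e-4, 5e-3
for _ in range(1000):
    gK = np.clip((cost(K + h, L) - cost(K - h, L)) / (2 * h), -10, 10)
    gL = np.clip((cost(K, L + h) - cost(K, L - h)) / (2 * h), -10, 10)
    K, L = K - lr * gK, L + lr * gL
print(K, L)   # gains after gradient descent-ascent
```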

Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [article]

Sebastian Curi, Ilija Bogunovic, Andreas Krause
2021 arXiv   pre-print
spaces during policy learning.  ...  In real-world tasks, reinforcement learning (RL) agents frequently encounter situations that are not present during training time.  ...  We develop the Robust Hallucinated Upper-Confidence Reinforcement Learning (RH-UCRL) algorithm for obtaining robust RL policies.  ... 
arXiv:2103.10369v1 fatcat:nqr7bcaugnfprolnvy5cq6c2iu

Provably Safe Reinforcement Learning: A Theoretical and Experimental Comparison [article]

Hanna Krasowski, Jakob Thumm, Marlon Müller, Xiao Wang, Matthias Althoff
2022 arXiv   pre-print
Ensuring safety of reinforcement learning (RL) algorithms is crucial for many real-world tasks. However, vanilla RL does not guarantee safety for an agent.  ...  We therefore introduce a categorization for existing provably safe RL methods, and present the theoretical foundations for both continuous and discrete action spaces.  ...  Gros et al. (2020) discuss the effects on different learning algorithms when using robust MPC for projection.  ... 
arXiv:2205.06750v1 fatcat:6mkf42ygxzgfnl25e26jfusk64
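A common mechanism in this categorization is action replacement or projection: a safety filter checks whether the agent's proposed action keeps the system inside a safe set under worst-case assumptions, and overrides it otherwise. A toy sketch for a 1-D point mass with a braking argument; the dynamics and all names are illustrative, not taken from the paper.

```python
import numpy as np

def safe_projection(a_rl, x, v, dt=0.1, x_max=1.0, a_max=1.0):
    """Project an RL action onto the set of accelerations that keep a 1-D
    point below x_max, assuming worst-case braking ability a_max."""
    def stops_safely(a):
        # Position and velocity after applying `a` for one step, then the
        # stopping distance if we brake at -a_max from there.
        v1 = v + a * dt
        x1 = x + v * dt
        return x1 + max(v1, 0.0) ** 2 / (2 * a_max) <= x_max
    # Scan candidate actions from the proposal downward until one is safe.
    for a in np.arange(min(a_rl, a_max), -a_max - 1e-9, -0.05):
        if stops_safely(a):
            return a
    return -a_max  # fall back to full braking

print(safe_projection(a_rl=1.0, x=0.8, v=0.5))
```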

Provably Robust Blackbox Optimization for Reinforcement Learning [article]

Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani
2019 arXiv   pre-print
RBO relies on learning gradient flows using robust regression methods to enable off-policy updates.  ...  art methods for policy optimization problems in Robotics.  ...  The first row presents the RBO policy learned for Hopper and the second row for Walker2d. Both policies lead to optimal behaviors and were learned with the use of only 100 perturbations per epoch.  ... 
arXiv:1903.02993v2 fatcat:yluxxhdirbg2zmurgpqvka6gnm
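The key idea in this entry is to recover policy gradients by robust regression on perturbation rollouts, so that a fraction of corrupted evaluations cannot derail the estimate. A minimal sketch using L1 regression via iteratively reweighted least squares; `rbo_gradient` and its parameters are illustrative, not the paper's exact estimator.

```python
import numpy as np

def rbo_gradient(f, x, n_pert=100, sigma=0.1, iters=20, rng=None):
    """Estimate grad f(x) by robust (L1) regression of forward differences
    on random perturbations, via iteratively reweighted least squares."""
    rng = rng or np.random.default_rng(0)
    Z = sigma * rng.standard_normal((n_pert, x.size))     # perturbations
    y = np.array([f(x + z) - f(x) for z in Z])            # differences
    g = np.linalg.lstsq(Z, y, rcond=None)[0]              # L2 warm start
    for _ in range(iters):                                # IRLS for L1 loss
        w = np.sqrt(1.0 / np.maximum(np.abs(y - Z @ g), 1e-6))
        g = np.linalg.lstsq(Z * w[:, None], y * w, rcond=None)[0]
    return g

f = lambda v: -np.sum((v - 1.0) ** 2)   # toy objective; grad at 0 is [2, 2, 2]
print(rbo_gradient(f, np.zeros(3)))
```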

A Boosting Approach to Reinforcement Learning [article]

Nataly Brukhim, Elad Hazan, Karan Singh
2021 arXiv   pre-print
We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states.  ...  We consider the methodology of boosting, borrowed from supervised learning, for converting weak learners into an accurate policy.  ...  In terms of provable methods for deep RL, there are two main lines of work. The first is a robust analysis of the policy gradient algorithm [2, 1] .  ... 
arXiv:2108.09767v1 fatcat:b7o3a2jdkvcp7pppsoyrfrwsku

A Case for Robust AI in Robotics

Shashank Pathak, Luca Pulina, Armando Tacchella
2015 International Conference of the Italian Association for Artificial Intelligence  
We posit that a reasonable mathematical model to frame such vision is that of Markov decision processes, and that ensuring smooth interactions amounts to endowing robots with control policies that are provably  ...  A policy π is deterministic if it is a function π : S → A_s, and is stochastic if it is a function π : S × A → [0, 1] with π(s, a) = 0 for all a ∉ A_s and s ∈ S, and Σ_{a ∈ A_s} π(s, a) = 1 for all s ∈ S.  ...  In the learning community (see [3] for a recent perspective) reinforcement learning (RL) is viewed as one of the key techniques to synthesize intelligent behavior for interactive agents, and the mathematical  ... 
dblp:conf/aiia/PathakPT15a fatcat:4uxcvfh7irf6vkoxor4kuue5ou
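The deterministic/stochastic policy definitions quoted in this entry translate directly into code; a tiny tabular illustration (the MDP sizes and probabilities are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 3, 2                      # small MDP: 3 states, 2 actions everywhere

det_policy = np.array([0, 1, 1])            # pi : S -> A_s
stoch_policy = np.array([[0.9, 0.1],        # pi : S x A -> [0, 1],
                         [0.5, 0.5],        # each row sums to 1
                         [0.2, 0.8]])

def act(s, stochastic=True):
    if stochastic:
        return rng.choice(A, p=stoch_policy[s])
    return det_policy[s]

print(act(0), act(0, stochastic=False))
```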

A General Family of Robust Stochastic Operators for Reinforcement Learning [article]

Yingdong Lu and Mark S. Squillante and Chai Wah Wu
2019 arXiv   pre-print
We consider a new family of operators for reinforcement learning with the goal of alleviating the negative effects and becoming more robust to approximation or estimation errors.  ...  Exploiting these weaker conditions for optimality could lead to alternatives to the classical Bellman operator that improve convergence speed, accuracy and robustness in reinforcement learning, especially  ...  Conclusions We proposed and analyzed a new general family of robust stochastic operators for reinforcement learning, which subsumes the classical Bellman operator and a recently proposed family of operators  ... 
arXiv:1805.08122v2 fatcat:ajb3kl4znveqnotg6gj34iux7e
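The family of operators subsumes the classical Bellman operator. The sketch below contrasts a standard Bellman backup with a variant that subtracts a random multiple of the action gap, in the spirit (though not necessarily the exact form) of the operators studied here, and checks numerically that the resulting state values still match plain value iteration on a random MDP.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9
R = rng.uniform(size=(nS, nA))                  # random rewards
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # random transition kernel

def bellman(Q):
    return R + gamma * P @ Q.max(axis=1)

def robust_stochastic(Q, beta):
    """Bellman backup minus a random multiple of the action gap."""
    gap = Q.max(axis=1, keepdims=True) - Q      # zero at the greedy action
    return bellman(Q) - beta * gap

Q = np.zeros((nS, nA))
for _ in range(300):
    Q = robust_stochastic(Q, beta=rng.uniform(0.0, 0.5))

V = np.zeros(nS)                                # plain value iteration
for _ in range(300):
    V = (R + gamma * P @ V).max(axis=1)
print(np.abs(Q.max(axis=1) - V).max())          # near zero: values agree
```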

Enforcing robust control guarantees within neural network policies [article]

Priya L. Donti, Melrose Roderick, Mahyar Fazlyab, J. Zico Kolter
2021 arXiv   pre-print
the same provable robustness criteria as robust control.  ...  When designing controllers for safety-critical systems, practitioners often face a challenging tradeoff between robustness and performance.  ...  agreement between the National Science Foundation and Carnegie Mellon University (SES-00949710), the Computational Sustainability Network, and the Bosch Center for AI.  ... 
arXiv:2011.08105v2 fatcat:4dx5hd7fnba3bkvzvje7upizbe

Policy Gradient Method For Robust Reinforcement Learning [article]

Yue Wang, Shaofeng Zou
2022 arXiv   pre-print
Robust reinforcement learning learns a policy that is robust to model mismatch between the simulator and the real environment.  ...  This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch.  ...  In this paper, we develop the first policy gradient method for robust RL under model mismatch with provable robustness, global optimality and complexity analysis.  ... 
arXiv:2205.07344v1 fatcat:7mk6zwtx7vgsbf55kee6q7wttm
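The robust objective here is the worst-case value over an uncertainty set of transition models. One standard set in this literature is R-contamination, where with probability delta an adversary overrides the transition; the robust value function then satisfies a simple modified Bellman equation, sketched below as an illustrative tabular computation (not the paper's policy gradient algorithm).

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, delta = 4, 2, 0.9, 0.2
R = rng.uniform(size=(nS, nA))
P = rng.dirichlet(np.ones(nS), size=(nS, nA))

V = np.zeros(nS)
for _ in range(300):
    # Worst case over an R-contamination set: with probability delta the
    # adversary moves the system to the worst state for the agent.
    V = (R + gamma * ((1 - delta) * (P @ V) + delta * V.min())).max(axis=1)
print(V)
```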

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator [article]

Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi
2019 arXiv   pre-print
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the  ...  underlying model 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest 3) they inherently allow for richly parameterized policies.  ...  K. thanks Emo Todorov, Aravind Rajeswaran, Kendall Lowrey, Sanjeev Arora, and Elad Hazan for helpful discussions. S. K. and M. F. also thank Ben Recht for helpful discussions. R.  ... 
arXiv:1801.05039v3 fatcat:hf7gpybbxnfkrbuhzladrgvkby
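The object of study is gradient descent directly on the cost C(K) of a linear policy u = -Kx. A scalar numerical sketch with finite-difference gradients (a stand-in for the zeroth-order, model-free oracle analyzed in the paper); the constants are illustrative.

```python
import numpy as np

A, B = np.array([[1.1]]), np.array([[1.0]])
Qc, Rc = np.array([[1.0]]), np.array([[1.0]])

def lqr_cost(K, x0=np.array([1.0]), T=100):
    """Finite-horizon cost of the linear policy u = -K x."""
    x, c = x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Qc @ x + u @ Rc @ u
        x = A @ x + B @ u
    return c

# Model-free policy gradient: descend a finite-difference estimate of dC/dK.
K, h, lr = np.array([[0.5]]), 1e-4, 1e-3
for _ in range(500):
    g = (lqr_cost(K + h) - lqr_cost(K - h)) / (2 * h)
    K -= lr * g
print(K)   # approaches the optimal LQR gain (about 0.70 here)
```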

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor [article]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
2018 arXiv   pre-print
In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework.  ...  Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks.  ...  Acknowledgments We would like to thank Vitchyr Pong for insightful discussions and help in implementing our algorithm as well as providing the DDPG baseline code; Ofir Nachum for offering support in running  ... 
arXiv:1801.01290v2 fatcat:5737bv4lmzdzxbv6xreow6phfy
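The maximum-entropy framework augments the return with the policy's entropy, J(π) = Σ_t E[r_t + α H(π(·|s_t))]. A tabular soft Q-iteration sketch of the underlying objective, showing the softmax-shaped optimal policy it induces (SAC itself is an actor-critic with function approximation; this is only the tabular ideal):

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, alpha = 4, 3, 0.9, 0.2
R = rng.uniform(size=(nS, nA))
P = rng.dirichlet(np.ones(nS), size=(nS, nA))

Q = np.zeros((nS, nA))
for _ in range(300):
    # Soft value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    Q = R + gamma * P @ V

pi = np.exp(Q / alpha)                 # optimal max-entropy policy:
pi /= pi.sum(axis=1, keepdims=True)    # pi(a|s) proportional to exp(Q/alpha)
print(pi)
```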

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [article]

Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Mingyan Liu, Duane Boning, Cho-Jui Hsieh
2021 arXiv   pre-print
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.  ...  We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.  ...  Deep robust Kalman filter. arXiv preprint arXiv:1703.02310, 2017. [64] Shen, Q., Li, Y., Jiang, H., Wang, Z., and Zhao, T. Deep reinforcement learning with smooth policy.  ... 
arXiv:2003.08938v7 fatcat:64dqpsscovbfzm42rucdvbkvdy
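The threat model here perturbs the agent's observation, not the environment state: the adversary picks a bounded perturbation that changes the action the policy takes. A minimal random-search sketch of that attack surface (the toy policy and names are illustrative; the paper uses stronger, gradient-based attacks and a robustness regularizer):

```python
import numpy as np

def worst_case_action_change(policy, obs, eps=0.1, n_trials=200, rng=None):
    """Search random l_inf-bounded observation perturbations for one that
    flips the policy's action -- the observation-attack threat model."""
    rng = rng or np.random.default_rng(0)
    a0 = policy(obs)
    for _ in range(n_trials):
        delta = rng.uniform(-eps, eps, size=obs.shape)
        if policy(obs + delta) != a0:
            return obs + delta          # adversarial observation found
    return None                          # policy looks locally stable

policy = lambda o: int(o[0] + 0.5 * o[1] > 0.0)
print(worst_case_action_change(policy, np.array([0.05, 0.0])))
```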

Provably Efficient Maximum Entropy Exploration [article]

Elad Hazan, Sham M. Kakade, Karan Singh, Abby Van Soest
2019 arXiv   pre-print
For example, one natural, intrinsically defined, objective problem is for the agent to learn a policy which induces a distribution over state space that is as uniform as possible, which can be measured  ...  We provide an efficient algorithm to optimize such intrinsically defined objectives, when given access to a black-box planning oracle (which is robust to function approximation).  ...  The authors thank Shie Mannor for helpful discussions.  ... 
arXiv:1812.02690v2 fatcat:hct4ruiyifg3xdcct6fzyf4rge
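The algorithm alternates between estimating the state distribution of the current policy mixture and calling the planning oracle with a reward derived from the entropy gradient, roughly r(s) = -log d(s). A toy tabular sketch; the planner, rollout counts, and MDP are all illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, T = 6, 2, 0.95, 40
P = rng.dirichlet(np.ones(nS) * 0.3, size=(nS, nA))   # random MDP

def plan(reward):
    """Black-box planning oracle: value iteration for a state-based reward."""
    V = np.zeros(nS)
    for _ in range(300):
        V = reward + gamma * (P @ V).max(axis=1)
    return (P @ V).argmax(axis=1)          # greedy deterministic policy

def state_distribution(policies, n_rollouts=300):
    """Empirical state distribution of the uniform policy mixture."""
    d = np.zeros(nS)
    for _ in range(n_rollouts):
        pi, s = policies[rng.integers(len(policies))], 0
        for _ in range(T):
            d[s] += 1
            s = rng.choice(nS, p=P[s, pi[s]])
    return d / d.sum()

policies = [rng.integers(nA, size=nS)]     # arbitrary initial policy
for _ in range(10):
    d = state_distribution(policies)
    policies.append(plan(-np.log(d + 1e-6)))   # reward = entropy gradient
print(np.round(state_distribution(policies), 3))   # spreads over states
```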