Policy Gradient Bayesian Robust Optimization for Imitation Learning [article]

Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg
2021 arXiv   pre-print
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk.  ...  While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior.  ...  This work has taken place in the AUTOLAB and InterACT Lab at the University of California, Berkeley and the Reinforcement Learning and Robustness Lab (RLsquared) at the University of New Hampshire.  ... 
arXiv:2106.06499v2 fatcat:qpyrd4lrr5aqjl4kqeaf3z3fgi

Bayesian Robust Optimization for Imitation Learning [article]

Daniel S. Brown, Scott Niekum, Marek Petrik
2020 arXiv   pre-print
To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL).  ...  IRL approaches either optimize a policy for the mean or MAP reward function.  ...  Acknowledgments and Disclosure of Funding We would like to thank the reviewers for their detailed feedback that helped to improve the paper.  ... 
arXiv:2007.12315v3 fatcat:a3yro3z2ffhxzd7rsk5nzrfhla

A Bayesian Approach to Generative Adversarial Imitation Learning

Wonseok Jeon, Seokin Seo, Kee-Eung Kim
2018 Neural Information Processing Systems  
To address this issue, we first propose a Bayesian formulation of generative adversarial imitation learning (GAIL), where the imitation policy and the cost function are represented as stochastic neural  ...  Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks.  ...  Imitation learning with policy gradients is a recently proposed approach that uses gradient-based stochastic optimizers.  ... 
dblp:conf/nips/JeonSK18 fatcat:obctzjpj2ndrhbcv3xgmix4seq

Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction

Yichuan Zhang, Yixing Lan, Qiang Fang, Xin Xu, Junxiang Li, Yujun Zeng, Ahmed Mostafa Khalil
2021 Computational Intelligence and Neuroscience  
Once the coarse policy's confidence is low, another RL-based refine module will further optimize and fine-tune the policy to form a (near) optimal hybrid policy.  ...  robustness performance.  ...  For RLBNK-switch, the RL algorithm is employed to learn a policy for the state space $S - S_K$. Therefore, the gradient estimator also turns from equation (4) into $\nabla J(\phi) = \mathbb{E}_{s \sim (S - S_K),\, a \sim A}\, \nabla_\phi \log \pi_\phi$  ... 
doi:10.1155/2021/7588221 pmid:34603434 pmcid:PMC8486502 fatcat:hvjdbrpjxrbznntrrur4omz32m

NEARL: Non-Explicit Action Reinforcement Learning for Robotic Control [article]

Nan Lin, Yuxuan Li, Yujun Zhu, Ruolin Wang, Xiayu Zhang, Jianmin Ji, Keke Tang, Xiaoping Chen, Xinming Zhang
2020 arXiv   pre-print
Under our framework, widely available state-only demonstrations can be exploited effectively for imitation learning. Also, prior knowledge and constraints can be applied to meta policy.  ...  We test our algorithm in simulation tasks and its combination with imitation learning. The experimental results show the reliability and robustness of our algorithms.  ...  like traditional RL, which could be achieved via the aforementioned policy gradient (Eq. 1) to optimize the combined policy.  ... 
arXiv:2011.01046v1 fatcat:bicjlzj2wjffnji5wde5ttvymy

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences [article]

Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum
2020 arXiv   pre-print
However, Bayesian reward learning methods are typically computationally intractable for complex control problems.  ...  Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning.  ...  High Confidence Policy Evaluation for Imitation Learning Before detailing our approach, we first formalize the problem of high-confidence policy evaluation for imitation learning.  ... 
arXiv:2002.09089v4 fatcat:vk6ebzm2ijesjdp3bahj5cjdgi

Bayesian Model-Agnostic Meta-Learning [article]

Taesup Kim, Jaesik Yoon, Ousmane Dia, Sungwoong Kim, Yoshua Bengio and Sungjin Ahn
2018 arXiv   pre-print
Learning to infer Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty inherent in the problem.  ...  In addition, a robust Bayesian meta-update mechanism with a new meta-loss prevents overfitting during meta-update.  ...  Acknowledgments JY thanks SAP and Kakao Brain for their support. TK thanks NSERC, MILA and Kakao Brain for their support.  ... 
arXiv:1806.03836v4 fatcat:qe2scrfmzzahfbgfpn5rtwugzq

Robot learning from demonstration

2004 Robotics and Autonomous Systems  
and the optimal imitation control policy.  ...  Nakanishi et al. tackle imitation learning of locomotion patterns for a bipedal robot using Locally Weighted Regression as learning scheme, a gradient-descent based method.  ... 
doi:10.1016/s0921-8890(04)00037-5 fatcat:ydogncl2p5hltorg2vcm5lucd4

Safe adaptation in multiagent competition [article]

Macheng Shen, Jonathan P. How
2022 arXiv   pre-print
of the ego-agent's policy.  ...  In multiagent competitive scenarios, agents may have to adapt to new opponents with previously unseen behaviors by learning from the interaction experiences between the ego-agent and the opponent.  ...  To resolve this ambiguity and achieve optimal discriminative power, we choose to learn the discrepancy metric via adversarial learning following the paradigm of generative adversarial imitation learning  ... 
arXiv:2203.07562v1 fatcat:jc23hg567jerboxajlqcaz74pu

A Comprehensive Overview and Survey of Recent Advances in Meta-Learning [article]

Huimin Peng
2020 arXiv   pre-print
We briefly introduce meta-learning methodologies in the following categories: black-box meta-learning, metric-based meta-learning, layered meta-learning and Bayesian meta-learning framework.  ...  Deep learning is focused upon in-sample prediction and meta-learning concerns model adaptation for out-of-sample prediction.  ...  Acknowledgment Thanks to Debasmit Das, Louis Kirsch and Luca Bertinetto (in alphabetical order) for useful and valuable comments on this manuscript.  ... 
arXiv:2004.11149v7 fatcat:ko266mr26jar3pyn6t4r3l5drm

Learning Control in Robotics

Stefan Schaal, Christopher Atkeson
2010 IEEE robotics & automation magazine  
Keywords: Robot learning, learning control, reinforcement learning, optimal control.  ...  research was supported in part by National Science Foundation grants ECS-0326095, EEC-0540865, and ECCS-0824077, IIS-0535282, CNS-0619937, IIS-0917318, CBET-0922784, EECS-0926052, the DARPA program on Learning  ...  Thus, local optimization such as trajectory optimization seems to be more practical, using initialization of the policy from some informed guess, for instance, imitation learning [44], [48]-[51]  ... 
doi:10.1109/mra.2010.936957 fatcat:sg4pl7qbrnfyvc7stg5wslkyry

Towards Mixed Optimization for Reinforcement Learning with Program Synthesis [article]

Surya Bhupatiraju, Kumar Krishna Agrawal, Rishabh Singh
2018 arXiv   pre-print
We present an iterative framework, MORL, for improving the learned policies using program synthesis.  ...  We instantiate MORL for the simple CartPole problem and show that the programmatic representation allows for high-level modifications that in turn lead to improved learning of the policies.  ...  Mixed Optimization for Reinforcement Learning Our goal is to improve policy learning by decomposing the usual gradient-based optimization scheme into an iterative two-stage algorithm.  ... 
arXiv:1807.00403v2 fatcat:yen3rmixgzfinmtmcgvhchb52m

Coordinated Multi-Agent Imitation Learning [article]

Hoang M. Le, Yisong Yue, Peter Carr, Patrick Lucey
2018 arXiv   pre-print
We illustrate the power of our approach on a difficult problem of learning multiple policies for fine-grained behavior modeling in team sports, where different players occupy different roles in the coordinated  ...  In particular, our method integrates unsupervised structure learning with conventional imitation learning.  ...  The main idea is to integrate imitation learning with unsupervised structure learning by taking turns to (i) optimize for imitation policies while fixing a structured model (minimizing imitation loss),  ... 
arXiv:1703.03121v2 fatcat:ie24xw27c5ghrjc2hhka4c5are

Bottom-Up Meta-Policy Search [article]

Luckeciano C. Melo, Marcos R. O. A. Maximo, Adilson Marques da Cunha
2019 arXiv   pre-print
To mitigate these problems, we propose and apply a first-order Meta-Learning algorithm called Bottom-Up Meta-Policy Search (BUMPS), which works with two-phase optimization procedure: firstly, in a meta-training  ...  , which evaluates few policies sampled from the meta-policy distribution and selects which best solves the task.  ...  We also would like to acknowledge Deep Learning Brazil research group for all financial support and insightful discussions throughout this work.  ... 
arXiv:1910.10232v2 fatcat:nodltaa7zfhtvfbqst3wyxmi7a

Robot learning from demonstration

Aude Billard, Roland Siegwart
2004 Robotics and Autonomous Systems  
and the optimal imitation control policy.  ...  Nakanishi et al. tackle imitation learning of locomotion patterns for a bipedal robot using Locally Weighted Regression as learning scheme, a gradient-descent based method.  ... 
doi:10.1016/j.robot.2004.03.001 fatcat:lkx5gvv5mzcsbgm7sp2bs6g64y