An empirical investigation of the challenges of real-world reinforcement learning [article]

Gabriel Dulac-Arnold and Nir Levine and Daniel J. Mankowitz and Jerry Li and Cosmin Paduraru and Sven Gowal and Todd Hester
2021 arXiv pre-print
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios.  ...  In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems.  ...  D4PG is a modified version of Deep Deterministic Policy Gradients (DDPG), an actor-critic algorithm where state-action values are estimated by a critic network, and the actor network is updated with gradients  ...
arXiv:2003.11881v2 fatcat:4drarwyswfbkndzluhju7ilxrm

Towards Automatic Actor-Critic Solutions to Continuous Control [article]

Jake Grigsby, Jin Yong Yoo, Yanjun Qi
2021 arXiv pre-print
We then apply it to less common control tasks outside of simulated robotics to find high-performance solutions with minimal compute and research effort.  ...  However, these algorithms rely on a number of design tricks and hyperparameters, making their application to new domains difficult and computationally expensive.  ...  Meta-SAC [50] tunes the target entropy value with a meta-gradient approach. 3 Method: Automatic Actor Critics The issues above require a number of heuristic solutions that would be expensive to re-tune  ... 
arXiv:2106.08918v2 fatcat:2hy6rrfmoffx3be5xsdgr3krjq

Deep Reinforcement Learning, a textbook [article]

Aske Plaat
2022 arXiv pre-print
Developments go quickly, and we also cover advanced topics: deep multi-agent reinforcement learning, deep hierarchical reinforcement learning, and deep meta learning.  ...  They have learned to fly model helicopters and perform aerobatic manoeuvers such as loops and rolls.  ...  MAML uses TRPO to estimate the gradient both for the policy gradient update(s) and the meta optimization [681].  ...
arXiv:2201.02135v2 fatcat:3icsopexerfzxa3eblpu5oal64

Artificial Intelligence for Prosthetics - challenge solutions [article]

Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu (+31 others)
2019 arXiv pre-print
Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending.  ...  In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity  ...  (a) Comparison between TD3 and DDPG with categorical value distribution approximation (also known as D4PG [4]). Submit trick results.  ...
arXiv:1902.02441v1 fatcat:hf7xzitrhjdqfb5cfaneovlfa4

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels [article]

Ilya Kostrikov, Denis Yarats, Rob Fergus
2021 arXiv pre-print
SLAC) methods and recently proposed contrastive learning (CURL).  ...  Our approach can be combined with any model-free reinforcement learning algorithm, requiring only minor modifications.  ...  Furthermore, we would like to thank Roberta Raileanu for helping with the architecture experiments.  ... 
arXiv:2004.13649v4 fatcat:6dl4xjzzfzebbctpsdq2ktbl2a

How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review [article]

Florian Tambon, Gabriel Laberge, Le An, Amin Nikanjam, Paulina Stevia Nouwou Mindom, Yann Pequignot, Foutse Khomh, Giulio Antoniol, Ettore Merlo, François Laviolette
2021 arXiv pre-print
We analyzed the main trends and problems of each sub-field and provided summaries of the papers extracted.  ...  It also emphasized the need to further develop connections between academia and industries to deepen the domain study.  ...  and rewarded for driving forward without crashing, and 2) Humanoid domain: the agent (off-policy distributed distributional deterministic policy gradient-D4PG) runs a 21-DoF humanoid body in the MuJoCo  ... 
arXiv:2107.12045v3 fatcat:43vqxywawbeflhs6ehzovvsevm