
Reinforcement Learning via Recurrent Convolutional Neural Networks [article]

Tanmay Shankar, Santosha K. Dwivedy, Prithwijit Guha
2017 arXiv   pre-print
Deep Reinforcement Learning has enabled the learning of policies for complex tasks in partially observable environments, without explicitly learning the underlying model of the tasks.  ...  We present a natural representation of Reinforcement Learning (RL) problems using Recurrent Convolutional Neural Networks (RCNNs), to better exploit this inherent structure.  ...  The element-wise multiplication O(s', z) ⊙ b(s') can be considered as an element-wise product (or Hadamard product) layer in 2D. 3) Recurrence Stage: The network output, b'(s'), is fed back as the input b(s) at the  ... 
arXiv:1701.02392v1 fatcat:vcqsex7whnczpk2s3fks5aecdi
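The Hadamard-product observation correction and recurrence stage described in this snippet follow the standard POMDP belief update. A minimal NumPy sketch, assuming the usual transition kernel T and observation likelihood O (names and shapes are illustrative, not taken from the paper's code):

```python
import numpy as np

def belief_update(b, T_a, O_z):
    """One recurrence step: predict with the transition kernel T_a,
    correct with an element-wise (Hadamard) product against the
    observation likelihood O_z, then renormalize to a distribution."""
    b_pred = T_a @ b               # prediction: b(s') = sum_s T(s'|s,a) b(s)
    b_corr = O_z * b_pred          # Hadamard product with O(s', z)
    return b_corr / b_corr.sum()   # renormalize

# toy 3-state example; the output is fed back as the input belief
# at the next step, which is exactly the recurrence stage
b = np.array([1/3, 1/3, 1/3])
T_a = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1],
                [0.1, 0.1, 0.8]])
O_z = np.array([0.9, 0.05, 0.05])
b = belief_update(b, T_a, O_z)
```

In 2D grid-world state spaces, the prediction step becomes a convolution and the correction stays an element-wise product layer, which is what makes the update expressible as a recurrent convolutional network.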

Imitation Learning via Differentiable Physics [article]

Siwei Chen, Xiao Ma, Zhongwen Xu
2022 arXiv   pre-print
To simplify the complex optimization landscape induced by temporal physics operations, ILD dynamically selects the learning objectives for each state during optimization.  ...  Existing imitation learning (IL) methods such as inverse reinforcement learning (IRL) usually have a double-loop training process, alternating between learning a reward function and a policy, and tend to  ...  prior for policy learning, ILD obtains a policy that generalizes better to complex dynamics.  ... 
arXiv:2206.04873v1 fatcat:br5o6gwjkjaxdpvxbsp7qtdfeq

Inferring learning rules from animal decision-making

Zoe Ashwood, Nicholas A. Roy, Ji Hyun Bak, Jonathan W. Pillow
2020 Neural Information Processing Systems  
Whereas the average contribution of the conventional REINFORCE learning rule to the policy update for mice learning the International Brain Laboratory's task was just 30%, we found that adding baseline  ...  How do animals learn? This remains an elusive question in neuroscience.  ...  Finally, we thank the anonymous NeurIPS reviewers for their insightful comments and feedback.  ... 
dblp:conf/nips/AshwoodRBP20 fatcat:iobgajbk25fsvd4kyo7c4oldqy
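The REINFORCE-with-baseline rule this snippet compares against has a simple form; a hedged one-step sketch of the textbook update (the learning rate, gradient values, and baseline here are made-up, not the paper's fitted quantities):

```python
import numpy as np

def reinforce_update(theta, grad_logp, reward, baseline, lr=0.1):
    """One policy-gradient step. Subtracting a baseline from the
    reward leaves the gradient estimate unbiased but reduces its
    variance, which is the modification discussed in the abstract."""
    return theta + lr * (reward - baseline) * grad_logp

theta = np.zeros(3)
grad_logp = np.array([0.5, -0.2, 0.1])  # grad of log pi(a|x; theta)
theta = reinforce_update(theta, grad_logp, reward=1.0, baseline=0.4)
```

With baseline = 0 this reduces to conventional REINFORCE; a nonzero baseline rescales the effective reward signal without changing its expectation.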

Abstract Reasoning with Distracting Features [article]

Kecheng Zheng, Zheng-jun Zha, Wei Wei
2019 arXiv   pre-print
for predictions.  ...  We later show that carefully designed learning trajectory over different categories of training data can effectively boost learning performance by mitigating the impacts of distracting features.  ...  For each multiple-choice candidate, our proposed LEN model calculates a score respectively, allowing the network to select the multiple-choice candidate with the highest score.  ... 
arXiv:1912.00569v1 fatcat:xpcruq56sbanha2k2z4o35fbpi

Online Constrained Model-based Reinforcement Learning [article]

Benjamin van Niekerk, Andreas Damianou, Benjamin Rosman
2020 arXiv   pre-print
The environment's dynamics are learned from limited training data and can be reused in new task instances without retraining.  ...  Applying reinforcement learning to robotic systems poses a number of challenging problems.  ...  Acknowledgements We thank the reviewer for their helpful insights and feedback.  ... 
arXiv:2004.03499v1 fatcat:f6jixaikjfddppi2h4miyjai6q

An Inverse Reinforcement Learning Algorithm for Partially Observable Domains with Application on Healthcare Dialogue Management

Hamid R. Chinaei, Brahim Chaib-Draa
2012 2012 11th International Conference on Machine Learning and Applications  
The problem is formulated as inverse reinforcement learning (IRL) in the POMDP framework.  ...  In this paper, we propose an algorithm for learning a reward model from an expert policy in partially observable Markov decision processes (POMDPs).  ...  Moreover, for the general query action the reward is considered as +0.4 in every state. For the choice of features, we automatically learned Keyword features.  ... 
doi:10.1109/icmla.2012.31 dblp:conf/icmla/ChinaeiC12 fatcat:t7vthkfe6jfyzjnbied4aylk4m

A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control [article]

Yuguang Yang
2019 arXiv   pre-print
Here, we introduce stacked deep Q learning (SDQL), a flexible modularized deep reinforcement learning architecture that can find optimal control policies for control tasks consisting of multiple  ...  Deep reinforcement learning for high-dimensional, hierarchical control tasks usually requires the use of complex neural networks as function approximators, which can lead to inefficiency, instability  ...  Using multiple Q networks in SDQL offers a number of advantages over using a complex neural network for end-to-end learning [5].  ... 
arXiv:1911.10684v1 fatcat:zmmcpfx46jcqnd44xpopltoxde

Learning to Fly – a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control [article]

Jacopo Panerati
2021 arXiv   pre-print
Vice versa, many reinforcement learning environments trade off realism for high sample throughput in toy-like problems.  ...  In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine.  ...  ACKNOWLEDGMENTS We acknowledge the support of Mitacs's Elevate Fellowship program and General Dynamics Land Systems-Canada (GDLS-C)'s Innovation Cell.  ... 
arXiv:2103.02142v3 fatcat:vzgoqo2sxja7tgjv3oc4o4bqoy

Multi-View Reinforcement Learning [article]

Minne Li, Lisheng Wu, Haitham Bou Ammar, Jun Wang
2019 arXiv   pre-print
This paper is concerned with multi-view reinforcement learning (MVRL), which allows for decision making when agents share common dynamics but adhere to different observation models.  ...  Specifically, we show reductions in sample complexities and computational time for acquiring policies that handle multi-view environments.  ...  In this paper, we contribute by introducing a framework for multi-view reinforcement learning that generalizes partially observable Markov decision processes (POMDPs) to ones that exhibit multiple observation  ... 
arXiv:1910.08285v1 fatcat:xng5iej2wza2jc7yzqh53opt3q

Learning socially normative robot navigation behaviors with Bayesian inverse reinforcement learning

Billy Okal, Kai O. Arras
2016 2016 IEEE International Conference on Robotics and Automation (ICRA)  
We thus develop a flexible graph-based representation able to capture relevant task structure and extend Bayesian inverse reinforcement learning to use sampled trajectories from this representation.  ...  In this paper, we address this task using a learning approach that enables a mobile robot to acquire navigation behaviors from demonstrations of socially normative human behavior.  ...  [11] used dynamic potential fields and RRT to plan trajectories around multiple people but without considering social relations between them. Lu et al.  ... 
doi:10.1109/icra.2016.7487452 dblp:conf/icra/OkalA16 fatcat:yp6veapukfgd3bcrre5xsphre4

Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization

Olov Andersson, Fredrik Heintz, Patrick Doherty
2015 Proceedings of the AAAI Conference on Artificial Intelligence  
In addition, hard constraints can easily be included and objectives can also be changed in real-time to allow for multiple or dynamic tasks.  ...  In this paper we propose a model-based reinforcement learning approach for continuous environments with constraints.  ...  National Graduate School in Computer Science, Sweden (CUGS), the Swedish Aeronautics Research Council (NFFP6), the Swedish Foundation for Strategic Research (SSF) project CUAS and the Center for Industrial  ... 
doi:10.1609/aaai.v29i1.9623 fatcat:mkm5ui5ymvd7hcwaq2iw52yrsa

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following [article]

Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama
2019 arXiv   pre-print
Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer.  ...  In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than  ...  We selected our architecture via a hyper-parameter search, and found that the choice of using an element-wise multiplication versus a concatenation for combining embeddings had no appreciable performance  ... 
arXiv:1902.07742v1 fatcat:6tjvjqd5vvaezertlmp2fh3oi4
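The architecture choice this snippet mentions, element-wise multiplication versus concatenation for combining two embeddings, can be sketched in a few lines (dimensions and variable names are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
lang_emb = rng.standard_normal(64)   # language-command embedding
img_emb  = rng.standard_normal(64)   # image-observation embedding

# Option 1: element-wise multiplication keeps the dimensionality
# at 64 and bakes a multiplicative interaction into the fusion
fused_mul = lang_emb * img_emb

# Option 2: concatenation doubles the dimensionality to 128 and
# leaves any interaction to the following fully-connected layers
fused_cat = np.concatenate([lang_emb, img_emb])
```

The snippet reports no appreciable performance difference between the two, which matches the intuition that a subsequent dense layer can learn either fusion from the concatenated form.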

Deep Inverse Reinforcement Learning for Route Choice Modeling [article]

Zhan Zhao, Yuebing Liang
2022 arXiv   pre-print
To address these issues, this study proposes a general deep inverse reinforcement learning (IRL) framework for link-based route choice modeling, which is capable of incorporating high-dimensional features  ...  While several recent studies have started to explore the applicability of deep learning for travel choice modeling, they are all path-based with relatively simple model architectures and cannot take advantage  ...  In this study, we propose a deep inverse reinforcement learning (IRL) framework for link-based route choice modeling.  ... 
arXiv:2206.10598v1 fatcat:r5mpyrbg2beenmvg6fbdrdvua4
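Link-based route choice, as opposed to path-based, treats a route as a sequence of link choices at each node. A hedged sketch of the path likelihood under a softmax link-choice policy (the toy graph, utilities, and helper names are invented for illustration, not from the paper):

```python
import numpy as np

# hypothetical road graph: outgoing links per node
OUT_EDGES = {"A": ["A->B", "A->C"], "B": ["B->D"]}

def path_log_likelihood(path, utility):
    """Log-likelihood of a route as sequential link choices: at each
    node, the next link is drawn from a softmax over the utilities
    of the outgoing links."""
    ll = 0.0
    for node, chosen in path:
        u = np.array([utility[e] for e in OUT_EDGES[node]])
        probs = np.exp(u - u.max())      # stable softmax
        probs /= probs.sum()
        ll += np.log(probs[OUT_EDGES[node].index(chosen)])
    return ll

utility = {"A->B": 1.0, "A->C": 0.0, "B->D": 0.0}
ll = path_log_likelihood([("A", "A->B"), ("B", "B->D")], utility)
```

In a deep IRL setting the per-link utilities would be the output of a learned reward network over link features; maximizing this likelihood over demonstrated routes recovers those utilities without enumerating full paths.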

Modular Networks Prevent Catastrophic Interference in Model-Based Multi-Task Reinforcement Learning [article]

Robin Schiewer, Laurenz Wiskott
2021 arXiv   pre-print
In a multi-task reinforcement learning setting, the learner commonly benefits from training on multiple related tasks by exploiting similarities among them.  ...  While this effect is well documented for model-free multi-task methods, we demonstrate a detrimental effect when using a single learned dynamics model for multiple tasks.  ...  Tobias Glasmachers for their feedback and help, which greatly influenced this work.  ... 
arXiv:2111.08010v1 fatcat:jpf2naoimrhetlleji6tcdcvq4

TrajGAIL: Generating Urban Vehicle Trajectories using Generative Adversarial Imitation Learning [article]

Seongjin Choi, Jiwon Kim, Hwasoo Yeo
2021 arXiv   pre-print
This research proposes TrajGAIL, a generative adversarial imitation learning framework for urban vehicle trajectory generation.  ...  A generative model for urban vehicle trajectories can better generalize from training data by learning the underlying distribution of the training data and, thus, produce synthetic vehicle trajectories  ...  Gangnam district has major road links in a grid structure as shown in Figure 6 , so there are multiple choices in routes with similar travel distance for a given OD within the district.  ... 
arXiv:2007.14189v4 fatcat:svsumpjklncxdh62g2zwyhmvnq