Reinforcement Learning via Recurrent Convolutional Neural Networks
[article]
2017
arXiv
pre-print
Deep Reinforcement Learning has enabled the learning of policies for complex tasks in partially observable environments, without explicitly learning the underlying model of the tasks. ...
We present a natural representation of Reinforcement Learning (RL) problems using Recurrent Convolutional Neural Networks (RCNNs), to better exploit this inherent structure. ...
the element-wise multiplication O(s′, z) ⊙ b̄(s′) can be considered as an element-wise (Hadamard) product layer in 2D. 3) Recurrence Stage: The network output, b′(s′), is fed back as the input b(s) at the ...
arXiv:1701.02392v1
fatcat:vcqsex7whnczpk2s3fks5aecdi
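The belief-update snippet above maps a Bayes filter onto network layers: a prediction step through the dynamics, a Hadamard-product correction with the observation likelihood, and a recurrence feeding the output back in as the next input. A minimal NumPy sketch of that recurrence, with the array shapes assumed for illustration:

```python
import numpy as np

def belief_update(b, T_a, O_z):
    """One recurrence of the Bayes-filter RCNN described in the snippet.

    b   : (S,) current belief over states
    T_a : (S, S) transition matrix for the chosen action, T_a[s, s2] = P(s2 | s, a)
    O_z : (S,) observation likelihoods O(s', z) for the received observation z
    """
    b_bar = b @ T_a              # prediction: propagate belief through dynamics
    b_new = O_z * b_bar          # correction: Hadamard product with O(s', z)
    return b_new / b_new.sum()   # normalize; fed back as next step's input b(s)
```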
Imitation Learning via Differentiable Physics
[article]
2022
arXiv
pre-print
To simplify the complex optimization landscape induced by temporal physics operations, ILD dynamically selects the learning objectives for each state during optimization. ...
Existing imitation learning (IL) methods such as inverse reinforcement learning (IRL) usually have a double-loop training process, alternating between learning a reward function and a policy, and tend to ...
prior for policy learning, ILD obtains a policy that generalizes better to complex dynamics. ...
arXiv:2206.04873v1
fatcat:br5o6gwjkjaxdpvxbsp7qtdfeq
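One way to read "dynamically selects the learning objectives for each state" is that every rollout state is matched to its own expert target during optimization. The nearest-neighbor criterion below is an illustrative guess at that idea, not ILD's actual selection rule:

```python
import numpy as np

def dynamic_targets(rollout_states, expert_states):
    """Match each rollout state (N, D) to its nearest expert state (M, D)
    and use that as its per-state learning target; an assumption made for
    illustration, not ILD's published criterion."""
    d = np.linalg.norm(rollout_states[:, None] - expert_states[None], axis=-1)
    return expert_states[d.argmin(axis=1)]   # per-state objective selection
```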
Inferring learning rules from animal decision-making
2020
Neural Information Processing Systems
Whereas the average contribution of the conventional REINFORCE learning rule to the policy update for mice learning the International Brain Laboratory's task was just 30%, we found that adding baseline ...
How do animals learn? This remains an elusive question in neuroscience. ...
Finally, we thank the anonymous NeurIPS reviewers for their insightful comments and feedback. ...
dblp:conf/nips/AshwoodRBP20
fatcat:iobgajbk25fsvd4kyo7c4oldqy
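For reference, the "conventional REINFORCE learning rule" and the baseline-corrected variant this snippet contrasts have the standard single-trial forms below; the trial-level interface is an assumption for illustration:

```python
import numpy as np

def reinforce_step(theta, grad_logp, reward, baseline=0.0, lr=0.05):
    """One policy-gradient update after a single trial.

    grad_logp : gradient of log pi_theta(chosen action | stimulus)
    baseline  : 0.0 recovers plain REINFORCE; a running average of past
                rewards gives the variance-reduced, baseline-corrected rule
    """
    return theta + lr * (reward - baseline) * grad_logp
```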
Abstract Reasoning with Distracting Features
[article]
2019
arXiv
pre-print
for predictions. ...
We later show that carefully designed learning trajectory over different categories of training data can effectively boost learning performance by mitigating the impacts of distracting features. ...
Our proposed LEN model calculates a score for each multiple-choice candidate, allowing the network to select the candidate with the highest score. ...
arXiv:1912.00569v1
fatcat:xpcruq56sbanha2k2z4o35fbpi
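The score-and-argmax selection described above reduces to a few lines; `score_fn` stands in for the LEN network and its interface is assumed:

```python
import numpy as np

def select_answer(score_fn, context, candidates):
    """Pick the multiple-choice candidate with the highest score.

    score_fn   : stand-in for the LEN model; returns a scalar score
                 for a (context, candidate) pair (assumed interface)
    candidates : sequence of candidate answer panels
    """
    scores = np.array([score_fn(context, c) for c in candidates])
    return int(np.argmax(scores))
```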
Online Constrained Model-based Reinforcement Learning
[article]
2020
arXiv
pre-print
The environment's dynamics are learned from limited training data and can be reused in new task instances without retraining. ...
Applying reinforcement learning to robotic systems poses a number of challenging problems. ...
Acknowledgements We thank the reviewer for their helpful insights and feedback. ...
arXiv:2004.03499v1
fatcat:f6jixaikjfddppi2h4miyjai6q
An Inverse Reinforcement Learning Algorithm for Partially Observable Domains with Application on Healthcare Dialogue Management
2012
2012 11th International Conference on Machine Learning and Applications
The problem is formulated as inverse reinforcement learning (IRL) in the POMDP framework. ...
In this paper, we propose an algorithm for learning a reward model from an expert policy in partially observable Markov decision processes (POMDPs). ...
Moreover, for the general query action, the reward is set to +0.4 in every state. For the choice of features, we automatically learned Keyword features. ...
doi:10.1109/icmla.2012.31
dblp:conf/icmla/ChinaeiC12
fatcat:t7vthkfe6jfyzjnbied4aylk4m
A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control
[article]
2019
arXiv
pre-print
Here, we introduce stacked deep Q learning (SDQL), a flexible modularized deep reinforcement learning architecture that can enable finding optimal control policies for control tasks consisting of multiple ...
Deep reinforcement learning for high dimensional, hierarchical control tasks usually requires the use of complex neural networks as functional approximators, which can lead to inefficiency, instability ...
Using multiple Q networks in SDQL offers a number of advantages over using a complex neural network for end-to-end learning [5]. ...
arXiv:1911.10684v1
fatcat:zmmcpfx46jcqnd44xpopltoxde
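The snippet suggests one Q network per stage, with each stage bootstrapping from the next; a minimal sketch under that assumption (the paper's exact target construction may differ):

```python
import numpy as np

def sdql_act(q_networks, state, stage):
    """Greedy action from the Q network assigned to the current stage."""
    return int(np.argmax(q_networks[stage](state)))

def sdql_target(reward, next_state, q_next, gamma=0.99):
    """Bootstrapped target for the current stage's network; q_next is the
    next stage's Q network, or None at the terminal stage."""
    if q_next is None:
        return reward
    return reward + gamma * np.max(q_next(next_state))
```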
Learning to Fly – a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control
[article]
2021
arXiv
pre-print
Vice versa, many reinforcement learning environments trade off realism for high sample throughput in toy-like problems. ...
In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine. ...
ACKNOWLEDGMENTS We acknowledge the support of Mitacs's Elevate Fellowship program and General Dynamics Land Systems-Canada (GDLS-C)'s Innovation Cell. ...
arXiv:2103.02142v3
fatcat:vzgoqo2sxja7tgjv3oc4o4bqoy
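Since the entry advertises an OpenAI Gym-like interface, interaction should follow the standard Gym loop; the environment ID below is an assumption based on the project's aviary naming scheme, not verified here:

```python
import gym
import gym_pybullet_drones  # assumed to register the quadcopter envs on import

env = gym.make("hover-aviary-v0")   # assumed ID; check the project's README
obs = env.reset()
for _ in range(240):                # roughly 1 s at a 240 Hz physics step
    action = env.action_space.sample()          # random-policy placeholder
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
    if done:
        obs = env.reset()
env.close()
```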
Multi-View Reinforcement Learning
[article]
2019
arXiv
pre-print
This paper is concerned with multi-view reinforcement learning (MVRL), which allows for decision making when agents share common dynamics but adhere to different observation models. ...
Specifically, we show reductions in sample complexities and computational time for acquiring policies that handle multi-view environments. ...
In this paper, we contribute by introducing a framework for multi-view reinforcement learning that generalizes partially observable Markov decision processes (POMDPs) to ones that exhibit multiple observation ...
arXiv:1910.08285v1
fatcat:xng5iej2wza2jc7yzqh53opt3q
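A multi-view POMDP in the sense described above shares one transition kernel across views but swaps the observation model; a generative-process sketch with illustrative stand-in callables:

```python
import numpy as np

def mvrl_step(state, action, transition, obs_models, rng):
    """Shared dynamics, view-specific observations: the defining structure
    of the multi-view setting (all callables are assumptions).

    rng : a numpy Generator, e.g. np.random.default_rng()
    """
    next_state = transition(state, action)
    view = rng.integers(len(obs_models))   # which sensor/view reports this step
    return next_state, view, obs_models[view](next_state)
```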
Learning socially normative robot navigation behaviors with Bayesian inverse reinforcement learning
2016
2016 IEEE International Conference on Robotics and Automation (ICRA)
We thus develop a flexible graph-based representation able to capture relevant task structure and extend Bayesian inverse reinforcement learning to use sampled trajectories from this representation. ...
In this paper, we address this task using a learning approach that enables a mobile robot to acquire navigation behaviors from demonstrations of socially normative human behavior. ...
[11] used dynamic potential fields and RRT to plan trajectories around multiple people but without considering social relations between them. Lu et al. ...
doi:10.1109/icra.2016.7487452
dblp:conf/icra/OkalA16
fatcat:yp6veapukfgd3bcrre5xsphre4
Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization
2015
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
In addition, hard constraints can easily be included and objectives can also be changed in real-time to allow for multiple or dynamic tasks. ...
In this paper we propose a model-based reinforcement learning approach for continuous environments with constraints. ...
National Graduate School in Computer Science, Sweden (CUGS), the Swedish Aeronautics Research Council (NFFP6), the Swedish Foundation for Strategic Research (SSF) project CUAS and the Center for Industrial ...
doi:10.1609/aaai.v29i1.9623
fatcat:mkm5ui5ymvd7hcwaq2iw52yrsa
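The claim that hard constraints and swap-in objectives fit naturally follows from posing action selection as an online optimization over a learned model; a generic receding-horizon sketch with SciPy (the paper's actual solver setup may differ):

```python
import numpy as np
from scipy.optimize import minimize

def mpc_action(model, cost, x0, horizon=10, u_dim=1, u_bounds=(-1.0, 1.0)):
    """Receding-horizon action selection with hard input bounds over a
    learned dynamics model; a generic sketch, not the paper's solver."""
    def total_cost(u_flat):
        u_seq = u_flat.reshape(horizon, u_dim)
        x, c = np.asarray(x0, dtype=float), 0.0
        for u in u_seq:
            x = model(x, u)           # learned one-step dynamics
            c += cost(x, u)           # objective can be swapped at run time
        return c

    res = minimize(total_cost,
                   x0=np.zeros(horizon * u_dim),
                   bounds=[u_bounds] * (horizon * u_dim))
    return res.x[:u_dim]              # apply first control, then replan
```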
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
[article]
2019
arXiv
pre-print
Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer. ...
In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than ...
We selected our architecture via a hyper-parameter search, and found that the choice of using an element-wise multiplication versus a concatenation for combining embeddings had no appreciable performance ...
arXiv:1902.07742v1
fatcat:6tjvjqd5vvaezertlmp2fh3oi4
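The architecture note above compares two standard fusion operators for combining embeddings; both are one-liners, with matching embedding dimensionality assumed for the multiplicative case:

```python
import numpy as np

def combine(lang_emb, img_emb, mode="multiply"):
    """Two common ways to fuse language and image embeddings."""
    if mode == "multiply":                       # element-wise (Hadamard) fusion
        return lang_emb * img_emb
    return np.concatenate([lang_emb, img_emb])   # concatenation fusion
```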
Deep Inverse Reinforcement Learning for Route Choice Modeling
[article]
2022
arXiv
pre-print
To address these issues, this study proposes a general deep inverse reinforcement learning (IRL) framework for link-based route choice modeling, which is capable of incorporating high-dimensional features ...
While several recent studies have started to explore the applicability of deep learning for travel choice modeling, they are all path-based with relatively simple model architectures and cannot take advantage ...
In this study, we propose a deep inverse reinforcement learning (IRL) framework for link-based route choice modeling. ...
arXiv:2206.10598v1
fatcat:r5mpyrbg2beenmvg6fbdrdvua4
Modular Networks Prevent Catastrophic Interference in Model-Based Multi-Task Reinforcement Learning
[article]
2021
arXiv
pre-print
In a multi-task reinforcement learning setting, the learner commonly benefits from training on multiple related tasks by exploiting similarities among them. ...
While this effect is well documented for model-free multi-task methods, we demonstrate a detrimental effect when using a single learned dynamics model for multiple tasks. ...
Tobias Glasmachers for their feedback and help, which greatly influenced this work. ...
arXiv:2111.08010v1
fatcat:jpf2naoimrhetlleji6tcdcvq4
TrajGAIL: Generating Urban Vehicle Trajectories using Generative Adversarial Imitation Learning
[article]
2021
arXiv
pre-print
This research proposes TrajGAIL, a generative adversarial imitation learning framework for urban vehicle trajectory generation. ...
A generative model for urban vehicle trajectories can better generalize from training data by learning the underlying distribution of the training data and, thus, produce synthetic vehicle trajectories ...
Gangnam district has major road links in a grid structure as shown in Figure 6 , so there are multiple choices in routes with similar travel distance for a given OD within the district. ...
arXiv:2007.14189v4
fatcat:svsumpjklncxdh62g2zwyhmvnq
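In a GAIL-style framework like this, the generator's learning signal is a surrogate reward from the discriminator; the generic form is shown below (TrajGAIL's exact formulation may differ):

```python
import numpy as np

def gail_reward(d_prob, eps=1e-8):
    """Surrogate reward for generated (location, action) pairs: higher when
    the discriminator believes the pair came from real trajectories."""
    return -np.log(1.0 - d_prob + eps)
```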