741 Hits in 6.3 sec

Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning [article]

William F. Whitney, Michael Bloesch, Jost Tobias Springenberg, Abbas Abdolmaleki, Kyunghyun Cho, Martin Riedmiller
2021 arXiv   pre-print
Despite the close connection between exploration and sample efficiency, most state-of-the-art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of  ...  This causes BBE to be actively detrimental to policy learning in many control tasks.  ...  Comparing results from U and UF, we see that fast adaptation leads to faster state coverage (Figure 2) and thus faster task learning (Figure 3).  ... 
arXiv:2101.09458v2 fatcat:6tv26wgtnbgwvcpf7y5uaotu3u

Relational Neurogenesis for Lifelong Learning Agents

Tej Pandit, Dhireesha Kudithipudi
2020 Proceedings of the Neuro-inspired Computational Elements Workshop  
The ability to learn through continuous reinforcement and interaction with an environment negates the requirement of painstakingly curated datasets and hand crafted features.  ...  The search for lifelong learning algorithms creates the foundation for this work.  ...  RN is seen to have a sudden accuracy bump near the 1200 epoch.  ... 
doi:10.1145/3381755.3381766 dblp:conf/nice/PanditK20 fatcat:flknsyjdprbdxg76upjzpdacju

Task-Guided Inverse Reinforcement Learning Under Partial Information [article]

Franck Djeumou, Murat Cubuktepe, Craig Lennon, Ufuk Topcu
2021 arXiv   pre-print
We study the problem of inverse reinforcement learning (IRL), where the learning agent recovers a reward function using expert demonstrations.  ...  We demonstrate that, even with severely limited data, the algorithm learns reward functions and policies that satisfy the task and induce a similar behavior to the expert by leveraging the side information  ...  Figure 9 shows that the algorithm is unable to learn without side information while side information induces a learned policy that is optimal.  ... 
arXiv:2105.14073v2 fatcat:5kluysmvurcdvmxqupbscbifle

Transfer Learning in Attack Avoidance Games

Edwin Torres, Fernando Lozano
2020 Journal of Computer Science  
Transfer knowledge is a human characteristic that has been replicated in machine learning algorithms to improve learning performance measures.  ...  However, little success has been accomplished in reinforcement learning tasks when a function approximation is needed to estimate the value functions.  ...  The corresponding author confirms that all of the other authors have read and approved the manuscript and no ethical issues involved.  ... 
doi:10.3844/jcssp.2020.1465.1476 fatcat:iqea3cqpvbdubgk7mmrxybnlcu

Model Repair with Quality-Based Reinforcement Learning

Iovino Ludovico, Angela Barriga, Adrian Rutle, Rogardt Heldal
2020 Journal of Object Technology  
The framework uses reinforcement learning to find the best sequence of actions for repairing a broken model. This domain model has been taken from a dataset of academic examples used during the MDE Course  ...  Similar to any other software artifact, domain models are subject to the introduction of errors during the modeling process.  ...  models using reinforcement learning (RL) [TL00].  ... 
doi:10.5381/jot.2020.19.2.a17 fatcat:tato65snznaqfhyoebreokpgaq

High-Accuracy Model-Based Reinforcement Learning, a Survey [article]

Aske Plaat, Walter Kosters, Mike Preuss
2021 arXiv   pre-print
To reduce the number of environment samples, model-based reinforcement learning creates an explicit model of the environment dynamics.  ...  Deep reinforcement learning has shown remarkable success in the past few years.  ...  Acknowledgments We thank the members of the Leiden Reinforcement Learning Group, and especially Thomas Moerland and Mike Huisman, for many discussions and insights.  ... 
arXiv:2107.08241v1 fatcat:tma6xb2uy5fybjfhmzasfx2cta

Interterminal Truck Routing Optimization Using Deep Reinforcement Learning

Taufik Nur Adi, Yelita Anggiane Iskandar, Hyerim Bae
2020 Sensors  
The study of deep reinforcement learning in truck routing optimization is still limited.  ...  The experiment results showed that the proposed method obtains considerably better results compared to the other algorithms.  ...  [24] adopted and improved the native Q-learning, one of the reinforcement learning (RL) algorithms, to optimize the route of on-demand bus systems.  ... 
doi:10.3390/s20205794 pmid:33066280 pmcid:PMC7602099 fatcat:d6lp2lbkqvgtvoukxb4tcjuzke

A Provably Efficient Sample Collection Strategy for Reinforcement Learning [article]

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric
2021 arXiv   pre-print
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.  ...  In this paper, we propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that (adaptively) prescribes how many samples to  ...  Acknowledgments The authors thank Evrard Garcelon and Omar Darwiche Domingues for useful discussion.  ... 
arXiv:2007.06437v2 fatcat:wfkmhkxa5fcsxbizjkorrkgfru

A modeling framework for adaptive lifelong learning with transfer and savings through gating in the prefrontal cortex [article]

Ben Tsuda, Kay M Tye, Hava T Siegelmann, Terrence J Sejnowski
2020 bioRxiv   pre-print
Recent research on artificial neural networks trained by reinforcement learning has made it possible to model fundamental processes underlying schema encoding and storage.  ...  We show how incorporation of gating naturally leads to transfer learning and robust memory savings.  ...  Acknowledgements We thank Robert Kim, Yusi Chen, and Gal Mishne for helpful discussions and feedback on the manuscript. We thank Jorge Aldana for support with computing resources.  ... 
doi:10.1101/2020.03.11.984757 fatcat:qqhmv3jstra4njfevapojrwcwy

A reawakening of Machine Learning Application in Unmanned Aerial Vehicle: Future Research Motivation

Wasswa Shafik, S. Mojtaba Matinkhah, Fawad Shokoor, Lule Sharif
2022 EAI Endorsed Transactions on Internet of Things  
Supervised, unsupervised, semi-supervised, and Reinforcement Learning (RL) are the main types of ML.  ...  This is a comparison to supervised and non-supervised learning due to the interactive nature of the environment.  ...  The authors in nutshell would like to distinguish the support and comments shared with us from the computer engineering department members to attain this paper's quality.  ... 
doi:10.4108/eetiot.v8i29.987 fatcat:hkyfcvmj5bdt7gmiigvzlvgpvi

A Data-Efficient Deep Learning Approach for Deployable Multimodal Social Robots

Heriberto Cuayáhuitl
2019 Neurocomputing  
The deep supervised and reinforcement learning paradigms (among others) have the potential to endow interactive multimodal social robots with the ability of acquiring skills autonomously.  ...  As a step in this direction, we propose a deep learning-based approach for efficiently training a humanoid robot to play multimodal games---and use the game of 'Noughts & Crosses' with two variants as  ...  Acknowledgement The robot used in this paper was donated by the Engineering & Physical Sciences Research Council (EPSRC), U.K.  ... 
doi:10.1016/j.neucom.2018.09.104 fatcat:eojjpwjq5bfzja7ynuu53matty

Improving Primary Frequency Response in Networked Microgrid Operations using MLP-Driven Reinforcement Learning

Nikitha Radhakrishnan, Indrasis Chakraborty, Jing Xie, Priya Thekkumparambath Mana, Francis Tuffner, Bishnu Bhattarai, Kevin Schneider
2020 IET Smart Grid  
This study investigates the use of a reinforcement-learning-based controller trained over several switching transient scenarios to modify generator controls during large frequency deviations.  ...  Compared to previously used proportional-integral controllers, the proposed controller can improve primary frequency response while adapting to changes in system topologies and conditions.  ...  The authors also intend to test the proposed CVR controller in the field and analyse the system dynamics with varying levels of CVR controller penetration.  ... 
doi:10.1049/iet-stg.2019.0261 fatcat:oysw4sgv4rbuxkovfqpwalpyiq

A learning-based flexible autonomous motion control method for UAV in dynamic unknown environments

Wan Kaifang, Li Bo, Gao Xiaoguang, Hu Zijian, Yang Zhipeng
2021 Journal of Systems Engineering and Electronics  
This paper presents a deep reinforcement learning (DRL)-based motion control method to provide unmanned aerial vehicles (UAVs) with additional flexibility while flying across dynamic unknown environments  ...  The training experiment results show that the novel DRL algorithms provide more than a 20% performance improvement over the state-of-the-art DRL algorithms.  ...  On combining the variable learning rate and the shaped reward function and adding it to the classic DQN, we obtain the DA-DRL-based motion-planning algorithm in Algorithm 1.  ... 
doi:10.23919/jsee.2021.000126 fatcat:prrdpqlfkjhvnli4litx4hqyte

Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision [article]

Ali Sadeghian, Shervin Minaee, Ioannis Partalas, Xinxin Li, Daisy Zhe Wang, Brooke Cowan
2019 arXiv   pre-print
During model training, a joint embedding is learned from all of the above information.  ...  We show empirically that our model generates high-quality representations that boost the performance of a hotel recommendation system in addition to other applications.  ...  Acknowledgments The authors would like to thank Ion Lesan, Peter Barszczewski, Daniele Donghi, Ankur Aggrawal for helping us collecting hotel's attribute, click and geographical data.  ... 
arXiv:1910.03943v1 fatcat:jhr6nlg6zjdzlhlu4lcpk4zoi4

The construction and deconstruction of sub-optimal preferences through range-adapting reinforcement learning [article]

Sophie Bavard, Aldo Rustichini, Stefano Palminteri
2020 bioRxiv   pre-print
This is particularly striking in reinforcement learning (RL) situations when options are extrapolated from their original context.  ...  Range adaptation can be seen as the result of an adaptive coding process aiming at increasing the signal-to-noise ratio.  ...  Range adapting reinforcement learning is clearly adaptive in the learning phase. We could hypothesize that the situations in which the process is adaptive are more frequent in real life.  ... 
doi:10.1101/2020.07.28.224642 fatcat:f3lbuz5zfzcuxogbcy6ciyqjry
Showing results 1 to 15 out of 741 results