Conservative Exploration in Reinforcement Learning [article]

Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta
2020 arXiv   pre-print
In this paper, we introduce the notion of conservative exploration for average reward and finite horizon problems.  ...  While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward  ...  Efficient bias-span-constrained exploration-exploitation in reinforcement learning. In ICML, Proceedings of Machine Learning Research. PMLR, 2018b. Aditya Gopalan and Shie Mannor.  ... 
arXiv:2002.03218v2 fatcat:armuovsmbvbrzmiwz3cdn3vh6y

Energy-Efficient Sensor Calibration Based on Deep Reinforcement Learning

Akm Ashiquzzaman, School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea
2019 International Journal of Artificial Intelligence and Applications for Smart Devices  
Reinforcement learning (RL) has received much attention from researchers and is now widely applied in many study fields to achieve intelligent automation.  ...  In this novel research, a new style of power conservation has been explored with the help of RL to make a new generation of IoT devices with calibrated power sources to maximize resource utilization.  ...  To our best knowledge, the novel research for utilizing IoT sensors and power conservation with reinforcement learning has not yet been thoroughly explored. In the research conducted by Dong et al.  ... 
doi:10.21742/ijaiasd.2019.7.1.02 fatcat:rspce7htu5hbliek3g7kvljmbu

Conservative Distributional Reinforcement Learning with Safety Constraints [article]

Hengrui Zhang, Youfang Lin, Sheng Han, Shuo Wang, Kai Lv
2022 arXiv   pre-print
In this paper, we present a novel off-policy reinforcement learning algorithm called Conservative Distributional Maximum a Posteriori Policy Optimization (CDMPO).  ...  Then, CDMPO uses a conservative value function loss to reduce the number of violations of constraints during the exploration process.  ...  Inaccurate estimation exists in almost all reinforcement learning approaches.  ... 
arXiv:2201.07286v1 fatcat:277n35nxonchbgvxk3swh5oowm

Page 1724 of Psychological Abstracts Vol. 43, Issue 12 [page]

1969 Psychological Abstracts  
—Investigated the presence of conservation of number in 117 children, 2 yr. 5 mo.-4 yr. 4 mo. of age. 2° of the Ss were found to be conservers.  ...  —Used a 2-phase experiment to (1) examine the relationship between explorative tendencies and stimulus saturation, and (2) compare play activity with learning activity.  ... 

Greedy UnMixing for Q-Learning in Multi-Agent Reinforcement Learning [article]

Chapman Siu, Jason Traish, Richard Yi Da Xu
2021 arXiv   pre-print
This paper introduces Greedy UnMix (GUM) for cooperative multi-agent reinforcement learning (MARL).  ...  It aims to address this through a conservative Q-learning approach through restricting the state-marginal in the dataset to avoid unobserved joint state action spaces, whilst concurrently attempting to  ...  Overcoming Bootstrap Accumulation Error in Reinforcement Learning: Overcoming bootstrap accumulation error is a key focus in offline learning (or batch reinforcement learning), whereby the temporal difference  ... 
arXiv:2109.09034v1 fatcat:tb73m22xl5byxntdzprulnamm4

Translating cognitive insights into effective conservation programs: Reply to Schakner et al

Alison L. Greggor, Nicola S. Clayton, Ben Phalan, Alex Thornton
2014 Trends in Ecology & Evolution  
s comments as part of this dialogue. Their response mainly critiqued our decision to emphasize 'why' cognition is important in animal conservation, asserting that we do not explore 'how' it should be applied in sufficient detail  ...  However, until a greater number of species-specific guidelines are developed, such as the step-by-step reinforcement schedules that Schakner et al. mention, the fundamentals of perception and learning  ... 
doi:10.1016/j.tree.2014.09.009 pmid:25304444 fatcat:4aubdto6ondivco6ptnwgfprt4

Towards the Intelligent Home: Using Reinforcement-Learning for Optimal Heating Control [chapter]

Alexander Zenger, Jochen Schmidt, Michael Krödel
2013 Lecture Notes in Computer Science  
We propose a reinforcement learning approach to heating control in home automation, that can acquire a set of rules enabling an agent to heat a room to the desired temperature at a defined time while conserving  ...  [1] ), to our knowledge, this is the first time reinforcement learning is used in the context described in this paper.  ...  While reinforcement learning is relatively popular in control engineering for designing low-level control units (cf.  ... 
doi:10.1007/978-3-642-40942-4_30 fatcat:noco672zibadrcymbsbkgim33u

Energy-Efficient IoT Sensor Calibration with Deep Reinforcement Learning

Akm Ashiquzzaman, Hyunmin Lee, Tai-Won Um, Jinsul Kim
2020 IEEE Access  
In this novel research, a new style of power conservation has been explored with the help of RL to make a new generation of IoT devices with calibrated power sources to maximize resource utilization.  ...  Reinforcement learning (RL) has received much attention from researchers and is now widely applied in many study fields to achieve intelligent automation.  ...  LSTM in reinforcement learning has not been introduced or explored properly.  ... 
doi:10.1109/access.2020.2992853 fatcat:uw2hqu26t5fwrf3turn6uehtwe

Reinforcement Learning for Autonomous Driving with Latent State Inference and Spatial-Temporal Relationships [article]

Xiaobai Ma, Jiachen Li, Mykel J. Kochenderfer, David Isele, Kikuo Fujimura
2021 arXiv   pre-print
Deep reinforcement learning (DRL) provides a promising way for learning navigation in complex autonomous driving scenarios.  ...  In this work, we show that explicitly inferring the latent state and encoding spatial-temporal relationships in a reinforcement learning framework can help address this difficulty.  ...  In our work, both the latent inference and the vehicle control are learned under a reinforcement learning framework to handle complex observations and scenarios.  ... 
arXiv:2011.04251v2 fatcat:gidcfhxgcfcgfcftcxkpey2z2y

Improving Safety in Deep Reinforcement Learning using Unsupervised Action Planning [article]

Hao-Lun Hsu, Qiuhua Huang, Sehoon Ha
2021 arXiv   pre-print
In this work, we propose a novel technique of unsupervised action planning to improve the safety of on-policy reinforcement learning algorithms, such as trust region policy optimization (TRPO) or proximal  ...  One of the key challenges to deep reinforcement learning (deep RL) is to ensure safety at both training and testing phases.  ...  SAFE REINFORCEMENT LEARNING VIA UNSUPERVISED ACTION PLANNING In this section, we will present our safe reinforcement learning algorithm that achieves conservative exploration via unsupervised action planning  ... 
arXiv:2109.14325v1 fatcat:n5xtjr5hazaenamkvaz4tdf2ee

Multi-Preference Actor Critic [article]

Ishan Durugkar, Matthew Hausknecht, Adith Swaminathan, Patrick MacAlpine
2019 arXiv   pre-print
However, for most Reinforcement Learning tasks, humans can provide additional insight to constrain the policy learning.  ...  Experiments in Atari and Pendulum verify that constraints are being respected and can accelerate the learning process.  ...  This reward can then be used as any reward in reinforcement learning to learn a policy that mimics the expert.  ... 
arXiv:1904.03295v1 fatcat:wuyfroevgjgz7can73jb2vpxqq

Sensor Networks Routing via Bayesian Exploration

Shuang Hao, Ting Wang
2006 Local Computer Networks (LCN), Proceedings of the IEEE Conference on  
Since information concerning these constraints is unknown in an environment, a reinforcement learning approach is proposed to solve this problem.  ...  There is increasing research interest in solving routing problems in sensor networks subject to constraints such as data correlation, link reliability and energy conservation.  ...  LAN, sensor networks have various concerns which are unknown in advance. So a reinforcement learning approach is practical in the routing scenario.  ... 
doi:10.1109/lcn.2006.322207 dblp:conf/lcn/HaoW06 fatcat:ow2yj7ksv5eenpyoyn33y3vf74

Using tourism free‐choice learning experiences to promote environmentally sustainable behaviour: the role of post‐visit 'action resources'

Roy Ballantyne, Jan Packer
2011 Environmental Education Research  
Building on research and theory in relation to visitor experiences in free-choice learning environments, the paper identifies three different stages in the educational process and proposes a strategy for  ...  Previous research indicates that although visitors often leave such experiences with a heightened awareness of conservation issues and intentions to adopt environmentally responsible behaviors, only a  ...  it is reinforced by subsequent learning experiences.  ... 
doi:10.1080/13504622.2010.530645 fatcat:diijyb53vbd7ljt6u5ctmurmbe

Active Exploration by Chance-Constrained Optimization for Voltage Regulation with Reinforcement Learning

Zhenhuan Ding, Xiaoge Huang, Zhao Liu
2022 Energies  
This research proposes an active exploration (AE) method based on reinforcement learning (RL) to respond to the uncertainties by regulating the voltage of a distribution network with battery energy storage  ...  Meanwhile, the proposed method has advantages in BESS usage in conserveness compared to the chance-constrained optimization.  ...  Acknowledgments: A special thanks to Ziang Zhang for his invaluable contributions to the guidance in this work. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/en15020614 fatcat:b7345bpwmreelju7twyjvmgiaq

Selection in Scale-Free Small World [chapter]

Zsolt Palotai, Csilla Farkas, András Lőrincz
2005 Lecture Notes in Computer Science  
In this paper we compare our selection based learning algorithm with the reinforcement learning algorithm in Web crawlers. The task of the crawlers is to find new information on the Web.  ...  We have found that on this SFSW, the weblog update algorithm performs better than the reinforcement learning algorithm.  ...  We have found that the weblog update selection algorithm performs better in this environment than the reinforcement learning algorithm, even though the reinforcement learning algorithm has been shown to  ... 
doi:10.1007/11559221_65 fatcat:hlgckr6k4fhuvgkkns7ctvw2da