Conservative Exploration in Reinforcement Learning
[article]
2020
arXiv
pre-print
In this paper, we introduce the notion of conservative exploration for average reward and finite horizon problems. ...
While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward ...
Efficient bias-span-constrained exploration-exploitation in reinforcement learning. In ICML, Proceedings of Machine Learning Research. PMLR, 2018b.
Aditya Gopalan and Shie Mannor. ...
arXiv:2002.03218v2
fatcat:armuovsmbvbrzmiwz3cdn3vh6y
Energy-Efficient Sensor Calibration Based on Deep Reinforcement Learning
2019
International Journal of Artificial Intelligence and Applications for Smart Devices
Reinforcement learning (RL) has received much attention from researchers and is now widely applied in many fields of study to achieve intelligent automation. ...
In this novel research, a new style of power conservation has been explored with the help of RL to make a new generation of IoT devices with calibrated power sources to maximize resource utilization. ...
To the best of our knowledge, the use of reinforcement learning for IoT sensors and power conservation has not yet been thoroughly explored. In the research conducted by Dong et al. ...
doi:10.21742/ijaiasd.2019.7.1.02
fatcat:rspce7htu5hbliek3g7kvljmbu
Conservative Distributional Reinforcement Learning with Safety Constraints
[article]
2022
arXiv
pre-print
In this paper, we present a novel off-policy reinforcement learning algorithm called Conservative Distributional Maximum a Posteriori Policy Optimization (CDMPO). ...
Then, CDMPO uses a conservative value function loss to reduce the number of violations of constraints during the exploration process. ...
Inaccurate estimation exists in almost all reinforcement learning approaches. ...
arXiv:2201.07286v1
fatcat:277n35nxonchbgvxk3swh5oowm
Page 1724 of Psychological Abstracts Vol. 43, Issue 12
[page]
1969
Psychological Abstracts
—Investigated the presence of conservation of number in 117 children, 2 yr. 5 mo.-4 yr. 4 mo. of age. 2° of the Ss were found to be conservers. ...
—Used a 2-phase experiment to (1) examine the relationship between explorative tendencies and stimulus saturation, and (2) compare play activity with learning activity. ...
Greedy UnMixing for Q-Learning in Multi-Agent Reinforcement Learning
[article]
2021
arXiv
pre-print
This paper introduces Greedy UnMix (GUM) for cooperative multi-agent reinforcement learning (MARL). ...
It aims to address this through a conservative Q-learning approach through restricting the state-marginal in the dataset to avoid unobserved joint state action spaces, whilst concurrently attempting to ...
Overcoming Bootstrap Accumulation Error in Reinforcement Learning: Overcoming bootstrap accumulation error is a key focus in offline learning (or batch reinforcement learning), whereby the temporal difference ...
arXiv:2109.09034v1
fatcat:tb73m22xl5byxntdzprulnamm4
Translating cognitive insights into effective conservation programs: Reply to Schakner et al
2014
Trends in Ecology & Evolution
s comments as part of this dialogue. Their response mainly critiqued our decision to emphasize 'why' cognition is important in animal conservation, asserting that we do not explore 'how' it should ...
However, until a greater number of species-specific guidelines are developed, such as the step-by-step reinforcement schedules that Schakner et al. mention, the fundamentals of perception and learning ...
Their response mainly critiqued our decision to emphasize 'why' cognition is important in animal conservation, asserting that we do not explore 'how' it should be applied in sufficient detail ...
doi:10.1016/j.tree.2014.09.009
pmid:25304444
fatcat:4aubdto6ondivco6ptnwgfprt4
Towards the Intelligent Home: Using Reinforcement-Learning for Optimal Heating Control
[chapter]
2013
Lecture Notes in Computer Science
We propose a reinforcement learning approach to heating control in home automation, that can acquire a set of rules enabling an agent to heat a room to the desired temperature at a defined time while conserving ...
[1] ), to our knowledge, this is the first time reinforcement learning is used in the context described in this paper. ...
While reinforcement learning is relatively popular in control engineering for designing low-level control units (cf. ...
doi:10.1007/978-3-642-40942-4_30
fatcat:noco672zibadrcymbsbkgim33u
Energy-Efficient IoT Sensor Calibration with Deep Reinforcement Learning
2020
IEEE Access
In this novel research, a new style of power conservation has been explored with the help of RL to make a new generation of IoT devices with calibrated power sources to maximize resource utilization. ...
Reinforcement learning (RL) has received much attention from researchers and is now widely applied in many fields of study to achieve intelligent automation. ...
LSTM in reinforcement learning has not been introduced or explored properly. ...
doi:10.1109/access.2020.2992853
fatcat:uw2hqu26t5fwrf3turn6uehtwe
Reinforcement Learning for Autonomous Driving with Latent State Inference and Spatial-Temporal Relationships
[article]
2021
arXiv
pre-print
Deep reinforcement learning (DRL) provides a promising way for learning navigation in complex autonomous driving scenarios. ...
In this work, we show that explicitly inferring the latent state and encoding spatial-temporal relationships in a reinforcement learning framework can help address this difficulty. ...
In our work, both the latent inference and the vehicle control are learned under a reinforcement learning framework to handle complex observations and scenarios. ...
arXiv:2011.04251v2
fatcat:gidcfhxgcfcgfcftcxkpey2z2y
Improving Safety in Deep Reinforcement Learning using Unsupervised Action Planning
[article]
2021
arXiv
pre-print
In this work, we propose a novel technique of unsupervised action planning to improve the safety of on-policy reinforcement learning algorithms, such as trust region policy optimization (TRPO) or proximal ...
One of the key challenges to deep reinforcement learning (deep RL) is to ensure safety at both training and testing phases. ...
SAFE REINFORCEMENT LEARNING VIA UNSUPERVISED ACTION PLANNING In this section, we will present our safe reinforcement learning algorithm that achieves conservative exploration via unsupervised action planning ...
arXiv:2109.14325v1
fatcat:n5xtjr5hazaenamkvaz4tdf2ee
Multi-Preference Actor Critic
[article]
2019
arXiv
pre-print
However, for most Reinforcement Learning tasks, humans can provide additional insight to constrain the policy learning. ...
Experiments in Atari and Pendulum verify that constraints are being respected and can accelerate the learning process. ...
This reward can then be used as any reward in reinforcement learning to learn a policy that mimics the expert. ...
arXiv:1904.03295v1
fatcat:wuyfroevgjgz7can73jb2vpxqq
Sensor Networks Routing via Bayesian Exploration
2006
Local Computer Networks (LCN), Proceedings of the IEEE Conference on
Since information concerning these constraints is unknown in an environment, a reinforcement learning approach is proposed to solve this problem. ...
There is increasing research interest in solving routing problems in sensor networks subject to constraints such as data correlation, link reliability and energy conservation. ...
LAN, sensor networks have various concerns which are unknown in advance, so a reinforcement learning approach is practical in the routing scenario. ...
doi:10.1109/lcn.2006.322207
dblp:conf/lcn/HaoW06
fatcat:ow2yj7ksv5eenpyoyn33y3vf74
Using tourism free‐choice learning experiences to promote environmentally sustainable behaviour: the role of post‐visit 'action resources'
2011
Environmental Education Research
Building on research and theory in relation to visitor experiences in free-choice learning environments, the paper identifies three different stages in the educational process and proposes a strategy for ...
Previous research indicates that although visitors often leave such experiences with a heightened awareness of conservation issues and intentions to adopt environmentally responsible behaviors, only a ...
it is reinforced by subsequent learning experiences. ...
doi:10.1080/13504622.2010.530645
fatcat:diijyb53vbd7ljt6u5ctmurmbe
Active Exploration by Chance-Constrained Optimization for Voltage Regulation with Reinforcement Learning
2022
Energies
This research proposes an active exploration (AE) method based on reinforcement learning (RL) to respond to the uncertainties by regulating the voltage of a distribution network with battery energy storage ...
Meanwhile, the proposed method has advantages in BESS usage in conserveness compared to the chance-constrained optimization. ...
Acknowledgments: A special thanks to Ziang Zhang for his invaluable contributions to the guidance in this work.
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/en15020614
fatcat:b7345bpwmreelju7twyjvmgiaq
Selection in Scale-Free Small World
[chapter]
2005
Lecture Notes in Computer Science
In this paper we compare our selection based learning algorithm with the reinforcement learning algorithm in Web crawlers. The task of the crawlers is to find new information on the Web. ...
We have found that on this SFSW, the weblog update algorithm performs better than the reinforcement learning algorithm. ...
We have found that the weblog update selection algorithm performs better in this environment than the reinforcement learning algorithm, even though the reinforcement learning algorithm has been shown to ...
doi:10.1007/11559221_65
fatcat:hlgckr6k4fhuvgkkns7ctvw2da