
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

Eyal Even-Dar, Shie Mannor, Yishay Mansour
2006 Journal of machine learning research  
This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms.  ...  We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability.  ...  We use these functions to derive an asynchronous algorithm, which eliminates actions and supplies a stopping condition.  ... 
dblp:journals/jmlr/Even-DarMM06 fatcat:sqxognjgarb6ze2ihm42vkpw3i
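The action-elimination idea this abstract describes can be illustrated with a minimal successive-elimination sketch for Bernoulli bandit arms: sample every surviving arm in rounds, and drop an arm once its upper confidence bound falls below the best surviving arm's lower confidence bound. This is not the paper's exact procedure or its bounds; the confidence radius and constants below are illustrative assumptions only.

```python
import math
import random

def successive_elimination(means, horizon=20000, delta=0.05, seed=0):
    """Illustrative successive elimination on Bernoulli arms.

    Samples each surviving arm once per round; an arm is eliminated when
    its empirical upper confidence bound drops below the best arm's
    lower confidence bound. Stops when one arm survives (the stopping
    condition) or the sample budget runs out.
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    t = 0
    while len(active) > 1 and t < horizon:
        for arm in list(active):
            sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            t += 1
        # Hoeffding-style radius; constants chosen for illustration only.
        def radius(arm):
            return math.sqrt(
                math.log(4 * k * counts[arm] ** 2 / delta) / (2 * counts[arm])
            )
        best_lcb = max(sums[a] / counts[a] - radius(a) for a in active)
        active = [a for a in active
                  if sums[a] / counts[a] + radius(a) >= best_lcb]
    return active

# With well-separated arms the suboptimal ones are eliminated quickly.
print(successive_elimination([0.2, 0.5, 0.9]))
```

With the gaps above, the survivor set shrinks to the best arm (index 2) well inside the sample budget, which is exactly the payoff of elimination: no further samples are wasted on discarded arms.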

Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes [article]

Kunal Menda, Yi-Chun Chen, Justin Grana, James W. Bono, Brendan D. Tracey, Mykel J. Kochenderfer, David Wolpert
2017 arXiv   pre-print
Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously.  ...  Our algorithm works by framing problems as "event-driven decision processes," which are scenarios where the sequence and timing of actions and events are random and governed by an underlying stochastic  ...  The authors would also like to thank the anonymous reviewers for their helpful comments.  ... 
arXiv:1709.06656v1 fatcat:gfd7c37xnrhidjnlno44q7rtta

Deep Q-network-based traffic signal control models

Sangmin Park, Eum Han, Sungho Park, Harim Jeong, Ilsoo Yun
2021 PLoS ONE  
This study developed two traffic signal control models using reinforcement learning and a microscopic simulation-based evaluation for an isolated intersection and two coordinated intersections.  ...  To develop these models, a deep Q-network (DQN) was used, which is a promising reinforcement learning algorithm.  ...  Acknowledgments This study was prepared based on a doctoral dissertation at Ajou University Graduate School of Construction and Transportation Engineering in 2020.  ... 
doi:10.1371/journal.pone.0256405 pmid:34473716 pmcid:PMC8412290 fatcat:dcg4bwvni5hylga7yhrksycz7q

Reactive Reinforcement Learning in Asynchronous Environments

Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M. Pilarski
2018 Frontiers in Robotics and AI  
The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored.  ...  We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL  ...  ACKNOWLEDGMENTS The authors thank the other members of the Bionic Limbs for Natural Improved Control Laboratory and the Reinforcement Learning and Artificial Intelligence Laboratory for many helpful thoughts  ... 
doi:10.3389/frobt.2018.00079 pmid:33500958 pmcid:PMC7805616 fatcat:dccdkv453renxazlblvoplb65e
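For reference, the conventional SARSA baseline this entry compares against uses the on-policy one-step update Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)]. The sketch below shows only that conventional update (not the paper's reactive variant); state and action names are hypothetical.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One conventional SARSA step: move Q(s, a) toward the on-policy
    target r + gamma * Q(s', a'), where a' is the action actually taken
    next (unlike Q-learning's greedy max)."""
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
# Single illustrative transition: from state 0, taking a hypothetical
# 'stop' action earns reward 1 and the next action chosen is 'go'.
print(sarsa_update(Q, 0, 'stop', 1.0, 1, 'go'))  # 0.1 * (1 + 0.9*0 - 0) = 0.1
```

The reactive variant studied in the paper changes *when* this update is computed relative to sensing and acting in an asynchronous environment, not the arithmetic of the update itself.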

Online Coordination of Signals for Heterogeneous Traffic Using Stop Line Detection

Sadguna Nuli, Tom V. Mathew
2013 Procedia - Social and Behavioral Sciences  
This approach has the ability to learn relationships between control action such as cycle time and their effect on the vehicle queuing while pursuing a goal of maximizing intersection throughput.  ...  Real time control of heterogeneous traffic is always a challenge for efficient and effective traffic management.  ...  The controller learns about these actions through reinforcement learning.  ... 
doi:10.1016/j.sbspro.2013.11.171 fatcat:oqrljswckrhwhpkvgvdfxmvcay

Improved Action-Decision Network for Visual Tracking with Meta-learning

Detian Huang, Lingke Kong, Jianqing Zhu, Lixin Zheng
2019 IEEE Access  
Then, in the reinforcement learning based training phase, both the selection criteria for optimal action and the reward function are redesigned separately to explore more appropriate action and eliminate  ...  Reinforcement learning based Action-Decision Network (ADNet) has shown great potential for object tracking.  ...  Secondly, the policy gradient based reinforcement learning is improved so that the tracker can capture the object by selecting more appropriate action and eliminating the useless action.  ... 
doi:10.1109/access.2019.2936551 fatcat:ot4ykjsenbektervnhcd57nnxq

Continuous residual reinforcement learning for traffic signal control optimization

Mohammad Aslani, Stefan Seipel, Marco Wiering
2018 Canadian journal of civil engineering (Print)  
In order to eliminate these difficulties, we develop adaptive traffic signal controllers founded on continuous reinforcement learning.  ...  Local agents and global agents adapt to prevailing traffic conditions through standard Q-learning and continuous Q-learning respectively.  ... 
doi:10.1139/cjce-2017-0408 fatcat:jqmuait5xraazlfmb5uq6b36nu
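The "standard Q-learning" used for the local agents here is the tabular off-policy update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. A minimal sketch, with hypothetical traffic-signal state and action names (the paper's actual state/action design is not reproduced here):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step: the off-policy target uses the
    greedy value max over next actions, regardless of the action the
    behavior policy actually takes next."""
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
actions = ['extend_green', 'switch_phase']
# Hypothetical transition for a signal controller: extending the green
# phase in a low-queue state yields reward 1 (throughput) and the queue
# stays low.
v = q_learning_update(Q, 'queue_low', 'extend_green', 1.0, 'queue_low', actions)
print(round(v, 3))  # first update: 0.1 * (1 + 0.9*0 - 0) = 0.1
```

The paper's continuous Q-learning agents replace this discrete table with a function approximator over continuous states and actions; the residual form adjusts the gradient of this same temporal-difference error.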

Hierarchical Program-Triggered Reinforcement Learning Agents For Automated Driving [article]

Briti Gangopadhyay, Harshit Soora, Pallab Dasgupta
2021 arXiv   pre-print
The use of RL agents in autonomous driving leads to a smooth human-like driving experience, but the limited interpretability of Deep Reinforcement Learning (DRL) creates a verification and certification  ...  Recent advances in Reinforcement Learning (RL) combined with Deep Learning (DL) have demonstrated impressive performance in complex tasks, including autonomous driving.  ...  The simulation videos are available at [33]. The authors would like to reduce the DRL agent's granularity, for example, to braking and steering, to facilitate continuous switching between them  ... 
arXiv:2103.13861v1 fatcat:qtmmz4gmgvetbc2ra3zmijt4oy

Best-reply matching in games

Edward Droste, Michael Kosfeld, Mark Voorneveld
2003 Mathematical Social Sciences  
Kosfeld, Droste, and Voorneveld [Games and Economic Behavior 40 (2002) 270] show that best-reply matching equilibria are stationary states in a simple model of social learning, where newborns adopt a best-reply  ...  For example in the centipede game it is shown that players will continue with large probability.  ...  Roth and Erev (1995) , Erev and Roth (1998) , Börgers and Sarin (1997) ) players positively reinforce "good" actions and negatively reinforce "bad" actions.  ... 
doi:10.1016/s0165-4896(03)00065-9 fatcat:p6fhih5aszhodjw737vhy6hpdi

Page 586 of Psychological Abstracts Vol. 51, Issue 3 [page]

1974 Psychological Abstracts  
Reducing the reinforcements per hour for this class while raising that for another class (by 3.3 reinforcements/hr) significantly reduced the conditional probability of 0-4 sec IRTs.  ...  —Filmed the grooming activities of 16 adult DBA/2J mice individually, and analyzed the film frame by frame with a stop-action projector. 7 components of face grooming were identified.  ... 

The S-R reinforcement theory of extinction

Henry Gleitman, Jack Nachmias, Ulric Neisser
1954 Psychological review  
For instance, Dollard and Miller (5, p. 202), in order to subsume repression under their general theory of anxiety learning, speak of a "response of stopping thinking," reinforced by anxiety re-  ...  Necessarily, then, there is no learned act which can be performed for any length of time; its very repetition— regardless of reinforcement—must lead to its eventual elimination.  ... 
doi:10.1037/h0062623 pmid:13134414 fatcat:4jajdmlqongn3ib2xici2jr7hi

The role of associative and non-associative learning in the training of horses and implications for the welfare (a review)

Paolo Baragli, Barbara Padalino, Angelo Telatin
2015 Annali dell'Istituto Superiore di Sanità  
of human injuries and economic loss for civil society and the public health system.  ...  Thus, this review addresses correct horse training based on scientific knowledge in animal learning and psychology.  ...  Acknowledgements The authors are grateful to Cory Kieschnick (Equine Science Department, Delaware Valley College) and Ronald DePeter (Writing Center, Delaware Valley College, USA) for proofreading and  ... 
doi:10.4415/ann_15_01_08 pmid:25857383 fatcat:6fuq7k2voragnbkx4o44mgkzzu

Intelligent buses in a loop service: Emergence of no-boarding and holding strategies [article]

Vee-Liem Saw, Luca Vismara, Lock Yue Chew
2019 arXiv   pre-print
We study how N intelligent buses serving a loop of M bus stops learn a no-boarding strategy and a holding strategy by reinforcement learning.  ...  The high level no-boarding and holding strategies emerge from the low level actions of stay or leave when a bus is at a bus stop and everyone who wishes to alight has done so.  ...  Reward for reinforcement learning of the bus loop systemWith the goal of minimising the average waiting time of commuters for a bus to arrive at a bus stop, each time a bus is at a bus stop (and people  ... 
arXiv:1911.03107v1 fatcat:vtvgqon56jhtvd6fd7rqvwqx5i
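The low-level stay/leave decision this abstract describes is the kind of two-action choice typically made epsilon-greedily from learned action values. The sketch below is an assumption about that mechanism, not the paper's actual policy; the function and argument names are hypothetical.

```python
import random

def choose_stay_or_leave(q_stay, q_leave, epsilon=0.1, rng=None):
    """Epsilon-greedy choice over the two low-level actions the abstract
    mentions: 'stay' at the bus stop (from which holding / no-boarding
    behavior can emerge) or 'leave' once alighting is done."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: pick either action uniformly at random.
        return rng.choice(['stay', 'leave'])
    # Exploit: pick the action with the higher estimated value.
    return 'stay' if q_stay >= q_leave else 'leave'

# Purely greedy (epsilon=0): the higher-valued action is always chosen.
print(choose_stay_or_leave(q_stay=0.2, q_leave=0.8, epsilon=0.0))  # 'leave'
```

The paper's point is that coordinated high-level strategies (holding, no-boarding) emerge across buses from many such local, low-level choices driven by a shared waiting-time reward.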

Decision Making for Self-Driving Vehicles in Unexpected Environments Using Efficient Reinforcement Learning Methods

Min-Seong Kim, Gyuho Eoh, Tae-Hyoung Park
2022 Electronics  
Reinforcement learning agents may continue to produce wrong decisions in unexpected environments not encountered during the learning process.  ...  Deep reinforcement learning (DRL) enables autonomous vehicles to perform complex decision making using neural networks.  ...  Conversely, deep reinforcement learning (DRL) has the potential for better scalability and generalization than non-learning methods for autonomous driving decision-making problems.  ... 
doi:10.3390/electronics11111685 fatcat:vempbnk5qrcu7nzm3hz7b6kuxa