3,283 Hits in 7.3 sec

Task Learnability Modulates Surprise but Not Valence Processing for Reinforcement Learning in Probabilistic Choice Tasks

Franz Wurm, Wioleta Walentowska, Benjamin Ernst, Mario Carlo Severo, Gilles Pourtois, Marco Steinhauser
2021 Journal of Cognitive Neuroscience  
Our results revealed that task learnability modulates reinforcement learning via the suppression of surprise processing but leaves the processing of valence unaffected.  ...  On the basis of our model and the data, we propose that task learnability can selectively suppress TD learning as well as alter behavioral adaptation based on a flexible cost–benefit arbitration.  ...  Acknowledgments This work was supported by funding from the National Science Centre of Poland (2015/19/B/HS6/01259) and the Polish National Agency for Academic Exchange (Bekker Programme signature: PPN  ... 
doi:10.1162/jocn_a_01777 pmid:34879392 fatcat:67djnugpajhjvglej4gyk4je7u

Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft [article]

Ingmar Kanitscheider, Joost Huizinga, David Farhi, William Hebgen Guss, Brandon Houghton, Raul Sampedro, Peter Zhokhov, Bowen Baker, Adrien Ecoffet, Jie Tang, Oleg Klimov, Jeff Clune
2021 arXiv   pre-print
We find that learning progress (defined as a change in success probability of a task) is a reliable measure of learnability for automatically constructing an effective curriculum.  ...  Experiments demonstrate that: (1) a within-episode exploration bonus for obtaining new items improves performance, (2) dynamically adjusting this bonus across training such that it only applies to items  ...  Acknowledgments We thank Ilge Akkaya, Bob McGrew, Reiichiro Nakano, Matthias Plappert and John Schulman for discussions, support and feedback on this manuscript.  ... 
arXiv:2106.14876v1 fatcat:wpxs2t5otjeltgfqw4tp3mf5cu
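
The learning-progress measure quoted above (a change in a task's success probability) can be sketched in a few lines; the following is a minimal illustration, with `success_history`, the window size, and the smoothing constant being our assumptions rather than the paper's implementation:

```python
import numpy as np

def learning_progress(successes: list[int], window: int = 100) -> float:
    """Learning progress for one task: the change in estimated success
    probability between the older and newer halves of a recent window.
    A large value marks the task as currently learnable."""
    recent = successes[-window:]
    if len(recent) < 2:
        return 0.0
    half = len(recent) // 2
    return abs(float(np.mean(recent[half:])) - float(np.mean(recent[:half])))

def sample_task(success_history: dict[str, list[int]]) -> str:
    """Curriculum step: sample tasks in proportion to learning progress,
    with smoothing so no task starves."""
    tasks = list(success_history)
    lp = np.array([learning_progress(success_history[t]) for t in tasks])
    probs = (lp + 1e-6) / (lp + 1e-6).sum()
    return str(np.random.choice(tasks, p=probs))
```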

Randomized Value Functions via Multiplicative Normalizing Flows [article]

Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent
2019 arXiv   pre-print
Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values.  ...  Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces.  ...  To have a fair comparison across all algorithms, we fill the replay buffer with actions selected at random for the first 50 thousand time-steps.  ... 
arXiv:1806.02315v3 fatcat:6nbtlhjvo5e4pbzojhvudaroje
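
The core recipe behind randomized value functions can be illustrated with a tabular toy; this is a generic sketch of posterior sampling over Q-values (sample one value function per episode, act greedily with respect to it), not the paper's multiplicative-normalizing-flow posterior:

```python
import numpy as np

class RandomizedQ:
    """Randomized value function for a small tabular problem: keep a
    Gaussian posterior over Q-values, sample one function per episode,
    and act greedily with respect to the sample."""

    def __init__(self, n_states: int, n_actions: int):
        self.mu = np.zeros((n_states, n_actions))    # posterior mean
        self.sigma = np.ones((n_states, n_actions))  # posterior stddev

    def sample_episode_q(self) -> np.ndarray:
        # One draw is used for a whole episode, giving temporally
        # consistent ("deep") exploration instead of per-step dithering.
        return self.mu + self.sigma * np.random.randn(*self.mu.shape)

    def act(self, q_sample: np.ndarray, state: int) -> int:
        return int(np.argmax(q_sample[state]))
```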

Selective maintenance of value information helps resolve the exploration/exploitation dilemma

Michael N. Hallquist, Alexandre Y. Dombrovski
2019 Cognition  
Selectively maintaining preferred action values while allowing others to decay renders the choices increasingly exploitative across learning episodes.  ...  Cognitively demanding uncertainty-directed exploration recovered a more accurate representation in simulations with no foraging advantage and was strongly unsupported in our human study.  ...  Frank for helpful comments on validation and comparison of reinforcement learning models, as well as codes for the experimental paradigm and time-clock model.  ... 
doi:10.1016/j.cognition.2018.11.004 pmid:30502584 pmcid:PMC6328060 fatcat:iicwrnkrzbfzhamhujaxznlegq
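
A toy version of the selective-maintenance idea in the snippet, under our own simplifying assumption of a delta-rule learner in which only the currently preferred option is protected from decay:

```python
import numpy as np

def update_values(values: np.ndarray, chosen: int, reward: float,
                  lr: float = 0.1, decay: float = 0.05) -> np.ndarray:
    """Delta-rule update for the chosen option; unchosen, non-preferred
    options decay toward zero. Protecting the preferred option makes
    choices increasingly exploitative across learning episodes."""
    values = values.copy()
    values[chosen] += lr * (reward - values[chosen])
    preferred = int(np.argmax(values))
    for a in range(len(values)):
        if a != chosen and a != preferred:
            values[a] *= 1.0 - decay
    return values
```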

Sequential Learning without Feedback [article]

Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama
2016 arXiv   pre-print
Our objective is to learn strategies for selecting tests to optimize accuracy & costs.  ...  This is a condition on the joint probability distribution of test outputs and latent state and says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate  ...  Joe Wang for helpful discussions and in particular suggesting the concept of strong dominance.  ... 
arXiv:1610.05394v1 fatcat:6yqx4n4wszao3lo5smha3f6vji
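
A plausible formalization of the strong dominance condition paraphrased in the snippet (our reading, stated with probability one for simplicity; the paper's exact definition may differ):

```latex
% Strong dominance, informal reading of the abstract: whenever a test
% is accurate on an example, every later test in the sequence is too.
% Y is the latent state; Y_i is the output of the i-th test.
\[
  \Pr\bigl(Y_j = Y \,\big|\, Y_i = Y\bigr) = 1
  \qquad \text{for all } j > i .
\]
```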

Collectives for multiple resource job scheduling across heterogeneous servers

Kagan Tumer, John Lawson
2003 Proceedings of the second international joint conference on Autonomous agents and multiagent systems - AAMAS '03  
Efficient management of large-scale, distributed data storage and processing systems is a major challenge for many computational applications.  ...  On the other hand, agents configured using collectives outperform both team games and load balancing (by up to four times for the latter), despite their distributed nature and their limited access to information  ...  The first matrix represents the joint state of the system z, where agent 1 has selected action 1, agent 2 has selected action 3, agent 3 has selected action 1 and agent 4 has selected action 2.  ... 
doi:10.1145/860575.860836 dblp:conf/atal/TumerL03 fatcat:twptn66sffdivpnx7lja5p4uha

Variational Bayes: A report on approaches and applications [article]

Manikanta Srikar Yellapragada, Chandra Prakash Konkimalla
2019 arXiv   pre-print
Variational methods have been used for approximating intractable integrals that arise in Bayesian inference for neural networks.  ...  Acknowledgement The authors would like to thank Joan Bruna for his feedback and providing this opportunity.  ...  The main idea is to encourage deep exploration by creating a new Deep Q-learning architecture that supports selecting actions from randomized Q-functions that are trained on bootstrapped data.  ... 
arXiv:1905.10744v1 fatcat:r4jtbsxjqvfivignhcmljzrbky
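
The bootstrapped deep-exploration idea mentioned in the last snippet can be sketched as follows; this is a generic illustration in the spirit of bootstrapped Q-learning (class name, tabular representation, and masking scheme are our assumptions):

```python
import numpy as np

class BootstrappedQ:
    """K Q-heads, each trained on a bootstrapped subset of the data.
    At the start of each episode one head is sampled and followed
    greedily, yielding deep, temporally consistent exploration."""

    def __init__(self, n_heads: int, n_states: int, n_actions: int):
        self.q = np.zeros((n_heads, n_states, n_actions))
        self.n_heads = n_heads

    def begin_episode(self) -> int:
        return int(np.random.randint(self.n_heads))  # sample one head

    def act(self, head: int, state: int) -> int:
        return int(np.argmax(self.q[head, state]))

    def update(self, head_mask: np.ndarray, head: int, state: int,
               action: int, target: float, lr: float = 0.1) -> None:
        # head_mask[k] ~ Bernoulli: each head sees only its own
        # bootstrap sample of the experience.
        if head_mask[head]:
            self.q[head, state, action] += lr * (target - self.q[head, state, action])
```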

The Benefits of Boredom: an Exploration in Developmental Robotics

Scott Bolland, Shervin Emami
2007 2007 IEEE Symposium on Artificial Life  
Such approaches are aimed at motivating the exploration of sensory-motor contingencies for which mental models have not yet been accurately formed, driving the agent to develop task-independent competencies  ...  Self-directed learning is an essential component of artificial and biological intelligent systems that are required to interact with and adapt to complex real world environments.  ...  The authors wish to thank the various members of the ARC Centre for Complex Systems for their feedback and support.  ... 
doi:10.1109/alife.2007.367792 dblp:conf/ieeealife/BollandE07 fatcat:xxpcvwmvsjgwphpnpzzi6ggoye

On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods [article]

Paulina Stevia Nouwou Mindom, Amin Nikanjam, Foutse Khomh, John Mullins
2021 arXiv   pre-print
The increasing adoption of Reinforcement Learning in safety-critical systems domains such as autonomous vehicles, health, and aviation raises the need for ensuring their safety.  ...  For future work, we plan to investigate more effective defense mechanisms against the learnable adversary.  ...  We check for ...  Algorithm 1 (Q-learning with observations; result: act): if random.uniform(0, 1) < epsilon (the exploration probability) then explore: select a random action act; else exploit: select the action  ... 
arXiv:2111.04865v2 fatcat:4chehslwuffcnpe5migzt4owri
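
The selection rule quoted from Algorithm 1 is standard epsilon-greedy; a self-contained sketch (function and variable names are ours):

```python
import random

def epsilon_greedy(q_values: list[float], epsilon: float) -> int:
    """Standard epsilon-greedy action selection, matching the quoted
    Algorithm 1: explore with probability epsilon, else exploit."""
    if random.uniform(0, 1) < epsilon:             # Explore
        return random.randrange(len(q_values))     # select a random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # Exploit
```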

DR2L: Surfacing Corner Cases to Robustify Autonomous Driving via Domain Randomization Reinforcement Learning [article]

Haoyi Niu, Jianming Hu, Zheyu Cui, Yi Zhang
2021 arXiv   pre-print
an inevitable Sim2real gap, which probably accounts for the underperformance in novel, anomalous and risky cases that simulators can hardly generate.  ...  Domain Randomization (DR) is a methodology that can bridge this gap with little or no real-world data.  ...  As for the learning process, the action (A_t) is selected by Q-values; otherwise, the action is chosen randomly with probability ϵ (the exploration rate, decaying each step until reaching its minimum).  ... 
arXiv:2107.11762v1 fatcat:caxmfxjejrcuxnm5nmpbszgvs4
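
The decaying exploration rate described in the snippet is commonly implemented as an exponential schedule clipped at a floor; a minimal sketch with illustrative constants:

```python
def decayed_epsilon(step: int, eps_start: float = 1.0,
                    eps_min: float = 0.05, decay: float = 0.999) -> float:
    """Exploration rate that decays each step until reaching its minimum,
    as described in the snippet (all constants are illustrative)."""
    return max(eps_min, eps_start * decay ** step)
```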

Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training [article]

Sebastian Pokutta, Huan Xu
2021 arXiv   pre-print
We then apply this to solving robust optimization problems or (equivalently) adversarial training problems via online learning and establish a general approach for a large variety of problem classes using  ...  Specifically, there are two fundamentally different types of adversaries, depending on whether the "adversary" is able to anticipate the exogenous randomness of the online learning algorithms.  ...  Recently, several works explored a general framework to solve robust optimization and adversarial training via online learning.  ... 
arXiv:2101.11443v1 fatcat:f6iiukregnclrnylexkjup4fdq
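
The reduction the abstract alludes to, solving a robust optimization problem by playing an online learning algorithm against the scenario set, can be sketched for a finite set of loss functions; the grid best-response and step size below are our simplifications:

```python
import numpy as np

def robust_solve(losses, xs, n_steps=500, eta=0.5):
    """Approximate min over x of max over scenarios i of losses[i](x):
    the adversary runs multiplicative weights over the scenarios while
    the learner best-responds on the grid xs; the average learner
    iterate approximates the robust solution."""
    w = np.ones(len(losses)) / len(losses)
    avg_x = 0.0
    for _ in range(n_steps):
        # Learner: best response to the adversary's weighted loss.
        vals = [sum(wi * f(x) for wi, f in zip(w, losses)) for x in xs]
        x = xs[int(np.argmin(vals))]
        avg_x += x / n_steps
        # Adversary: upweight scenarios with high loss at the learner's x.
        w = w * np.exp(eta * np.array([f(x) for f in losses]))
        w = w / w.sum()
    return avg_x

# Example: min_x max((x-1)^2, (x+1)^2) has robust solution x = 0.
# robust_solve([lambda x: (x - 1) ** 2, lambda x: (x + 1) ** 2],
#              xs=np.linspace(-2, 2, 401))
```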

Explore and Match: A New Paradigm for Temporal Video Grounding with Natural Language [article]

Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim
2022 arXiv   pre-print
In this work, we present a new paradigm named Explore-and-Match for TVG that seamlessly unifies two streams of TVG methods: proposal-free and proposal-based; the former explores the search space to find  ...  Code is available at https://github.com/sangminwoo/Explore-and-Match.  ...  Distribution of learnable proposals. We visualize the time segment predictions of 10 out of all learnable proposals in Fig. 7 .  ... 
arXiv:2201.10168v2 fatcat:tfznkeukwbeb7k6nti7lmduxbi

User Tampering in Reinforcement Learning Recommender Systems [article]

Charles Evans, Atoosa Kasirzadeh
2021 arXiv   pre-print
This safety concern is what we call "user tampering" -- a phenomenon whereby an RL-based recommender system may manipulate a media user's opinions, preferences and beliefs via its recommendations as part  ...  Finally, we argue that given our findings, designing an RL-based recommender system which cannot learn to exploit user tampering requires making the metric for the recommender's success independent of  ...  Formally, 'privacy' requires that no direct paths exist between that variable's distribution at one time-step and the random state distribution at the next step; so, in our case, that no paths exist of  ... 
arXiv:2109.04083v1 fatcat:w4fkaofvojam7nli2nxlrwsgpa

A Survey of Exploration Methods in Reinforcement Learning [article]

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
2021 arXiv   pre-print
Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning.  ...  In this article, we provide a survey of modern exploration methods in (sequential) reinforcement learning, as well as a taxonomy of exploration methods.  ...  In particular, the εz-greedy agent exploits with probability 1 − ε and explores via repeating the same action for a certain number of steps n ∼ z, where z(n) is a distribution over the action-repeat duration  ... 
arXiv:2109.00157v2 fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i
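
The temporally extended εz-greedy behavior described in the snippet can be sketched as follows; `zeta_sample` is a placeholder for the duration distribution z(n) (the survey's sources use heavy-tailed choices such as a zeta distribution):

```python
import random

class EZGreedy:
    """εz-greedy: exploit with probability 1 - ε; with probability ε,
    pick a random action and repeat it for n ~ z steps."""

    def __init__(self, n_actions: int, epsilon: float = 0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.repeat_left = 0
        self.repeat_action = 0

    def zeta_sample(self) -> int:
        # Placeholder heavy-tailed duration distribution z(n):
        # doubling with probability 1/2, capped for safety.
        n = 1
        while random.random() < 0.5 and n < 128:
            n *= 2
        return n

    def act(self, greedy_action: int) -> int:
        if self.repeat_left > 0:            # continue an exploration run
            self.repeat_left -= 1
            return self.repeat_action
        if random.random() < self.epsilon:  # start a new repeat
            self.repeat_action = random.randrange(self.n_actions)
            self.repeat_left = self.zeta_sample() - 1
            return self.repeat_action
        return greedy_action                # exploit
```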
Showing results 1 — 15 out of 3,283 results