69,918 Hits in 2.6 sec

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [article]

Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
2021 arXiv   pre-print
To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process.  ...  We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs).  ...  Given a state similarity metric d, we develop a general procedure (Algorithm 1) to learn contrastive metric embeddings (CMEs) for d.  ... 
arXiv:2101.05265v2 fatcat:54kisb4zjzc3zkfj2x47myquma

Time-Contrastive Networks: Self-Supervised Learning from Video [article]

Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine
2018 arXiv   pre-print
Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems.  ...  In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images.  ...  III-B to learn robotic imitation behaviors from third-person demonstrations through reinforcement learning.  ... 
arXiv:1704.06888v3 fatcat:mqt2bdjvobc7lidrtvrc3rtnoi

Decision Transformer: Reinforcement Learning via Sequence Modeling [article]

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
2021 arXiv   pre-print
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem.  ...  In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.  ...  We also thank Luke Metz and Daniel Freeman for valuable feedback and discussions, as well as Justin Fu for assistance in setting up D4RL benchmarks, and Aviral Kumar for assistance with the CQL baselines  ... 
arXiv:2106.01345v2 fatcat:3l67xzhmrrhspn76wi32nziaum

Internal-external control of reinforcement and embedded-figures performance

June E. Chance, Alvin G. Goldstein
1971 Perception & Psychophysics  
Male and female Ss were tested in an extended series of 68 embedded figures.  ...  Rate of decrease in discovery time was related to Ss' attitudes about locus of control of reinforcing outcomes.  ...  Rotter (1966) proposed a generalized expectancy for internal vs external control of reinforcement.  ... 
doi:10.3758/bf03213024 fatcat:njqsjqabyre3jogebyjcx7erfe

Scalable Multi-Task Imitation Learning with Autonomous Improvement [article]

Avi Singh, Eric Jang, Alexander Irpan, Daniel Kappler, Murtaza Dalal, Sergey Levine, Mohi Khansari, Chelsea Finn
2020 arXiv   pre-print
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement, and in contrast to reinforcement learning algorithms, our  ...  for the robot to effectively generalize broadly.  ...  A latent task space, learned using the same initial dataset, is then used to find similarities in the collected trials, and generate new tasks for meta-imitation learning.  ... 
arXiv:2003.02636v1 fatcat:b4hgykegl5c3nmeawkogib6ile

Learning to Assist Agents by Observing Them [article]

Antti Keurulainen
2021 arXiv   pre-print
On the other hand, offline data about the behavior of the assisted agent might be available, but is non-trivial to take advantage of by methods such as offline reinforcement learning.  ...  Training such an ability by using reinforcement learning usually requires large amounts of online training, which is difficult and costly.  ...  Acknowledgements This work was supported by the Academy of Finland (Flagship programme: Finnish Center for Artificial Intelligence FCAI, and grants 319264, 292334).  ... 
arXiv:2110.01311v1 fatcat:ezbkj6xf4rhxzbhxoigrvl23y4

Behavior Self-Organization Supports Task Inference for Continual Robot Learning [article]

Muhammad Burhan Hafez, Stefan Wermter
2021 arXiv   pre-print
Task inference is made by finding the nearest behavior embedding to a demonstrated behavior, which is used together with the environment state as input to a multi-task policy trained with reinforcement  ...  Our approach performs unsupervised learning of behavior embeddings by incrementally self-organizing demonstrated behaviors.  ...  We call our GWR-based model for unsupervised learning of behavior embeddings GWR-B.  ... 
arXiv:2107.04533v1 fatcat:u4fet5lhk5eizmhser2zvuol5m

EMBEDDED PROMPTING MAY FUNCTION AS EMBEDDED PUNISHMENT: DETECTION OF UNEXPECTED BEHAVIORAL PROCESSES WITHIN A TYPICAL PRESCHOOL TEACHING STRATEGY

Nicole A Heal, Gregory P Hanley, Anna Petursdottir
2011 Journal of Applied Behavior Analysis  
In addition, embedded teacher prompts, common in child-led teaching procedures, functioned as a punisher for the child's toy play.  ...  The efficacy of and preference for three strategies that varied in teacher directedness were assessed in a multielement design and concurrent-chains arrangement, respectively.  ...  In contrast, when teaching a tact (e.g., color identification; i.e., verbal behavior under the control of a discriminative stimulus and reinforced with a nonspecific and generalized reinforcer) embedded  ... 
doi:10.1901/jaba.2011.44-127 pmid:21541134 pmcid:PMC3050461 fatcat:y3k6hj72xnatnjwcu3gm6zwqpe

Modelling the development of counting with memory-augmented neural networks [article]

Zack Dulberg, Taylor Webb, Jonathan Cohen
2021 arXiv   pre-print
Learning to count is an important example of the broader human capacity for systematic generalization, and the development of counting is often characterized by an inflection point when children rapidly  ...  We aimed to model this process by training a reinforcement learning agent to select N items from a binary vector when instructed (known as the give-N task).  ...  Thank you to Steven Frankland, Simon Segert, Randall O'Reilly, and Alexander Petrov for their helpful discussions.  ... 
arXiv:2105.10577v1 fatcat:nasat7ywevhz7hpb5ndrtok4vu

Jointly-Trained State-Action Embedding for Efficient Reinforcement Learning [article]

Paul J. Pritz, Liang Ma, Kin K. Leung
2020 arXiv   pre-print
In this work, we propose a new approach for jointly learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning, which can be applied in both discrete  ...  In this way, the embedded representations obtained via our approach enable better generalization over both states and actions by capturing similarities in the embedding spaces.  ...  Instead, our embedding technique projects states into a continuous state embedding space, similar to Zhang et al. (2020), where their behavioral similarity is captured in their proximity in embedding  ... 
arXiv:2010.04444v3 fatcat:deej56fusfdxvcexgm2xqy2vim

Inverse reinforcement learning for video games [article]

Aaron Tucker, Adam Gleave, Stuart Russell
2018 arXiv   pre-print
In our CNN-AIRL baseline, we modify the state-of-the-art adversarial IRL (AIRL) algorithm to use CNNs for the generator and discriminator.  ...  Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function.  ...  We would like to thank Sam Toyer, Lawrence Chan, Matthew Rahtz and Daniel Filan for comments on an earlier draft.  ... 
arXiv:1810.10593v1 fatcat:t6co2wtxtfa6jfgoyipt6jhcn4

Adversarial Contrastive Estimation [article]

Avishek Joey Bose, Huan Ling, Yanshuai Cao
2018 arXiv   pre-print
Noise contrastive estimation (NCE) for word embeddings and translating embeddings for knowledge graphs are examples in NLP employing this approach.  ...  Learning by contrasting positive and negative samples is a general strategy adopted by many methods.  ...  Acknowledgments We would like to thank Teng Long for providing the initial baseline code on knowledge graph embeddings, Matthew E.  ... 
arXiv:1805.03642v3 fatcat:32b6a6haircv5p7az7qf7pbkbm

Social NCE: Contrastive Learning of Socially-aware Motion Representations [article]

Yuejiang Liu, Qi Yan, Alexandre Alahi
2021 arXiv   pre-print
Intuitively, if the training data only comes from human behaviors in safe spaces, i.e., from "positive" examples, it is difficult for learning algorithms to capture the notion of "negative" examples like  ...  Our method substantially reduces the collision rates of recent trajectory forecasting, behavioral cloning and reinforcement learning algorithms, outperforming state-of-the-art methods on several benchmarks  ...  We thank Parth Kothari, Yifan Sun, Taylor Mordan, Mohammadhossein Bahari, Lorenzo Bertoni and Sven Kreiss for valuable feedback on early drafts.  ... 
arXiv:2012.11717v3 fatcat:dcv4cxotlfb33e6vq2ntnocugu

Page 1264 of Psychological Abstracts Vol. 65, Issue 6 [page]

1981 Psychological Abstracts  
equating relative reinforcement frequency may also result in behavioral contrast. (7 ref) 11873.  ...  (U Southern Mississippi) Behavioral contrast in humans with resp pendent reinforcement. Journal of General Psychology, 1979(Jan), Vol 100(1), 159-160. —Replicated M. S. Halliday and R. A.  ... 

RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning [article]

Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
2016 arXiv   pre-print
Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials.  ...  In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap.  ...  ACKNOWLEDGMENTS We would like to thank our colleagues at Berkeley and OpenAI for insightful discussions. This research was funded in part by ONR through a PECASE award.  ... 
arXiv:1611.02779v2 fatcat:5uies6uzlnhwpdmjwx3ofnz4oq