2,913 Hits in 7.0 sec

Generating Diverse Programs with Instruction Conditioned Reinforced Adversarial Learning [article]

Aishwarya Agrawal, Mateusz Malinowski, Felix Hill, Ali Eslami, Oriol Vinyals, Tejas Kulkarni
2018 arXiv   pre-print
We demonstrate that with simple changes to the reinforced adversarial learning objective, we can learn instruction conditioned policies to achieve the corresponding diverse set of goals.  ...  Since a single instruction corresponds to a diverse set of different but still consistent end-goal images, the agent needs to learn to generate a distribution over programs given an instruction.  ...  Acknowledgments: The authors wish to acknowledge Igor Babuschkin, Caglar Gulcehre, Pushmeet Kohli, Piotr Trochim, and Antonio Garcia for their advice in this project.  ... 
arXiv:1812.00898v1 fatcat:gb3qnq77cnbiha3qzlhjym4um4
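
The mechanism the abstract sketches, a policy that samples programs conditioned on an instruction and is trained by reinforcing samples a discriminator judges consistent with that instruction, can be illustrated roughly as below. This is a minimal PyTorch sketch under stated assumptions: all module names, shapes, and the discriminator_reward callable are illustrative, not the authors' code.

# Minimal sketch (assumed names/shapes): a policy conditioned on an
# instruction samples a program; a discriminator's judgement of the
# rendered result serves as the REINFORCE reward.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstructionConditionedPolicy(nn.Module):
    def __init__(self, vocab_size, n_actions, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRUCell(n_actions, hidden)
        self.head = nn.Linear(hidden, n_actions)
        self.n_actions = n_actions

    def forward(self, instruction_tokens, steps=10):
        # Encode the instruction; its final hidden state conditions decoding.
        _, h = self.encoder(self.embed(instruction_tokens))
        h = h.squeeze(0)
        prev = torch.zeros(h.size(0), self.n_actions)
        actions, log_probs = [], []
        for _ in range(steps):
            h = self.decoder(prev, h)
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()
            log_probs.append(dist.log_prob(a))
            actions.append(a)
            prev = F.one_hot(a, self.n_actions).float()
        return torch.stack(actions, 1), torch.stack(log_probs, 1)

def reinforce_step(policy, discriminator_reward, instruction_tokens, opt):
    # discriminator_reward is an assumed callable returning one scalar
    # reward per sampled program (higher = more consistent with the text).
    actions, log_probs = policy(instruction_tokens)
    reward = discriminator_reward(actions, instruction_tokens)
    loss = -(log_probs.sum(1) * reward).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()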

Scalable agent alignment via reward modeling: a research direction [article]

Jan Leike and David Krueger and Tom Everitt and Miljan Martic and Vishal Maini and Shane Legg
2018 arXiv   pre-print
We outline a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning.  ...  Acknowledgments: This paper has benefited greatly from discussions with many people at DeepMind, OpenAI, and the Future of Humanity Institute.  ...
arXiv:1811.07871v1 fatcat:sbajbquhenh3nmeu3njm3uw5fu
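
The reward-modeling loop the abstract outlines, fit a reward model from user feedback and then hand it to a standard RL algorithm, might look like the following in minimal form. This sketch assumes pairwise trajectory comparisons as the feedback type; the Bradley-Terry loss is a common choice in this literature, not confirmed as the authors' exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, trajectory):  # (T, obs_dim) -> scalar episode return
        return self.net(trajectory).sum()

def preference_loss(model, traj_a, traj_b, a_preferred):
    # The user compared two trajectories; push the model to score the
    # preferred one higher (Bradley-Terry / logistic preference loss).
    r_a, r_b = model(traj_a), model(traj_b)
    logit = r_a - r_b if a_preferred else r_b - r_a
    return -F.logsigmoid(logit)

# Training then alternates: (1) minimize preference_loss on collected user
# comparisons; (2) run any standard RL algorithm with model(...) standing
# in for the environment reward.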

HIGhER: Improving instruction following with Hindsight Generation for Experience Replay

Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin
2020 2020 IEEE Symposium Series on Computational Intelligence (SSCI)  
Whenever the agent does not fulfill its instruction, HIGhER learns to output a new directive that matches the agent trajectory, and it relabels the episode with a positive reward.  ...  in even simple instruction following scenarios.  ...  ACKNOWLEDGEMENTS: The authors would like to acknowledge the stimulating research environment of the SequeL INRIA Project-Team.  ...
doi:10.1109/ssci47803.2020.9308603 fatcat:wobttq75wnemfnmeaipu5tucti
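
The relabeling step described in the abstract can be sketched as follows. The interfaces (replay_buffer.add, instruction_generator) are assumed for illustration; this is not the authors' implementation.

# Sketch of HIGhER-style hindsight relabeling of failed episodes.
def relabel_and_store(replay_buffer, episode, instruction, goal_reached,
                      instruction_generator):
    if goal_reached:
        replay_buffer.add(instruction, episode, reward=1.0)
    else:
        # The episode failed its instruction: generate a directive that
        # matches what the agent actually did, and store the episode as a
        # success for that directive.
        hindsight = instruction_generator(episode)
        replay_buffer.add(hindsight, episode, reward=1.0)
        # Keep the original, unfulfilled instruction with zero reward.
        replay_buffer.add(instruction, episode, reward=0.0)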

Learning From Explanations Using Sentiment and Advice in RL

Samantha Krening, Brent Harrison, Karen M. Feigh, Charles Lee Isbell, Mark Riedl, Andrea Thomaz
2017 IEEE Transactions on Cognitive and Developmental Systems  
In order for robots to learn from people with no machine learning expertise, robots should learn from natural human instruction.  ...  We developed Object-focused advice to represent what actions the agent should take when dealing with objects. An RL agent used Object-focused advice to learn policies that maximized its reward.  ...  Before the agent starts learning, a person instructs the agent what action to take when dealing with an object.  ... 
doi:10.1109/tcds.2016.2628365 fatcat:fma76teaijd7rcp4tuj2ia3mna
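
A toy rendering of Object-focused advice as an exploration prior follows; this is my construction for illustration, and the paper's actual mechanism may differ.

# Per-object human advice used to bias epsilon-greedy action selection
# while the agent's policy is still being learned.
import random

advice = {"door": "open", "key": "pick_up", "enemy": "avoid"}  # from a person

def choose_action(q_values, nearby_object, actions,
                  epsilon=0.1, follow_advice_prob=0.8):
    # Mostly follow the human's object-specific advice when it applies.
    if nearby_object in advice and random.random() < follow_advice_prob:
        return advice[nearby_object]
    # Otherwise fall back to epsilon-greedy over learned Q-values.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values.get(a, 0.0))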

Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning [article]

Sayan Ghosh, Shashank Srivastava
2021 arXiv   pre-print
Mapping natural language instructions to programs that computers can process is a fundamental challenge.  ...  Existing approaches focus on likelihood-based training or using reinforcement learning to fine-tune models based on a single reward.  ...  From language to goals: Inverse reinforcement learning for vision-based instruction following. In International Conference on Learning Representations.  ... 
arXiv:2110.00842v1 fatcat:cmgwzzkkk5eudm6astqqftwe24
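
One way to read "multiple reward components" is a weighted combination whose weights are fit with an IRL-style objective. The sketch below is an assumption in that spirit, not the paper's method; the max-margin objective and the component callables are illustrative.

import torch

def combined_reward(weights, components, program, instruction):
    # Each component is an assumed callable returning a scalar score tensor.
    scores = torch.stack([c(program, instruction) for c in components])
    return torch.dot(torch.softmax(weights, dim=0), scores)

def irl_step(weights, components, gold_program, sampled_program,
             instruction, opt, margin=1.0):
    # Max-margin IRL-style update: the reference program should outscore
    # a model sample by at least `margin` under the combined reward.
    r_gold = combined_reward(weights, components, gold_program, instruction)
    r_samp = combined_reward(weights, components, sampled_program, instruction)
    loss = torch.clamp(margin - (r_gold - r_samp), min=0.0)
    opt.zero_grad()
    loss.backward()
    opt.step()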

A Survey of Reinforcement Learning Informed by Natural Language

Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
We survey the state of the field, including work on instruction following, text games, and learning from textual domain knowledge.  ...  To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.  ...  In the former category we review instruction following, induction of reward from language, and environments with text in the action or observation space, all of which have language in the problem formulation  ... 
doi:10.24963/ijcai.2019/880 dblp:conf/ijcai/LuketinaNFFAGWR19 fatcat:vyjt3t4kbjbkdgyb2ra3quppbq

HIGhER : Improving instruction following with Hindsight Generation for Experience Replay [article]

Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin
2020 arXiv   pre-print
Whenever the agent does not fulfill its instruction, HIGhER learns to output a new directive that matches the agent trajectory, and it relabels the episode with a positive reward.  ...  in even simple instruction following scenarios.  ...  IRL for instruction following: Bahdanau et al. (2019a) learn a mapping from <instruction, state> to a reward function.  ...
arXiv:1910.09451v3 fatcat:cxwhjk2d7vg6jjzutcxvdsrdqi
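
The language-conditioned reward the snippet attributes to Bahdanau et al., a network scoring how well a state satisfies an instruction, might be a simple joint encoder. The architecture below is an assumption for illustration only.

import torch
import torch.nn as nn

class InstructionStateReward(nn.Module):
    def __init__(self, instr_dim, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(instr_dim + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, instr_embedding, state):
        # Higher output = the state better satisfies the instruction;
        # the output can then stand in for the environment reward.
        return self.net(torch.cat([instr_embedding, state], dim=-1)).squeeze(-1)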

Neural Program Synthesis By Self-Learning [article]

Yifan Xu, Lu Dai, Udaikaran Singh, Kening Zhang, Zhuowen Tu
2019 arXiv   pre-print
Neural inductive program synthesis is the task of generating instructions that produce desired outputs from given inputs.  ...  Policy networks and value networks are learned to reduce the breadth and depth of the Monte Carlo Tree Search, resulting in better synthesis performance.  ...  Adapt self-learning reinforcement learning into the neural inductive program synthesis process to perform assembly program synthesis.  ...
arXiv:1910.05865v1 fatcat:5f4agcgvfjgafala6lqelrbjo4
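
Using a policy network to limit breadth and a value network to limit depth, as the abstract describes, can be caricatured with a best-first search. All interfaces below are assumptions, and the paper uses Monte Carlo Tree Search rather than this simplified loop.

# The value net prioritizes states (limiting effective depth); the policy
# net expands only its top-k instructions per state (limiting breadth).
import heapq
import itertools

def guided_synthesis(initial_state, policy_net, value_net, expand,
                     is_solution, budget=1000, top_k=3):
    counter = itertools.count()  # tie-breaker so states are never compared
    frontier = [(-value_net(initial_state), next(counter), initial_state)]
    for _ in range(budget):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)
        if is_solution(state):
            return state
        for instr in policy_net.top_instructions(state, k=top_k):
            child = expand(state, instr)
            heapq.heappush(frontier, (-value_net(child), next(counter), child))
    return None  # no program found within budget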

An overview of 11 proposals for building safe advanced AI [article]

Evan Hubinger
2020 arXiv   pre-print
This paper analyzes and compares 11 different proposals for building safe advanced AI under the current machine learning paradigm, including major contenders such as iterated amplification, AI safety via debate, and recursive reward modeling.  ...  If following human instructions is incentivized, for example, that could lead to corrigibility in the limit, or it could lead to agents that only choose to follow human instructions for the instrumental  ...
arXiv:2012.07532v1 fatcat:mfcsnozm5rec7jksxvizhvz4pu

How Can We Accelerate Progress Towards Human-like Linguistic Generalization? [article]

Tal Linzen
2020 arXiv   pre-print
This contrasts with humans, who learn language from several orders of magnitude less data than the systems favored by this evaluation paradigm, and generalize to new tasks in a consistent way.  ...  We advocate for supplementing or replacing PAID with paradigms that reward architectures that generalize as quickly and robustly as humans.  ...  to perform tasks with minimal instruction.  ...
arXiv:2005.00955v1 fatcat:yfbtvxmjfbcyvksbg2zio3xkeu

Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [article]

Yilei Zeng, Jiali Duan, Yang Li, Emilio Ferrara, Lerrel Pinto, C.-C. Jay Kuo, Stefanos Nikolaidis
2022 arXiv   pre-print
While abundant research has been helping AI achieve superhuman performance either by fully automatic or weak supervision learning, fewer endeavors are experimenting with how AI can tailor to humans' preferred  ...  To achieve this, we developed a portable, interactive platform that enables the user to interact with agents online via manipulating the task difficulty, observing performance, and providing curriculum  ...  For the third goal, we display real-time instructions and allow users to inspect learning progress when designing the curriculum.  ... 
arXiv:2208.02932v1 fatcat:6bqukbmo7fegrnwrqxhizbygd4
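
The interaction loop implied by the abstract, where the user sets a difficulty, the agent trains on it, and progress is reported back before the next choice, could be as simple as the sketch below. All interfaces (make_env, agent.train_episode, get_user_difficulty, report) are assumptions.

def human_curriculum_loop(make_env, agent, get_user_difficulty, report,
                          episodes_per_round=50, rounds=10):
    for round_idx in range(rounds):
        difficulty = get_user_difficulty()        # e.g. a slider in the UI
        env = make_env(difficulty)
        returns = [agent.train_episode(env)
                   for _ in range(episodes_per_round)]
        # Show learning progress so the user can set the next difficulty.
        report(round_idx, difficulty, sum(returns) / len(returns))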

Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning [article]

DeepMind Interactive Agents Team: Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, Mary Cassin, Felix Fischer, Petko Georgiev, Alex Goldin, Mansi Gupta, Tim Harley, Felix Hill, Peter C Humphreys (+12 others)
2022 arXiv   pre-print
We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time.  ...  Acknowledgments: The authors would like to thank Duncan Williams, Daan Wierstra, Dario de Cesare, Koray Kavukcuoglu, Matt Botvinick, Lorrayne Bennett, the Worlds Team, and Crowd Compute.  ...
arXiv:2112.03763v2 fatcat:xegc5kw4cncmtdnxlzlflfbaau

A Home Service-Oriented Question Answering System with High Accuracy and Stability

Mengyang Zhang, Mengyang Zhang, Guohui Tian, Ying Zhang
2019 IEEE Access  
In order to optimize the model parameters effectively, reinforcement learning is employed, and both accuracy and stability are treated as criteria in designing the rewards.  ...  With the development of deep learning, neural network-based (NN-based) methods have been widely applied in question answering (QA) and have achieved significant progress.  ...  The extent to which QA systems truly understand language remains unclear, and it is promising to make QA systems learn instructively and selectively.  ...
doi:10.1109/access.2019.2894438 fatcat:s5il3vk2wbccrbug6ywrubzckm
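
One plausible reading of rewards built from accuracy and stability follows; "stability" is read here as consistency under paraphrased questions, which is my assumption rather than the paper's confirmed definition.

# Assumed reward shaping: accuracy rewards a correct answer; stability
# rewards answering paraphrases of the question the same way.
def qa_reward(predicted, gold, answers_on_paraphrases,
              accuracy_weight=1.0, stability_weight=0.5):
    accuracy = 1.0 if predicted == gold else 0.0
    stability = (sum(a == predicted for a in answers_on_paraphrases)
                 / max(len(answers_on_paraphrases), 1))
    return accuracy_weight * accuracy + stability_weight * stability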

IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022 [article]

Julia Kiseleva and Alexey Skrynnik and Artem Zholus and Shrestha Mohanty and Negar Arabzadeh and Marc-Alexandre Côté and Mohammad Aliannejadi and Milagro Teruel and Ziming Li and Mikhail Burtsev and Maartje ter Hoeve and Zoya Volovikova and Aleksandr Panov and Yuxuan Sun and Kavya Srinet and Arthur Szlam and Ahmed Awadallah
2022 arXiv   pre-print
Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions.  ...  The primary goal of the competition is to approach the problem of how to develop interactive embodied agents that learn to solve a task while provided with grounded natural language instructions in a collaborative  ...  Specifically, the goal of our competition is to approach the following scientific challenge: How to build interactive embodied agents that learn to solve a task while provided with grounded natural language  ... 
arXiv:2205.13771v1 fatcat:5a2hp7wuw5f4bmxx33wjoxisie

Deep Reinforcement Learning [article]

Yuxi Li
2018 arXiv   pre-print
We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources.  ...  Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn.  ...  Lanctot et al. (2017) observe that independent RL, in which each agent learns by interacting with the environment, oblivious to other agents, can overfit the learned policies to other agents' policies  ... 
arXiv:1810.06339v1 fatcat:kp7atz5pdbeqta352e6b3nmuhy
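
Independent RL in Lanctot et al.'s sense, each agent running its own learner and treating the others as part of the environment, reduces to per-agent tabular Q-learning like the sketch below. This is an illustration of the setting, not code from the survey; env_step is an assumed joint-transition callable.

import random
from collections import defaultdict

def independent_q_step(q_tables, states, action_spaces, env_step,
                       alpha=0.1, gamma=0.99, epsilon=0.1):
    # Each agent picks its action epsilon-greedily from its own Q-table,
    # oblivious to the other agents.
    acts = []
    for q, s, space in zip(q_tables, states, action_spaces):
        if random.random() < epsilon:
            acts.append(random.choice(space))
        else:
            acts.append(max(space, key=lambda a: q[(s, a)]))
    next_states, rewards, done = env_step(acts)  # joint transition
    # Each agent updates independently from its own reward and observation.
    for i, q in enumerate(q_tables):
        best_next = max(q[(next_states[i], a)] for a in action_spaces[i])
        target = rewards[i] + (0.0 if done else gamma * best_next)
        q[(states[i], acts[i])] += alpha * (target - q[(states[i], acts[i])])
    return next_states, done

# One Q-table per agent: q_tables = [defaultdict(float) for _ in agents]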
Showing results 1 — 15 out of 2,913 results