
Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies [article]

Sungryull Sohn, Junhyuk Oh, Honglak Lee
2019 arXiv   pre-print
We introduce a new RL problem where the agent is required to generalize to a previously-unseen environment characterized by a subtask graph which describes a set of subtasks and their dependencies.  ...  it with MCTS.  ...  Acknowledgments This work was supported mainly by the ICT R&D program of MSIP/IITP (2016-0-00563: Research on Adaptive Machine Learning Technology Development for Intelligent Autonomous Digital Companion  ... 
arXiv:1807.07665v4 fatcat:kc4p5w5eyzdulpxhwvd52elugq

Modular Multitask Reinforcement Learning with Policy Sketches [article]

Jacob Andreas and Dan Klein and Sergey Levine
2017 arXiv   pre-print
We describe a framework for multitask deep reinforcement learning guided by policy sketches.  ...  guidance used by much previous work on learning policy abstractions for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic motivations).  ...  Our contributions are: • A general paradigm for multitask, hierarchical, deep reinforcement learning guided by abstract sketches of task-specific policies. • A concrete recipe for learning from these sketches  ... 
arXiv:1611.01796v2 fatcat:mdxg3iufvrb3tarbpdvvdmcu6m

Gated-Attention Architectures for Task-Oriented Language Grounding [article]

Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, Ruslan Salakhutdinov
2018 arXiv   pre-print
learning methods.  ...  The proposed model combines the image and text representations using a Gated-Attention mechanism and learns a policy to execute the natural language instruction using standard reinforcement and imitation  ...  Tadas Baltrušaitis for their valuable comments and guidance throughout the development of this work.  ... 
arXiv:1706.07230v2 fatcat:pckcwi6gbbaqzoiesionmcsrou

Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas [article]

Yen-Ling Kuo, Boris Katz, Andrei Barbu
2020 arXiv   pre-print
The structures required for this generalization are specific to LTL formulas, which opens up an interesting theoretical question: what structures are required in neural networks for zero-shot generalization  ...  The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them.  ...  Together, these results show that our method learns to generalize formulas and to execute them zero-shot. V.  ... 
arXiv:2006.01110v2 fatcat:5oewaleiqzfubm2wuue6uagb4m

Leveraging Table Content for Zero-shot Text-to-SQL with Meta-Learning [article]

Yongrui Chen, Xinnan Guo, Chaojie Wang, Jian Qiu, Guilin Qi, Meng Wang, Huiying Li
2021 arXiv   pre-print
The strategy uses a two-step gradient update to force the model to learn to generalize to zero-shot tables.  ...  In this paper, we propose a new approach for the zero-shot text-to-SQL task which does not rely on any additional manual annotations. Our approach consists of two parts.  ...  (Chang et al. 2020) is the first work to explicitly deal with zero-shot tables.  ... 
arXiv:2109.05395v1 fatcat:mkizvwbl4zg2hpldhcpxqhkjey

Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes [article]

Guillermo Infante, Anders Jonsson, Vicenç Gómez
2022 arXiv   pre-print
In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes.  ...  As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions.  ...  One of the computational advantages of LMDPs is compositionality, which allows for zero-shot learning of new skills by linearly combining previously learned base skills which only differ in their cost  ... 
arXiv:2106.15380v3 fatcat:z7rdlxnz5bdufecgxrsakghr2m

The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents [article]

Kurt Shuster, Da Ju, Stephen Roller, Emily Dinan, Y-Lan Boureau, Jason Weston
2020 arXiv   pre-print
We show that such multi-tasking improves over a BERT pre-trained baseline, largely due to multi-tasking with very large dialogue datasets in a similar domain, and that the multi-tasking in general provides  ...  We obtain state-of-the-art results on many of the tasks, providing a strong baseline for this challenge.  ...  Zero-shot Transfer Finally, we consider a leaveone-out zero-shot setting whereby training is constrained to be on all the training data except for the task being evaluated.  ... 
arXiv:1911.03768v2 fatcat:xafwif4fbfaptm3jho67tukpeq

CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks [article]

Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard
2021 arXiv   pre-print
We evaluate the agents zero-shot on novel language instructions and on novel environments and objects.  ...  General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks.  ...  To open the door for future development of agents that can generalize abstract concepts to unseen entities the same way humans do, we include a challenging zero-shot evaluation by training on large play  ... 
arXiv:2112.03227v2 fatcat:aw3vvlb7ejeofodzw7xcdjnysm

PLOTS: Procedure Learning from Observations using Subtask Structure [article]

Tong Mu, Karan Goel, Emma Brunskill
2019 arXiv   pre-print
Comparing with some state-of-the-art approaches, we find that our explicit procedural learning-from-observation method is about 100 times faster than policy-gradient-based approaches that learn a stochastic  ...  In many cases an intelligent agent may want to learn how to mimic a single observed demonstrated trajectory.  ...  zero.  ... 
arXiv:1904.09162v1 fatcat:svw7oad6bndydmhgfi3yxinb7q

Universal Successor Features Approximators [article]

Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul
2018 arXiv   pre-print
The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange  ...  Generalised policy improvement (GPI) combines solutions of previous tasks into a policy for the unseen task; this relies on instantaneous policy evaluation of old policies under the new reward function  ...  INTRODUCTION Reinforcement learning (RL) provides a general framework to model sequential decision-making problems with sparse evaluative feedback in the form of rewards.  ... 
arXiv:1812.07626v1 fatcat:ptxih27fezbavg47nqil4w7qry

Tech United Eindhoven, Winner RoboCup 2014 MSL [chapter]

Cesar Lopez Martinez, Ferry Schoenmakers, Gerrit Naus, Koen Meessen, Yanick Douven, Harrie van de Loo, Dennis Bruijnen, Wouter Aangenent, Joost Groenen, Bob van Ninhuijs, Matthias Briegel, Rob Hoogendijk (+13 others)
2015 Lecture Notes in Computer Science  
Via QR-code detection we can pass coaching instructions to our robots, and with a basic machine learning algorithm, success and failure after free-kicks are taken into account.  ...  In terms of intelligent gameplay we have worked on creating possibilities for in-game optimization of strategic decisions.  ...  A reinforcement learning algorithm is built around actions, states and rewards [?] .  ... 
doi:10.1007/978-3-319-18615-3_5 fatcat:72nrhgjj2rhn3azb2tafzqhgmm

Contextualized Cross-Lingual Event Trigger Extraction with Minimal Resources

Meryem M'hamdi, Marjorie Freedman, Jonathan May
2019 Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)  
However, there has not been much effort in exploring language transfer using BERT for event extraction.  ...  Recently, contextualized Bidirectional Encoder Representations from Transformers (BERT) models have established state-of-the-art performance for a variety of NLP tasks.  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.  ... 
doi:10.18653/v1/k19-1061 dblp:conf/conll/MhamdiFM19 fatcat:u2bi77j5vbeqneqvi3mwydjhhm

Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [article]

Murtaza Dalal, Deepak Pathak, Ruslan Salakhutdinov
2021 arXiv   pre-print
Despite the potential of reinforcement learning (RL) for building general-purpose robotic systems, training RL agents to solve robotics tasks still remains challenging due to the difficulty of exploration  ...  In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.  ...  Acknowledgments We thank Shikhar Bahl, Ben Eyesenbach, Aravind Sivakumar, Rishi Veerapaneni, Russell Mendonca and Paul Liang for feedback on early drafts of this paper.  ... 
arXiv:2110.15360v1 fatcat:5b6eobkxp5dmdhwlygkqglfz2y

A Survey on Visual Navigation for Artificial Agents with Deep Reinforcement Learning

Fanyu Zeng, Chen Wang, Shuzhi Sam Ge
2020 IEEE Access  
Visual navigation for artificial agents with deep reinforcement learning (DRL) is a new research hotspot in artificial intelligence and robotics that incorporates the decision making of DRL into visual  ...  In this paper, we first present an overview of reinforcement learning (RL), deep learning (DL) and deep reinforcement learning (DRL).  ...  Fig. 16 illustrates the gradient update for policy parameters with MAML meta-learning. Meta-learning is a few-shot method that needs only a small amount of data to generalize.  ... 
doi:10.1109/access.2020.3011438 fatcat:ie6qvu24qbapbjxtiudh7fumgy

Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation [article]

Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz
2020 arXiv   pre-print
Pre-training on large data sources and adapting to the target data has become the standard method for few-shot problems within the deep learning framework.  ...  Deep learning, while being the preferred technique for modeling such systems, works best given massive training data.  ...  (Eshghi, Purver, and Hough 2011) combined with a reinforcement learning-based agent.  ... 
arXiv:2003.01680v2 fatcat:qx3rhjadcbbtde3tzffsotwkte
Showing results 1 — 15 out of 92 results