
Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks [article]

Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese
2019 arXiv   pre-print
On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin.  ...  Many robotic applications require the agent to perform long-horizon tasks in partially observable environments.  ...  We thank Marek Fišer for the software development of the simulation environment, Oscar Ramirez and Ayzaan Wahid for the support of the learning infrastructure.  ... 
arXiv:1903.03878v1

Skill-based Meta-Reinforcement Learning [article]

Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J Lim
2022 arXiv   pre-print
Specifically, we propose to (1) extract reusable skills and a skill prior from offline datasets, (2) meta-train a high-level policy that learns to efficiently compose learned skills into long-horizon behaviors  ...  While deep reinforcement learning methods have shown impressive results in robot learning, their sample inefficiency makes the learning of complex, long-horizon behaviors with real robot systems infeasible  ...  We employed 4-layer MLPs with 256 hidden units for Maze Navigation, and 6-layer MLPs with 128 hidden units for the Kitchen Manipulation experiment.  ... 
arXiv:2204.11828v1

Planning with Goal-Conditioned Policies [article]

Soroush Nasiriany, Vitchyr H. Pong, Steven Lin, Sergey Levine
2019 arXiv   pre-print
Planning methods can solve temporally extended sequential decision making problems by composing simple behaviors.  ...  Can we utilize reinforcement learning to automatically form the abstractions needed for planning, thus obtaining the best of both approaches?  ...  B.3 Ant Navigation The ant must learn to navigate around a narrow rectangular room with a long wall in the center. See Figure 5 for a visualization of the environment.  ... 
arXiv:1911.08453v1

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones [article]

Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su
2022 arXiv   pre-print
Significant progress has been made in recent years, especially for tasks with short horizons.  ...  However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the  ...  Acknowledgement The authors would like to thank the colleagues from the OSU NLP group for their thoughtful comments. This re-  ... 
arXiv:2202.07028v3

Skill-based Model-based Reinforcement Learning [article]

Lucy Xiaoyang Shi and Joseph J. Lim and Youngwoon Lee
2022 arXiv   pre-print
We then harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space, which enables efficient downstream learning of long-horizon, sparse reward tasks.  ...  However, planning every action for long-horizon tasks is not practical, akin to a human planning out every muscle movement.  ...  We would like to thank Ayush Jain and Grace Zhang for help on writing, Karl Pertsch for assistance in setting up SPiRL and CALVIN, and all members of the USC CLVR lab for constructive feedback.  ... 
arXiv:2207.07560v1

LISA: Learning Interpretable Skill Abstractions from Language [article]

Divyansh Garg, Skanda Vaidyanath, Kuno Kim, Jiaming Song, Stefano Ermon
2022 arXiv   pre-print
In navigation and robotic manipulation environments, LISA is able to outperform a strong non-hierarchical baseline in the low data regime and compose learned skills to solve tasks containing unseen long-range  ...  LISA uses vector quantization to learn discrete skill codes that are highly correlated with language instructions and the behavior of the learned policy.  ...  , into discrete interpretable and composable codes (see Fig. 5 and Fig. 7 for visualizations).  ... 
arXiv:2203.00054v1

Learning to Drive Off Road on Smooth Terrain in Unstructured Environments Using an On-Board Camera and Sparse Aerial Images [article]

Travis Manderson, Stefan Wapnick, David Meger, Gregory Dudek
2020 arXiv   pre-print
We present a method for learning to drive on smooth terrain while simultaneously avoiding collisions in challenging off-road and unstructured outdoor environments using only visual inputs.  ...  We find that the fusion of these complementary inputs improves planning foresight and makes the model robust to visual obstructions.  ...  Imitation Learning for Visual Navigation Learning vision-based controllers for autonomous driving has been studied for several decades. Pomerleau et al.  ... 
arXiv:2004.04697v1

Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion [article]

Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme
2021 arXiv   pre-print
navigation targets for EmBERT training.  ...  We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion.  ...  Embodied BERT EmBERT uses a transformer encoder for jointly embedding language and visual tokens and a transformer decoder for long-horizon planning and object-centric navigation predictions (Figure  ... 
arXiv:2108.04927v2

NaviGAN: A Generative Approach for Socially Compliant Navigation [article]

Chieh-En Tsai, Jean Oh
2020 arXiv   pre-print
For instance, the reinforcement learning approaches tend to optimize on the comfort aspect of the socially compliant navigation, whereas the inverse reinforcement learning approaches are designed to achieve  ...  Our approach is designed as an adversarial training framework that can learn to generate a navigation path that is both optimized for achieving a goal and for complying with latent social rules.  ...  To encourage a model to learn long-term, complex social behavior, we follow the common practice [24], [26] of selectively choosing the pedestrians that stay in the receptive field for longer than T  ... 
arXiv:2007.05616v1

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings [article]

John D. Co-Reyes, YuXuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine
2018 arXiv   pre-print
We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical  ...  This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior.  ...  Acknowledgements We would like to thank Roberto Calandra, Gregory Kahn, Justin Fu for helpful comments and discussions.  ... 
arXiv:1806.02813v1

MPR-RL: Multi-Prior Regularized Reinforcement Learning for Knowledge Transfer

Quantao Yang, Johannes A. Stork, Todor Stoyanov
2022 IEEE Robotics and Automation Letters  
A recent work [27] proposes a bottom-up approach to learning a set of reusable skills from multi-task, multi-sensory demonstrations and uses these skills to synthesize long-horizon robot behaviors.  ...  This skill-prior-based RL performs more efficiently over long-horizon tasks, but still requires many interactions to learn a new task.  ... 
doi:10.1109/lra.2022.3184805

CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks [article]

Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard
2022 arXiv   pre-print
In this paper, we present CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark to learn long-horizon language-conditioned tasks.  ...  Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by following unconstrained language instructions.  ...  We thank Corey Lynch and Pierre Sermanet for help with the MCIL baseline.  ... 
arXiv:2112.03227v4

Hierarchical Few-Shot Imitation with Skill Transition Models [article]

Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin
2022 arXiv   pre-print
Recent advances in data-driven skill learning have shown that extracting behavioral priors from offline data can enable agents to solve challenging long-horizon tasks with reinforcement learning.  ...  FIST learns an inverse skill dynamics model, a distance function, and utilizes a semi-parametric approach for imitation.  ...  Recently, learning data-driven behavioral priors has become a promising approach to solving long-horizon tasks.  ... 
arXiv:2107.08981v2

Universal Planning Networks [article]

Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn
2018 arXiv   pre-print
We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images  ...  The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when  ...  Table 7 contains the hyperparameters for the long-horizon ant navigation experiment. Table 9 contains specific details about the environments, such as horizon and discount factor. F.  ... 
arXiv:1804.00645v2

Sparse Graphical Memory for Robust Planning [article]

Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak
2020 arXiv   pre-print
Experimentally, we show that SGM significantly outperforms current state-of-the-art methods on long-horizon, sparse-reward visual navigation tasks.  ...  Current deep reinforcement and imitation learning methods can learn directly from high-dimensional inputs but do not scale well to long-horizon tasks.  ...  For the SSL experiments in ViZDoom, we use the trained, behavior-cloned visual controller from SPTM.  ... 
arXiv:2003.06417v3