9,901 Hits in 3.6 sec

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks [article]

Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese
2019 arXiv pre-print
Many robotic applications require the agent to perform long-horizon tasks in partially observable environments.  ...  In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT).  ...  We thank Marek Fišer for the software development of the simulation environment, Oscar Ramirez and Ayzaan Wahid for the support of the learning infrastructure.  ... 
arXiv:1903.03878v1 fatcat:7lxd254ycbdn5ihen7swen5weq

Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion [article]

Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme
2021 arXiv pre-print
We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion.  ...  We achieve competitive performance on the ALFRED benchmark, and EmBERT marks the first transformer-based model to successfully handle the long-horizon, dense, multi-modal histories of ALFRED, and the first  ...  Embodied BERT. EmBERT uses a transformer encoder for jointly embedding language and visual tokens and a transformer decoder for long-horizon planning and object-centric navigation predictions (Figure  ... 
arXiv:2108.04927v2 fatcat:pq6k7mbrsneuzaaio6egm54hqe

Learning to Act with Affordance-Aware Multimodal Neural SLAM [article]

Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme
2022 arXiv pre-print
There are several challenges in solving embodied multimodal tasks, including long-horizon planning, vision-and-language grounding, and efficient exploration.  ...  Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment.  ...  AMSLAM is the first Neural SLAM-based approach for Embodied AI tasks to utilize several modalities for effective exploration and an affordance-aware semantic representation for robust long-horizon planning  ... 
arXiv:2201.09862v3 fatcat:6fkkxpagzjctnf72gnbnpuhi4a

MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation [article]

Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang, Manolis Savva
2020 arXiv pre-print
Recent work shows that map-like memory is useful for long-horizon navigation tasks.  ...  feature map agents; and iii) even oracle map agents achieve relatively low performance, indicating the potential for future work in training embodied navigation agents using maps.  ...  Long-horizon embodied agent tasks. There has been relatively little work on training embodied agents for long-horizon tasks. Mirowski et al.  ... 
arXiv:2012.03912v1 fatcat:sp4xwuezfnempc7hctjkaurjxe

SGoLAM: Simultaneous Goal Localization and Mapping for Multi-Object Goal Navigation [article]

Junho Kim, Eun Sun Lee, Mingi Lee, Donsu Zhang, Young Min Kim
2021 arXiv pre-print
Given an agent equipped with an RGB-D camera and a GPS/Compass sensor, our objective is to have the agent navigate to a sequence of target objects in realistic 3D environments.  ...  As our approach does not require any training of neural networks, it can be used in an off-the-shelf manner and is amenable to fast generalization to new, unseen environments.  ...  Mapping in Navigation. Long-horizon navigation tasks in complex, realistic environments are considered beyond the capability of existing memory-less systems.  ... 
arXiv:2110.07171v1 fatcat:i5goy2gqqvdl5b7kws3ohvraie

FILM: Following Instructions in Language with Modular Methods [article]

So Yeon Min, Devendra Singh Chaplot, Pradeep Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov
2022 arXiv pre-print
Our findings suggest that an explicit spatial memory and a semantic search policy can provide a stronger and more general representation for state-tracking and guidance, even in the absence of expert trajectories  ...  Such approaches assume that neural states will integrate multimodal semantics to perform state tracking, building spatial memory, exploration, and long-term planning.  ...  multimodal embodied agents that perform complex tasks.  ... 
arXiv:2110.07342v3 fatcat:gx2poe4ks5dh5l6sqpzqoi6nje

Memory-Augmented Reinforcement Learning for Image-Goal Navigation [article]

Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari
2022 arXiv pre-print
In this work, we present a memory-augmented approach for image-goal navigation.  ...  First, we train a state-embedding network in a self-supervised fashion, and then use it to embed previously-visited states into the agent's memory.  ...  The episodic memory is reset after each episode, while the long-term memory remains for 100 episodes in the same scene.  ... 
arXiv:2101.05181v4 fatcat:v5mva4caxndu7k77iro5slgnhu

End-to-End Egospheric Spatial Memory [article]

Daniel Lenton, Stephen James, Ronald Clark, Andrew J. Davison
2021 arXiv pre-print
Spatial memory, or the ability to remember and recall specific locations and objects, is central to autonomous agents' ability to carry out tasks in real environments.  ...  We propose a parameter-free module, Egospheric Spatial Memory (ESM), which encodes the memory in an ego-sphere around the agent, enabling expressive 3D representations.  ...  When compared to Recurrent Neural Networks (RNNs), the persistent memory circumvents issues of vanishing or exploding gradients, enabling solutions to long-horizon tasks.  ... 
arXiv:2102.07764v2 fatcat:hvnai4ki2jfoba2znpio55ogia

Rearrangement: A Challenge for Embodied AI [article]

Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su
2020 arXiv pre-print
We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement.  ...  The goal state can be specified by object poses, by images, by a description in language, or by letting the agent experience the environment in the goal state.  ...  We thank Joseph Lim for participating in early discussions. We also thank the AI2-THOR, Habitat, RL-Bench, and SAPIEN teams for releasing the experimental testbeds described in this report.  ... 
arXiv:2011.01975v1 fatcat:a7olb6osxndmjlxkxlcr2xnolq

Offline Visual Representation Learning for Embodied Navigation [article]

Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets
2022 arXiv pre-print
How should we learn visual representations for embodied agents that must see and move?  ...  of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.  ...  In Section 5.5, we studied OVRL's properties while training on the HM3D dataset for a long time horizon (2 billion (B) simulation steps).  ... 
arXiv:2204.13226v1 fatcat:dgqwmbzzjfcdjjtmkezx6ycc4i

Learning to Explore using Active Neural SLAM [article]

Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov
2020 arXiv pre-print
This work presents a modular and hierarchical approach to learn policies for exploring 3D environments, called 'Active Neural SLAM'.  ...  The proposed model can also be easily transferred to the PointGoal task and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.  ...  We thank Guillaume Lample for discussions and coding during the initial stages of this project. Licenses for referenced datasets.  ... 
arXiv:2004.05155v1 fatcat:6t7hhvlocfa4pel6y2v46tmusm

Object Goal Navigation using Goal-Oriented Semantic Exploration [article]

Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta, Ruslan Salakhutdinov
2020 arXiv pre-print
End-to-end learning-based navigation methods struggle at this task as they are ineffective at exploration and long-term planning.  ...  Ablation analysis indicates that the proposed model learns semantic priors of the relative arrangement of objects in a scene, and uses them to explore efficiently.  ...  Acknowledgements. This work was supported in part by the US Army W911NF1920104, IARPA D17PC00340, ONR Grant N000141812861, DARPA MCS and ONR Young Investigator.  ... 
arXiv:2007.00643v2 fatcat:nxmdhmx2ozeopbx5tsi4mhs6qy

SoundSpaces: Audio-Visual Navigation in 3D Environments [article]

Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
2020 arXiv pre-print
Our results show that audio greatly benefits embodied visual navigation in 3D spaces, and our work lays groundwork for new research in embodied AI with audio-visual perception.  ...  Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted solely to their visual perception of the environment.  ...  Acknowledgements. UT Austin is supported in part by DARPA Lifelong Learning Machines.  ... 
arXiv:1912.11474v3 fatcat:vidyc3jrzzeofdnadv7t2xzi5q

SOON: Scenario Oriented Object Navigation with Graph-based Exploration [article]

Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang
2021 arXiv pre-print
In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.  ...  Accordingly, in this paper, we introduce a Scenario Oriented Object Navigation (SOON) task.  ...  Scene memory transformer for embodied agents in long-horizon tasks.  ...  continuous control actions with position-visitation prediction.  ... 
arXiv:2103.17138v2 fatcat:24yoqxomlff5lbbgdvzboxk64a

Visual Room Rearrangement [article]

Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi
2021 arXiv pre-print
In this paper, we propose a new dataset and baseline models for the task of Rearrangement.  ...  Our experiments show that solving this challenging interactive task that involves navigation and object interaction is beyond the capabilities of the current state-of-the-art techniques for embodied tasks  ...  To establish baseline performance for our task, we evaluate an actor-critic model akin to the state-of-the-art models used for long-horizon tasks such as navigation.  ... 
arXiv:2103.16544v1 fatcat:bqw7an7wvzdfxnyz5xzrg2xq2i
Showing results 1–15 out of 9,901 results