598 Hits in 1.3 sec

Hierarchical Neural Dynamic Policies [article]

Shikhar Bahl, Abhinav Gupta, Deepak Pathak
2021 arXiv   pre-print
We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical-system-based methods has successfully demonstrated dynamic robot behaviors but has difficulty generalizing to unseen configurations and learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but still struggle in domains with diverse configurations of image goals and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamic Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical-system-based policies on small regions in state space and then distill them into a global dynamical-system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and in simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation and reinforcement learning setups and achieve state-of-the-art results. Video results are at
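The curriculum-then-distill idea in the abstract above can be caricatured on a 1-D toy problem (a minimal sketch under assumed simplifications, not the authors' implementation): fit simple local policies on small regions of the state space, then distill their outputs into one global policy by regression.

```python
import random

def region(s, k=4):
    # Partition the 1-D state space [0, 1) into k small regions.
    return min(int(s * k), k - 1)

def fit_local_policies(data, k=4):
    # One simple local policy per region: the mean expert action there
    # (a toy stand-in for fitting a local dynamical-system policy).
    buckets = {}
    for s, a in data:
        buckets.setdefault(region(s, k), []).append(a)
    return {r: sum(v) / len(v) for r, v in buckets.items()}

def distill(local, data, k=4):
    # Distillation step: fit one global least-squares line to the labels
    # produced by the local policies, covering the whole state space.
    pts = [(s, local[region(s, k)]) for s, _ in data]
    n = len(pts)
    ms = sum(s for s, _ in pts) / n
    ma = sum(a for _, a in pts) / n
    var = sum((s - ms) ** 2 for s, _ in pts) or 1.0
    w = sum((s - ms) * (a - ma) for s, a in pts) / var
    b = ma - w * ms
    return lambda s: w * s + b

random.seed(0)
data = [(s, 2.0 * s) for s in (random.random() for _ in range(200))]
global_policy = distill(fit_local_policies(data), data)
```

The local fits are trivial here; the point is only the two-stage structure: easy local problems first, then a single global policy trained on their outputs.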
arXiv:2107.05627v1 fatcat:4246ptcm5nf73glygr355w2gcu

Human-to-Robot Imitation in the Wild [article]

Shikhar Bahl, Abhinav Gupta, Deepak Pathak
2022 arXiv   pre-print
We approach the problem of learning by watching humans in the wild. While traditional approaches in imitation and reinforcement learning are promising for learning in the real world, they are either sample-inefficient or constrained to lab settings. Meanwhile, there has been a lot of success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective. We call our method WHIRL: In-the-Wild Human Imitating Robot Learning. WHIRL extracts a prior over the intent of the human demonstrator, using it to initialize our agent's policy. We introduce an efficient real-world policy learning scheme that improves using interactions. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos, and an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild. Videos and talk at
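A generic sampling-based policy optimizer of the kind the abstract mentions can be sketched as a cross-entropy-method loop (a hedged illustration; the toy quadratic reward here stands in for WHIRL's actual video-alignment objective, which this sketch does not implement):

```python
import random
import statistics

def cem(reward, mu=0.0, sigma=1.0, iters=20, pop=64, elite=8):
    # Cross-entropy-style optimizer: sample candidate action parameters,
    # keep the highest-reward elites, refit the sampling distribution.
    for _ in range(iters):
        samples = [random.gauss(mu, sigma) for _ in range(pop)]
        elites = sorted(samples, key=reward, reverse=True)[:elite]
        mu = statistics.mean(elites)
        sigma = statistics.stdev(elites) + 1e-3  # floor to keep exploring
    return mu

random.seed(0)
best = cem(lambda x: -(x - 3.0) ** 2)  # toy reward peaked at x = 3
```

Such samplers need no gradients through the environment, which is why they suit real-robot loops where the reward comes from comparing executed rollouts against a reference.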
arXiv:2207.09450v1 fatcat:wk7yrz4vkvefnajjwcojh3sdii

Eutrophication: Impact of Excess Nutrient Status in Lake Water Ecosystem

Hemant Pathak, Deepak Pathak
2012 Journal of Environmental & Analytical Toxicology  
doi:10.4172/2161-0525.1000148 fatcat:hqtwrj55wvbt3pgxtnat4pyfcy

Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [article]

Murtaza Dalal, Deepak Pathak, Ruslan Salakhutdinov
2021 arXiv   pre-print
Despite the potential of reinforcement learning (RL) for building general-purpose robotic systems, training RL agents to solve robotics tasks still remains challenging due to the difficulty of exploration in purely continuous action spaces. Addressing this problem is an active area of research, with the majority of focus on improving RL methods via better optimization or more efficient exploration. An alternate but important component to improve is the interface of the RL algorithm with the robot. In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy. These parameterized primitives are expressive, simple to implement, enable efficient exploration, and can be transferred across robots, tasks, and environments. We perform a thorough empirical study across challenging tasks in three distinct domains with image input and a sparse terminal reward. We find that our simple change to the action interface substantially improves both the learning efficiency and task performance irrespective of the underlying RL algorithm, significantly outperforming prior methods which learn skills from offline expert data. Code and videos at
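The action interface described above is easy to sketch (primitive names and arguments here are illustrative, not from the RAPS codebase): the policy emits a primitive index plus continuous arguments, instead of raw low-level actions.

```python
# Toy library of parameterized primitives acting on a dict-valued state.
def move_to(state, target):
    # Primitive 0: move the end-effector to a target position.
    return {"pos": target}

def grasp(state, width):
    # Primitive 1: close the gripper to a given width, position unchanged.
    return {"pos": state.get("pos", 0.0), "gripper": width}

PRIMITIVES = [move_to, grasp]

def step(state, action):
    # action = (primitive index, continuous argument), as an RL policy
    # would output; the index selects which primitive to execute.
    idx, arg = action
    prim = PRIMITIVES[int(idx) % len(PRIMITIVES)]
    new = dict(state)
    new.update(prim(state, arg))
    return new

state = step({"pos": 0.0}, (0, 0.7))   # execute move_to(target=0.7)
state = step(state, (1, 0.04))         # then grasp(width=0.04)
```

The exploration benefit is visible even in the toy: one primitive call covers behavior that would take many raw-torque steps, so the search space the policy faces is far smaller.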
arXiv:2110.15360v1 fatcat:5b6eobkxp5dmdhwlygkqglfz2y

RMA: Rapid Motor Adaptation for Legged Robots [article]

Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik
2021 arXiv   pre-print
Successful real-world deployment of legged robots would require them to adapt in real time to unseen scenarios like changing terrains, changing payloads, and wear and tear. This paper presents the Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators, and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains, including rocky, slippery, and deformable surfaces, in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at
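The two-component structure (base policy plus adaptation module) can be caricatured with scalars (a toy stand-in for RMA's neural networks; the running-average "extrinsics" estimate below is purely illustrative):

```python
from collections import deque

class AdaptationModule:
    # Estimates an "extrinsics" value from a short history of recent
    # (state, action) pairs; here a running average of residuals stands
    # in for the learned history encoder.
    def __init__(self, horizon=10):
        self.history = deque(maxlen=horizon)

    def observe(self, state, action):
        # The state-action residual is a crude proxy for hidden dynamics
        # (e.g. friction or payload) that the base policy cannot see.
        self.history.append(state - action)

    def extrinsics(self):
        if not self.history:
            return 0.0
        return sum(self.history) / len(self.history)

def base_policy(state, z):
    # The base policy conditions on the estimated extrinsics z, so the
    # same weights behave differently as the estimate changes online.
    return state * (1.0 - z)

m = AdaptationModule()
m.observe(1.0, 0.4)
m.observe(1.0, 0.6)
action = base_policy(2.0, m.extrinsics())
```

The key property the sketch preserves: adaptation happens purely by updating the extrinsics estimate from onboard history, with no weight updates at deployment time.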
arXiv:2107.04034v1 fatcat:xi5zeqf7prea7ni4cxcwqq4kte

Unsupervised Learning of Visual 3D Keypoints for Control [article]

Boyuan Chen, Pieter Abbeel, Deepak Pathak
2021 arXiv   pre-print
Correspondence to: Deepak Pathak <>.  ...  Pathak et al., 2017; Laskin et al., 2020b.  ...  We propose an end-to-end framework for unsupervised learning of 3D keypoints from multi-view images.  ...
arXiv:2106.07643v1 fatcat:5vmjphexcbeydjixga245sr764

Constrained Structured Regression with Convolutional Neural Networks [article]

Deepak Pathak, Philipp Krähenbühl, Stella X. Yu, Trevor Darrell
2015 arXiv   pre-print
Convolutional Neural Networks (CNNs) have recently emerged as the dominant model in computer vision. If provided with enough training data, they predict almost any visual quantity. In a discrete setting, such as classification, CNNs are not only able to predict a label but often predict a confidence in the form of a probability distribution over the output space. In continuous regression tasks, such a probability estimate is often lacking. We present a regression framework which models the output distribution of neural networks. This output distribution allows us to infer the most likely labeling following a set of physical or modeling constraints. These constraints capture the intricate interplay between different input and output variables, and complement the output of a CNN. However, they may not hold everywhere. Our setup further allows learning a confidence with which a constraint holds, in the form of a distribution of the constraint satisfaction. We evaluate our approach on the problem of intrinsic image decomposition, and show that constrained structured regression significantly improves on the state of the art.
arXiv:1511.07497v1 fatcat:jefhwe2hrngixpw5qy63wv3fiq

Functional Regularization for Reinforcement Learning via Learned Fourier Features [article]

Alexander C. Li, Deepak Pathak
2021 arXiv   pre-print
We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis and show that it improves the sample efficiency of both state-based and image-based RL. We perform an infinite-width analysis of our architecture using the Neural Tangent Kernel and theoretically show that tuning the initial variance of the Fourier basis is equivalent to functional regularization of the learned deep network. That is, these learned Fourier features allow for adjusting the degree to which networks underfit or overfit different frequencies in the training data, and hence provide a controlled mechanism to improve the stability and performance of RL optimization. Empirically, this allows us to prioritize learning low-frequency functions and speed up learning by reducing networks' susceptibility to noise in the optimization process, such as during Bellman updates. Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines. Website at
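The core embedding is short enough to sketch (a minimal illustration; in the paper the frequencies are learned end-to-end, whereas here they are only sampled at initialization):

```python
import math
import random

def fourier_features(x, B):
    # Embed a scalar input x into a Fourier basis: for each frequency b
    # in B, emit sin(b*x) and cos(b*x). In the learned version, B would
    # be a trainable parameter of the network's first layer.
    return [f(b * x) for b in B for f in (math.sin, math.cos)]

random.seed(0)
sigma = 0.1                                   # initial frequency scale
B = [random.gauss(0.0, sigma) for _ in range(8)]
feats = fourier_features(2.0, B)
```

The variance `sigma` is the knob the abstract refers to: small `sigma` means low frequencies, biasing the downstream network toward smooth, noise-robust value functions; large `sigma` admits high-frequency fits.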
arXiv:2112.03257v1 fatcat:34ecc3alevdrdjjcewclbighjm

Curiosity-driven Exploration by Self-supervised Prediction [article]

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
2017 arXiv   pre-print
Correspondence to: Deepak Pathak <>.  ...  Jayaraman & Grauman, 2015; Pathak et al., 2016; Wang & Gupta, 2015.  ...
arXiv:1705.05363v1 fatcat:bz6kt646wfhila4wybjnpa3lbm

Learning Instance Segmentation by Interaction [article]

Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik
2018 arXiv   pre-print
Note that our formulation could be thought of as a generalization of the CCNN constrained formulation proposed in Pathak et al.  ...
arXiv:1806.08354v1 fatcat:nreqyuzgr5c7fpfj4xxnmzuszq

Generating Fast and Slow: Scene Decomposition via Reconstruction [article]

Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
2022 arXiv   pre-print
We consider the problem of segmenting scenes into constituent entities, i.e., underlying objects and their parts. Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes into their constituent entities. Recent slot-centric generative models break this dependence on supervision by attempting to segment scenes into entities without supervision, by reconstructing pixels. However, they have been restricted thus far to toy scenes, as they suffer from a reconstruction-segmentation trade-off: as the entity bottleneck gets wider, reconstruction improves but segmentation collapses. We propose GFS-Nets (Generating Fast and Slow Networks), which alleviate this issue with two ingredients: i) curriculum training in the form of primitives, often missing from current generative models, and ii) test-time adaptation per scene through gradient descent on the reconstruction objective, which we call slow inference, missing from current feed-forward detectors. We show that the proposed curriculum suffices to break the reconstruction-segmentation trade-off, and slow inference greatly improves segmentation in out-of-distribution scenes. We evaluate GFS-Nets on 3D and 2D scene segmentation benchmarks (PartNet, CLEVR, Room Diverse++) and show large (∼50%) performance improvements against SOTA supervised feed-forward detectors and unsupervised object discovery methods.
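The "slow inference" step, per-scene gradient descent on a reconstruction objective at test time, can be shown on a scalar toy problem (illustrative only; GFS-Nets optimizes a neural reconstruction loss, not this quadratic):

```python
def slow_inference(scene, theta, lr=0.1, steps=50):
    # Test-time adaptation sketch: refine parameters theta for this one
    # scene by gradient descent on a reconstruction loss. Here the loss
    # is L(theta) = (theta - scene)^2, so dL/dtheta = 2*(theta - scene).
    for _ in range(steps):
        grad = 2.0 * (theta - scene)
        theta -= lr * grad
    return theta

adapted = slow_inference(scene=3.0, theta=0.0)
```

The design point the toy preserves: feed-forward ("fast") parameters are only a starting guess, and each out-of-distribution scene gets its own short optimization at inference time.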
arXiv:2203.11194v1 fatcat:f4yrvw7wqjahva6fdj23ytieoy

Discovering and Achieving Goals via World Models [article]

Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak
2021 arXiv   pre-print
Acknowledgements We thank Ben Eysenbach, Stephen Tian, Sergey Levine, Dinesh Jayaraman, Karl Pertsch, Ed Hu and the members of GRASP lab and Pathak lab for insightful discussions.  ... 
arXiv:2110.09514v1 fatcat:tdyd2jkixjaqzebetyi5qakeoa

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller [article]

Pratyusha Sharma, Deepak Pathak, Abhinav Gupta
2019 arXiv   pre-print
We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of a human demonstration from a third-person perspective. To accomplish this goal, our agent should not only learn to understand the intent of the demonstrated third-person video in its context but also perform the intended task in its own environment configuration. Our central insight is to enforce this structure explicitly during learning by decoupling what to achieve (the intended task) from how to perform it (the controller). We propose a hierarchical setup where a high-level module learns to generate a series of first-person sub-goals conditioned on the third-person video demonstration, and a low-level controller predicts the actions to achieve those sub-goals. Our agent acts from raw image observations without any access to the full state information. We show results on a real robotic platform using Baxter for the manipulation tasks of pouring and placing objects in a box. Project video and code are at
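The high-level/low-level decoupling can be caricatured in one dimension (toy functions, not the paper's learned modules): the high level emits a sequence of sub-goals, and the low level tracks them one at a time.

```python
def high_level(current, goal, n=4):
    # Toy sub-goal generator: interpolate n intermediate sub-goals between
    # the current state and the demonstrated goal. In the paper this is a
    # learned module conditioned on the third-person video.
    return [current + (goal - current) * (i + 1) / n for i in range(n)]

def low_level(state, subgoal):
    # Toy controller: take one bounded step toward the active sub-goal,
    # standing in for the learned action-prediction module.
    step = max(-0.5, min(0.5, subgoal - state))
    return state + step

state, goal = 0.0, 2.0
for sg in high_level(state, goal):
    state = low_level(state, sg)
```

Because the two modules only communicate through sub-goals, each can be trained (or swapped) independently, which is the structural point the abstract emphasizes.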
arXiv:1911.09676v1 fatcat:fxjucsaijfdjpa3kcfom3h74ve

Adapting Rapid Motor Adaptation for Bipedal Robots [article]

Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik
2022 arXiv   pre-print
Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable, and hence it is harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control and extend them to work on bipedal robots. Similar to existing works, we start with a base policy which produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy, which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy to the imperfect extrinsics estimator by fine-tuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy to enable a bipedal robot, Cassie, to walk in a variety of different scenarios in the real world beyond what it has seen during training. Videos and results at
arXiv:2205.15299v1 fatcat:l4phq4tcxvcfhkzu37muj7exvq

Large-Scale Study of Curiosity-Driven Learning [article]

Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros
2018 arXiv   pre-print
In particular, we choose the dynamics-based curiosity model of intrinsic reward presented in Pathak et al.  ...  [42] where they use autoencoder features, and Pathak et al. [27] where they use features trained with an inverse dynamics task.  ... 
arXiv:1808.04355v1 fatcat:nocnlbafbfcalhqxymd7tjgqfa
Showing results 1 — 15 out of 598 results