A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL. The file type is application/pdf.
Hierarchical Neural Dynamic Policies
[article]
2021
arXiv
pre-print
We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods has successfully demonstrated dynamic robot behaviors but has difficulty generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but still struggle in domains with diverse configurations of image goals and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamic Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and in simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation and reinforcement learning setups and achieve state-of-the-art results. Video results are at https://shikharbahl.github.io/hierarchical-ndps/
arXiv:2107.05627v1
fatcat:4246ptcm5nf73glygr355w2gcu
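The local-to-global distillation curriculum the abstract describes can be sketched in a toy form. The real H-NDPs use deep dynamical-system policies trained on images; here simple linear fits stand in for local policies and a polynomial fit for the global one, and all names and region choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target behavior the policy should imitate (stand-in for demonstrations).
def target(x):
    return np.sin(3 * x)

# Stage 1: fit local linear "policies" on small regions of state space.
regions = [(-1.0, -0.3), (-0.3, 0.3), (0.3, 1.0)]
local_fits = []
for lo, hi in regions:
    xs = rng.uniform(lo, hi, 200)
    A = np.stack([xs, np.ones_like(xs)], axis=1)
    w, _, _, _ = np.linalg.lstsq(A, target(xs), rcond=None)
    local_fits.append((lo, hi, w))

# Stage 2: distill the local experts into one global model by training on
# labels produced by whichever local expert owns each sample.
def local_label(x):
    for lo, hi, w in local_fits:
        if lo <= x <= hi:
            return w[0] * x + w[1]
    return 0.0

xs = rng.uniform(-1.0, 1.0, 1000)
ys = np.array([local_label(x) for x in xs])
# Global model: polynomial features stand in for a deep dynamical-system policy.
Phi = np.stack([xs**3, xs**2, xs, np.ones_like(xs)], axis=1)
w_global, _, _, _ = np.linalg.lstsq(Phi, ys, rcond=None)

def global_policy(x):
    return w_global @ np.array([x**3, x**2, x, 1.0])
```

The point of the curriculum is that each local fit only has to cover a small, easy region, and the global model then learns from the locals' outputs rather than from the diverse raw data directly.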
Human-to-Robot Imitation in the Wild
[article]
2022
arXiv
pre-print
We approach the problem of learning by watching humans in the wild. While traditional approaches in imitation and reinforcement learning are promising for learning in the real world, they are either sample-inefficient or constrained to lab settings. Meanwhile, there has been much success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm centered around learning from a third-person perspective. We call our method WHIRL: In-the-Wild Human Imitating Robot Learning. WHIRL extracts a prior over the intent of the human demonstrator and uses it to initialize our agent's policy. We introduce an efficient real-world policy learning scheme that improves through interaction. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos, and an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild. Videos and talk at https://human2robot.github.io
arXiv:2207.09450v1
fatcat:wk7yrz4vkvefnajjwcojh3sdii
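The abstract mentions a simple sampling-based policy optimization approach without specifying it; a generic cross-entropy-method loop is one common instantiation and is sketched below. The reward function and dimensions are invented for illustration, not WHIRL's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical task reward: higher when the sampled policy parameters
# are close to an (unknown to the optimizer) target vector.
target = np.array([0.5, -0.2, 0.8])
def reward(params):
    return -np.sum((params - target) ** 2)

# Cross-entropy-style sampling loop: sample candidates, keep the elites,
# refit the sampling distribution, repeat.
mean, std = np.zeros(3), np.ones(3)
for _ in range(30):
    samples = rng.normal(mean, std, size=(64, 3))
    scores = np.array([reward(s) for s in samples])
    elites = samples[np.argsort(scores)[-8:]]
    mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
```

Derivative-free samplers like this are attractive on real robots because each candidate only needs rollouts, not gradients of the environment.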
Eutrophication: Impact of Excess Nutrient Status in Lake Water Ecosystem
2012
Journal of Environmental & Analytical Toxicology
Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives
[article]
2021
arXiv
pre-print
Despite the potential of reinforcement learning (RL) for building general-purpose robotic systems, training RL agents to solve robotics tasks remains challenging due to the difficulty of exploration in purely continuous action spaces. Addressing this problem is an active area of research, with the majority of focus on improving RL methods via better optimization or more efficient exploration. An alternate but important component to consider improving is the interface of the RL algorithm with the robot. In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy. These parameterized primitives are expressive, simple to implement, enable efficient exploration, and can be transferred across robots, tasks, and environments. We perform a thorough empirical study across challenging tasks in three distinct domains with image input and a sparse terminal reward. We find that our simple change to the action interface substantially improves both learning efficiency and task performance irrespective of the underlying RL algorithm, significantly outperforming prior methods which learn skills from offline expert data. Code and videos at https://mihdalal.github.io/raps/
arXiv:2110.15360v1
fatcat:5b6eobkxp5dmdhwlygkqglfz2y
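A minimal sketch of the action interface the abstract describes: the policy emits a primitive choice plus continuous arguments, and each primitive expands into low-level commands. The primitive names, arities, and output layout here are all illustrative assumptions, not the RAPS library itself.

```python
import numpy as np

# A toy library of parameterized primitives. Each entry is (function, arity).
def reach(args):          # args: target (x, y)
    return [("move", float(args[0]), float(args[1]))]

def grasp(args):          # args: gripper closing force
    return [("close_gripper", float(args[0]))]

def lift(args):           # args: height
    return [("move_z", float(args[0]))]

PRIMITIVES = [(reach, 2), (grasp, 1), (lift, 1)]

def decode_action(policy_output):
    """Map a flat policy output to low-level commands.

    Assumed layout: one logit per primitive, followed by a shared argument
    slot sized for the maximum arity.
    """
    n = len(PRIMITIVES)
    logits, args = policy_output[:n], policy_output[n:]
    fn, arity = PRIMITIVES[int(np.argmax(logits))]
    return fn(args[:arity])

# Example: the "reach" logit is largest, so its two arguments are used.
example = decode_action(np.array([2.0, 0.1, -1.0, 0.3, 0.7]))
```

Because exploration happens over primitive choices and their arguments rather than raw joint torques, each policy decision covers many low-level timesteps.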
RMA: Rapid Motor Adaptation for Legged Robots
[article]
2021
arXiv
pre-print
Successful real-world deployment of legged robots requires them to adapt in real time to unseen scenarios such as changing terrains, changing payloads, and wear and tear. This paper presents the Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge such as reference trajectories or predefined foot trajectory generators, and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains, including rocky, slippery, and deformable surfaces, in environments with grass, long vegetation, concrete, pebbles, stairs, and sand. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/
arXiv:2107.04034v1
fatcat:xi5zeqf7prea7ni4cxcwqq4kte
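The two-component structure can be sketched as follows. In the actual system the base policy is an MLP and the adaptation module a temporal network trained to regress privileged simulator extrinsics; here both are random linear maps with invented dimensions, purely to show the data flow at deployment time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions only.
STATE, EXTRINSICS, HIST = 4, 2, 8

W_base = rng.normal(size=(STATE + EXTRINSICS,))              # base policy
W_adapt = rng.normal(size=(HIST * (STATE + 1), EXTRINSICS))  # adaptation module

def base_policy(state, z):
    """Action from the current state and an extrinsics vector z."""
    return float(np.tanh(np.concatenate([state, z]) @ W_base))

def adaptation_module(history):
    """Estimate z from recent state-action history alone, so no privileged
    simulator information is needed at deployment."""
    return history.reshape(-1) @ W_adapt

# One control step: estimate extrinsics online, then act.
history = rng.normal(size=(HIST, STATE + 1))   # past (state, action) pairs
state = rng.normal(size=STATE)
z_hat = adaptation_module(history)
action = base_policy(state, z_hat)
```

The key design choice is that only the adaptation module runs "new" computation at test time; the base policy never changes, which is what makes adaptation fast.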
Unsupervised Learning of Visual 3D Keypoints for Control
[article]
2021
arXiv
pre-print
Correspondence to: Deepak Pathak <dpathak@cs.cmu.edu>. ...
We propose an end-to-end framework for unsupervised learning of 3D keypoints from multi-view images. ...
arXiv:2106.07643v1
fatcat:5vmjphexcbeydjixga245sr764
Constrained Structured Regression with Convolutional Neural Networks
[article]
2015
arXiv
pre-print
Convolutional Neural Networks (CNNs) have recently emerged as the dominant model in computer vision. If provided with enough training data, they can predict almost any visual quantity. In a discrete setting, such as classification, CNNs are not only able to predict a label but often predict a confidence in the form of a probability distribution over the output space. In continuous regression tasks, such a probability estimate is often lacking. We present a regression framework which models the output distribution of neural networks. This output distribution allows us to infer the most likely labeling following a set of physical or modeling constraints. These constraints capture the intricate interplay between different input and output variables and complement the output of a CNN. However, they may not hold everywhere. Our setup further allows us to learn a confidence with which a constraint holds, in the form of a distribution over constraint satisfaction. We evaluate our approach on the problem of intrinsic image decomposition and show that constrained structured regression significantly improves over the state of the art.
arXiv:1511.07497v1
fatcat:jefhwe2hrngixpw5qy63wv3fiq
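For intrinsic image decomposition, one concrete way constrained inference can work: if the network predicts a Gaussian over log-albedo and log-shading per pixel, and physics requires them to sum to the observed log-intensity, the MAP solution is a confidence-weighted projection onto that constraint. This closed form is a generic sketch of that idea, not the paper's exact formulation.

```python
# Per-pixel MAP inference under the constraint a + s = log_img, where the
# network's predictions are independent Gaussians N(mu, var) for each term.
def map_decompose(mu_albedo, mu_shading, var_albedo, var_shading, log_img):
    # Minimize (a - mu_a)^2/var_a + (s - mu_s)^2/var_s  s.t.  a + s = log_img.
    # Lagrangian solution: distribute the constraint gap in proportion to
    # each prediction's variance (less confident term moves more).
    gap = log_img - (mu_albedo + mu_shading)
    total = var_albedo + var_shading
    a = mu_albedo + gap * var_albedo / total
    s = mu_shading + gap * var_shading / total
    return a, s

# Example: shading is less confident (larger variance), so it absorbs
# more of the gap between the predictions and the observed intensity.
a, s = map_decompose(0.4, 0.3, 0.1, 0.3, log_img=1.0)
```

Learning a per-pixel confidence for the constraint itself, as the abstract describes, would soften this projection where the constraint is unreliable.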
Functional Regularization for Reinforcement Learning via Learned Fourier Features
[article]
2021
arXiv
pre-print
We propose a simple architecture for deep reinforcement learning that embeds inputs into a learned Fourier basis and show that it improves the sample efficiency of both state-based and image-based RL. We perform an infinite-width analysis of our architecture using the Neural Tangent Kernel and theoretically show that tuning the initial variance of the Fourier basis is equivalent to functional regularization of the learned deep network. That is, these learned Fourier features allow for adjusting the degree to which networks underfit or overfit different frequencies in the training data, and hence provide a controlled mechanism to improve the stability and performance of RL optimization. Empirically, this allows us to prioritize learning low-frequency functions and speed up learning by reducing networks' susceptibility to noise in the optimization process, such as during Bellman updates. Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines. Website at https://alexanderli.com/learned-fourier-features
arXiv:2112.03257v1
fatcat:34ecc3alevdrdjjcewclbighjm
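The embedding itself is simple to write down. A common form of a (learned) Fourier feature layer is shown below; the projection matrix B would be trainable in the actual architecture, and the initialization standard deviation is the knob the abstract identifies with functional regularization. Sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_fourier_embedding(in_dim, n_features, init_std):
    """Fourier basis embedding: x -> [sin(Bx), cos(Bx)].

    B is the (in a real network, trainable) frequency matrix; init_std
    controls which frequencies the network initially favors. Small
    init_std biases toward low frequencies (smoother functions).
    """
    B = rng.normal(0.0, init_std, size=(n_features, in_dim))
    def embed(x):
        proj = B @ x
        return np.concatenate([np.sin(proj), np.cos(proj)])
    return embed

embed = make_fourier_embedding(in_dim=3, n_features=16, init_std=1.0)
features = embed(np.array([0.1, -0.4, 0.7]))
```

The downstream value or policy network then consumes `features` instead of the raw state, so its effective smoothness is set by the spectrum of B.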
Curiosity-driven Exploration by Self-supervised Prediction
[article]
2017
arXiv
pre-print
Correspondence to: Deepak Pathak <pathak@berkeley.edu>. ...
arXiv:1705.05363v1
fatcat:bz6kt646wfhila4wybjnpa3lbm
Learning Instance Segmentation by Interaction
[article]
2018
arXiv
pre-print
Note that our formulation could be thought of as a generalization of the CCNN constrained formulation proposed in Pathak et al. ...
arXiv:1806.08354v1
fatcat:nreqyuzgr5c7fpfj4xxnmzuszq
Generating Fast and Slow: Scene Decomposition via Reconstruction
[article]
2022
arXiv
pre-print
We consider the problem of segmenting scenes into their constituent entities, i.e., underlying objects and their parts. Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes into their constituent entities. Recent slot-centric generative models break this dependence on supervision by attempting to segment scenes into entities unsupervised, by reconstructing pixels. However, they have been restricted thus far to toy scenes, as they suffer from a reconstruction-segmentation trade-off: as the entity bottleneck gets wider, reconstruction improves but segmentation collapses. We propose GFS-Nets (Generating Fast and Slow Networks), which alleviate this issue with two ingredients: i) curriculum training in the form of primitives, often missing from current generative models, and ii) test-time adaptation per scene through gradient descent on the reconstruction objective, which we call slow inference, missing from current feed-forward detectors. We show that the proposed curriculum suffices to break the reconstruction-segmentation trade-off, and slow inference greatly improves segmentation in out-of-distribution scenes. We evaluate GFS-Nets on the 3D and 2D scene segmentation benchmarks PartNet, CLEVR, and Room Diverse++, and show large (~50%) performance improvements over SOTA supervised feed-forward detectors and unsupervised object discovery methods.
arXiv:2203.11194v1
fatcat:f4yrvw7wqjahva6fdj23ytieoy
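"Slow inference" as described above amounts to refining latent codes by gradient descent on reconstruction error at test time. The toy below uses a fixed linear decoder standing in for a trained generative decoder; the dimensions and the feed-forward initial guess (here just zeros) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Decoder: 3 latent slot dims -> 8 "pixels" (stand-in for a trained model).
D = rng.normal(size=(8, 3))
true_slots = np.array([1.0, -2.0, 0.5])
scene = D @ true_slots              # the observed scene to explain

# Fast (feed-forward) inference would give an initial guess; slow inference
# then refines it per scene by gradient descent on reconstruction loss.
slots = np.zeros(3)
lr = 0.05
for _ in range(300):
    residual = D @ slots - scene    # reconstruction error
    grad = D.T @ residual           # gradient of 0.5 * ||residual||^2
    slots -= lr * grad
```

With a linear decoder this is just least squares, but the same loop applies to a nonlinear decoder, which is what makes per-scene adaptation useful out of distribution.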
Discovering and Achieving Goals via World Models
[article]
2021
arXiv
pre-print
Acknowledgements We thank Ben Eysenbach, Stephen Tian, Sergey Levine, Dinesh Jayaraman, Karl Pertsch, Ed Hu and the members of GRASP lab and Pathak lab for insightful discussions. ...
arXiv:2110.09514v1
fatcat:tdyd2jkixjaqzebetyi5qakeoa
Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller
[article]
2019
arXiv
pre-print
We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of a human demonstration from a third-person perspective. To accomplish this goal, our agent should not only learn to understand the intent of the demonstrated third-person video in its context but also perform the intended task in its own environment configuration. Our central insight is to enforce this structure explicitly during learning by decoupling what to achieve (the intended task) from how to perform it (the controller). We propose a hierarchical setup where a high-level module learns to generate a series of first-person sub-goals conditioned on the third-person video demonstration, and a low-level controller predicts the actions to achieve those sub-goals. Our agent acts from raw image observations without any access to full state information. We show results on a real robotic platform, using Baxter for the manipulation tasks of pouring and placing objects in a box. Project video and code are at https://pathak22.github.io/hierarchical-imitation/
arXiv:1911.09676v1
fatcat:fxjucsaijfdjpa3kcfom3h74ve
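The what/how decoupling can be sketched with a toy state-space version: a high-level module turns a demonstration trajectory into sub-goals, and a low-level controller (here a simple proportional rule, standing in for a learned visual controller) chases each one. All quantities are invented for illustration.

```python
import numpy as np

# High-level module: given a demonstration (here a sequence of 2D states
# standing in for video frames), produce intermediate sub-goals.
def high_level(demo_states, n_subgoals):
    idx = np.linspace(0, len(demo_states) - 1, n_subgoals + 1)[1:]
    return [demo_states[int(i)] for i in idx]

# Low-level controller: step proportionally toward the current sub-goal.
def low_level(state, subgoal, gain=0.5):
    return gain * (subgoal - state)

demo = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
        np.array([1.0, 1.0]), np.array([2.0, 1.0])]

state = np.array([0.0, 0.0])
for g in high_level(demo, n_subgoals=3):
    for _ in range(10):             # a few control steps per sub-goal
        state = state + low_level(state, g)
```

Splitting the problem this way lets the high-level module handle the third-person-to-first-person translation while the controller only ever solves short-horizon reaching problems.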
Adapting Rapid Motor Adaptation for Bipedal Robots
[article]
2022
arXiv
pre-print
Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable, and hence it is harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control and extend them to work on bipedal robots. Similar to existing work, we start with a base policy which produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy, which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy to the imperfect extrinsics estimator by fine-tuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy to enable a bipedal robot, Cassie, to walk in a variety of different scenarios in the real world beyond what it has seen during training. Videos and results at https://ashish-kmr.github.io/a-rma/
arXiv:2205.15299v1
fatcat:l4phq4tcxvcfhkzu37muj7exvq
Large-Scale Study of Curiosity-Driven Learning
[article]
2018
arXiv
pre-print
In particular, we choose the dynamics-based curiosity model of intrinsic reward presented in Pathak et al. ...
[42] where they use autoencoder features, and Pathak et al. [27] where they use features trained with an inverse dynamics task. ...
arXiv:1808.04355v1
fatcat:nocnlbafbfcalhqxymd7tjgqfa
Showing results 1 — 15 out of 598 results