
Mid-level Elements for Object Detection [article]

Aayush Bansal, Abhinav Shrivastava, Carl Doersch, Abhinav Gupta
2015 arXiv   pre-print
Building on the success of recent discriminative mid-level elements, we propose a surprisingly simple approach for object detection which performs comparably to the current state-of-the-art approaches on the PASCAL VOC comp-3 detection challenge (no external data). Through extensive experiments and ablation analysis, we show how our approach effectively improves upon HOG-based pipelines by adding an intermediate mid-level representation for the task of object detection. This representation is easily interpretable and allows us to visualize what our object detector "sees". We also discuss the insights our approach shares with CNN-based methods, such as the benefit of sharing representations across categories.
arXiv:1504.07284v1 fatcat:wvfquexhbngjhentuace2pgo24
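As an illustration of the kind of pipeline the abstract describes, the sketch below scores candidate windows against a bank of mid-level element filters and feeds the responses to a per-category linear classifier. The array shapes, function names, and the use of plain dot products are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def midlevel_representation(hog_windows, element_weights):
    """Score each candidate window against a bank of mid-level element
    detectors (linear filters on HOG features); the responses form the
    window's intermediate representation. Names and shapes are illustrative."""
    # hog_windows: (num_windows, feat_dim) HOG features of candidate windows
    # element_weights: (num_elements, feat_dim) learned mid-level element filters
    return hog_windows @ element_weights.T          # (num_windows, num_elements)

def detect(hog_windows, element_weights, category_classifier):
    """A per-category detector is then a linear classifier on element
    responses instead of raw HOG features."""
    feats = midlevel_representation(hog_windows, element_weights)
    return feats @ category_classifier              # detection score per window
```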

Learning Exploration Policies for Navigation [article]

Tao Chen, Saurabh Gupta, Abhinav Gupta
2019 arXiv   pre-print
Numerous past works have tackled the problem of task-driven navigation. But, how to effectively explore a new environment to enable a variety of down-stream tasks has received much less attention. In this work, we study how agents can autonomously explore realistic and complex 3D environments without the context of task-rewards. We propose a learning-based approach and investigate different policy architectures, reward functions, and training paradigms. We find that the use of policies with spatial memory that are bootstrapped with imitation learning and finally finetuned with coverage rewards derived purely from on-board sensors can be effective at exploring novel environments. We show that our learned exploration policies can explore better than classical approaches based on geometry alone and generic learning-based exploration techniques. Finally, we also show how such task-agnostic exploration can be used for down-stream tasks. Code and videos are available at: https://sites.google.com/view/exploration-for-nav.
arXiv:1903.01959v1 fatcat:2wudy6pgonhdpgfy6u4vjprryi
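A minimal sketch of a coverage-style reward of the kind the abstract mentions, assuming the agent maintains a boolean "seen" grid from its on-board sensing; the grid bookkeeping and cell indexing are illustrative, not the paper's formulation.

```python
import numpy as np

def coverage_reward(seen_map, newly_visible_cells):
    """Reward = number of previously unseen map cells revealed this step."""
    new_cells = {c for c in newly_visible_cells if not seen_map[c]}
    for c in new_cells:
        seen_map[c] = True
    return float(len(new_cells))

seen = np.zeros((64, 64), dtype=bool)
# Two genuinely new cells observed (duplicates counted once) -> reward 2.0
r = coverage_reward(seen, [(3, 4), (3, 5), (3, 4)])
```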

Cross-stitch Networks for Multi-task Learning [article]

Ishan Misra and Abhinav Shrivastava and Abhinav Gupta and Martial Hebert
2016 arXiv   pre-print
Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, which do not generalize. In this paper, we propose a principled approach to learn shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: the "cross-stitch" unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.
arXiv:1604.03539v1 fatcat:qjhwexuju5fhjg4anqoj7tmmgu
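The cross-stitch unit itself is compact enough to sketch: a learnable 2x2 matrix that linearly mixes the activations of two task networks at a given layer. The module below is a plausible PyTorch re-implementation, not the authors' released code; the near-identity initialization is an assumption.

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Linearly mixes same-shaped activations from two task networks."""
    def __init__(self):
        super().__init__()
        # 2x2 mixing matrix, initialised close to identity so each task
        # starts out relying mostly on its own activations.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        # x_a, x_b: activation maps of identical shape from networks A and B
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b
```

Because the unit is differentiable, the mixing weights are trained end-to-end with the rest of both networks, letting the optimizer decide how much sharing each layer should do.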

Training Region-based Object Detectors with Online Hard Example Mining [article]

Abhinav Shrivastava, Abhinav Gupta, Ross Girshick
2016 arXiv   pre-print
The field of object detection has made significant advances riding on the wave of region-based ConvNets, but their training procedure still includes many heuristics and hyperparameters that are costly to tune. We present a simple yet surprisingly effective online hard example mining (OHEM) algorithm for training region-based ConvNet detectors. Our motivation is the same as it has always been -- detection datasets contain an overwhelming number of easy examples and a small number of hard examples. Automatic selection of these hard examples can make training more effective and efficient. OHEM is a simple and intuitive algorithm that eliminates several heuristics and hyperparameters in common use. But more importantly, it yields consistent and significant boosts in detection performance on benchmarks like PASCAL VOC 2007 and 2012. Its effectiveness increases as datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. Moreover, combined with complementary advances in the field, OHEM leads to state-of-the-art results of 78.9% and 76.3% mAP on PASCAL VOC 2007 and 2012 respectively.
arXiv:1604.03540v1 fatcat:4hwaarxafrgfddixr72eoasw7a
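The core selection step of OHEM can be sketched in a few lines: forward all RoIs, rank them by loss, and backpropagate only through the hardest ones. The NMS-based de-duplication and the read-only/updatable network split used in practice are omitted; this is a simplified illustration.

```python
import numpy as np

def ohem_select(per_roi_losses, num_hard):
    """Return indices of the highest-loss RoIs to use in the backward pass."""
    losses = np.asarray(per_roi_losses)
    return np.argsort(-losses)[:num_hard]

# Example: out of 8 RoIs, backprop only through the 3 hardest.
hard_idx = ohem_select([0.02, 1.3, 0.11, 0.9, 0.05, 2.4, 0.3, 0.07], num_hard=3)
```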

Aligning Videos in Space and Time [article]

Senthil Purushwalkam, Tian Ye, Saurabh Gupta, Abhinav Gupta
2020 arXiv   pre-print
In this paper, we focus on the task of extracting visual correspondences across videos. Given a query video clip from an action class, we aim to align it with training videos in space and time. Obtaining training data for such a fine-grained alignment task is challenging and often ambiguous. Hence, we propose a novel alignment procedure that learns such correspondence in space and time via cross-video cycle-consistency. During training, given a pair of videos, we compute cycles that connect patches in a given frame in the first video by matching through frames in the second video. Cycles that connect overlapping patches together are encouraged to score higher than cycles that connect non-overlapping patches. Our experiments on the Penn Action and Pouring datasets demonstrate that the proposed method can successfully learn to correspond semantically similar patches across videos, and learns representations that are sensitive to object and action states.
arXiv:2007.04515v1 fatcat:6tdjecdhyrfzrgcblhsbydn4iy
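A rough sketch of one forward-backward cycle, assuming patches are already encoded as feature vectors; the actual method learns these features and trains by comparing cycle scores against patch overlap, which is not reproduced here.

```python
import numpy as np

def cycle_score(query_feat, video1_patch_feats, video2_patch_feats):
    """Query patch (video 1) -> best match in video 2 -> best match back in
    video 1. Training would push cycles that land back on (or overlapping)
    the query patch to score higher than cycles that do not."""
    # forward match: most similar patch in the second video
    fwd_sims = video2_patch_feats @ query_feat
    j = int(np.argmax(fwd_sims))
    # backward match: most similar patch back in the first video
    bwd_sims = video1_patch_feats @ video2_patch_feats[j]
    i_back = int(np.argmax(bwd_sims))
    # where the cycle lands, and the similarity accumulated along the cycle
    return i_back, float(fwd_sims[j] + bwd_sims[i_back])
```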

Intrinsic Motivation for Encouraging Synergistic Behavior [article]

Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta
2020 arXiv   pre-print
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks, which are tasks where multiple agents must work together to achieve a goal they could not achieve individually. Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own. Thus, we propose to incentivize agents to take (joint) actions whose effects cannot be predicted via a composition of the predicted effect for each individual agent. We study two instantiations of this idea, one based on the true states encountered, and another based on a dynamics model trained concurrently with the policy. While the former is simpler, the latter has the benefit of being analytically differentiable with respect to the action taken. We validate our approach in robotic bimanual manipulation and multi-agent locomotion tasks with sparse rewards; we find that our approach yields more efficient learning than both 1) training with only the sparse reward and 2) using the typical surprise-based formulation of intrinsic motivation, which does not bias toward synergistic behavior. Videos are available on the project webpage: https://sites.google.com/view/iclr2020-synergistic.
arXiv:2002.05189v1 fatcat:avshg2ghlbemljqv2kzieeseiy
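The intrinsic reward described above can be sketched as the discrepancy between the jointly achieved effect and the composition of per-agent predictions. Here `joint_model` and `single_model` are placeholder forward models; one of the paper's two instantiations substitutes the true next state for the joint prediction.

```python
import numpy as np

def synergy_reward(joint_model, single_model, state, action_a, action_b):
    """Intrinsic reward = distance between the joint effect and the composed
    per-agent effects, so agents are paid for outcomes neither could cause alone."""
    # compose the effects predicted for each agent acting on its own
    s_after_a = single_model(state, action_a)
    composed = single_model(s_after_a, action_b)
    # effect of acting jointly (or the true next state, in the simpler variant)
    joint = joint_model(state, action_a, action_b)
    return float(np.linalg.norm(np.asarray(joint) - np.asarray(composed)))
```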

Semantic Curiosity for Active Visual Learning [article]

Devendra Singh Chaplot, Helen Jiang, Saurabh Gupta, Abhinav Gupta
2020 arXiv   pre-print
In this paper, we study the task of embodied interactive learning for object detection. Given a set of environments (and some labeling budget), our goal is to learn an object detector by having an agent select what data to obtain labels for. How should an exploration policy decide which trajectory should be labeled? One possibility is to use a trained object detector's failure cases as an external reward. However, this would require labeling the millions of frames needed to train RL policies, which is infeasible. Instead, we explore a self-supervised approach for training our exploration policy by introducing a notion of semantic curiosity. Our semantic curiosity policy is based on a simple observation -- the detection outputs should be consistent. Therefore, our semantic curiosity rewards trajectories with inconsistent labeling behavior and encourages the exploration policy to explore such areas. The exploration policy trained via semantic curiosity generalizes to novel scenes and helps train an object detector that outperforms baselines trained with other possible alternatives such as random exploration, prediction-error curiosity, and coverage-maximizing exploration.
arXiv:2006.09367v1 fatcat:gw7hu3rje5cjppw7tfkhyxj3lq
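One way to make the "inconsistent labeling" reward concrete is to measure the entropy of the class labels the detector assigns to the same tracked object over a trajectory, as sketched below; the tracking/association step and the exact reward used in the paper are assumed away.

```python
import numpy as np
from collections import Counter

def semantic_curiosity_reward(track_labels):
    """Sum of label-histogram entropies over tracked objects: an object
    labeled inconsistently across frames contributes a larger reward."""
    reward = 0.0
    for labels in track_labels:            # one list of labels per tracked object
        counts = np.array(list(Counter(labels).values()), dtype=float)
        p = counts / counts.sum()
        reward += float(-(p * np.log(p + 1e-12)).sum())
    return reward

# An object seen as 'chair' in some frames and 'sofa' in others yields reward > 0.
r = semantic_curiosity_reward([["chair", "chair", "sofa"], ["table"] * 4])
```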

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection [article]

Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta
2017 arXiv   pre-print
How do we learn an object detector that is invariant to occlusions and deformations? Our current solution is to use a data-driven strategy -- collect large-scale datasets which have object instances under different conditions. The hope is that the final classifier can use these examples to learn invariances. But is it really possible to see all the occlusions in a dataset? We argue that like categories, occlusions and object deformations also follow a long-tail. Some occlusions and deformations are so rare that they hardly happen; yet we want to learn a model invariant to such occurrences. In this paper, we propose an alternative solution. We propose to learn an adversarial network that generates examples with occlusions and deformations. The goal of the adversary is to generate examples that are difficult for the object detector to classify. In our framework both the original detector and adversary are learned in a joint manner. Our experimental results indicate a 2.3% mAP boost on VOC07 and a 2.6% mAP boost on VOC2012 object detection challenge compared to the Fast-RCNN pipeline. We also release the code for this paper.
arXiv:1704.03414v1 fatcat:loi7jgli65fwpkrwqq4pshk6cu
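A greedy stand-in for the adversary is sketched below: among candidate occlusion masks, keep the one that maximizes the detector's loss and train on the occluded feature. In the paper the mask is produced by a learned adversarial network trained jointly with the detector; `detector_loss` and the candidate mask set are placeholders.

```python
import numpy as np

def hardest_occlusion(roi_feature, candidate_masks, detector_loss):
    """Pick the occlusion mask that hurts the detector most and return the
    occluded RoI feature (to be used as a hard training example)."""
    losses = [detector_loss(roi_feature * (1.0 - m)) for m in candidate_masks]
    worst = int(np.argmax(losses))
    return roi_feature * (1.0 - candidate_masks[worst]), losses[worst]
```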

Hierarchical Neural Dynamic Policies [article]

Shikhar Bahl, Abhinav Gupta, Deepak Pathak
2021 arXiv   pre-print
We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods has successfully demonstrated dynamic robot behaviors but has difficulty in generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but still struggle in domains with diverse configurations of image goals, and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results. Video results are at https://shikharbahl.github.io/hierarchical-ndps/
arXiv:2107.05627v1 fatcat:4246ptcm5nf73glygr355w2gcu
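The curriculum-and-distillation idea can be caricatured as follows: fit local policies on small state-space regions, then regress a single global policy onto the actions they produce. The dynamical-system (NDP) parameterization and image conditioning are not reproduced; all callables below are placeholders.

```python
import numpy as np

def distill_global_policy(local_policies, regions, fit_regressor):
    """Collect (state, expert action) pairs from each local policy on its own
    region of state space, then fit one global policy to all of them."""
    states, targets = [], []
    for policy, region_states in zip(local_policies, regions):
        for s in region_states:
            states.append(s)
            targets.append(policy(s))      # local expert action for state s
    return fit_regressor(np.array(states), np.array(targets))
```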

The Functional Correspondence Problem [article]

Zihang Lai, Senthil Purushwalkam, Abhinav Gupta
2021 arXiv   pre-print
Acknowledgement: This research is supported by grants from ONR MURI, the ONR Young Investigator Award to Abhinav Gupta, and the DARPA MCS award.
arXiv:2109.01097v1 fatcat:z2h5tekeazcgphteurjp77v7sm

Actions Transformations [article]

Xiaolong Wang, Ali Farhadi, Abhinav Gupta
2016 arXiv   pre-print
What defines an action like "kicking ball"? We argue that the true meaning of an action lies in the change or transformation an action brings to the environment. In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect). Motivated by recent advancements of video representation using deep learning, we design a Siamese network that models the action as a transformation on a high-level feature space. We show that our model gives improvements on standard action recognition datasets including UCF101 and HMDB51. More importantly, our approach is able to generalize beyond learned action categories and shows significant performance improvement on cross-category generalization on our new ACT dataset.
arXiv:1512.00795v2 fatcat:rwvgtw6ut5g5bes7fz4p5vinnu
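A toy version of the precondition-to-effect idea: represent each action class by a linear transformation and score it by how well it maps precondition features onto the observed effect features. The paper learns the embeddings and transformations with a Siamese deep network; the linear map here is purely illustrative.

```python
import numpy as np

def transformation_score(precondition_feat, effect_feat, transform_matrix):
    """Cosine similarity between the transformed precondition embedding and
    the observed effect embedding; higher means the action better explains
    the observed change."""
    predicted_effect = transform_matrix @ precondition_feat
    num = float(predicted_effect @ effect_feat)
    den = float(np.linalg.norm(predicted_effect) * np.linalg.norm(effect_feat) + 1e-12)
    return num / den

# Classification: pick the action whose transformation best explains the change,
# e.g. max(actions, key=lambda a: transformation_score(pre, post, T[a]))
```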

Contextual Priming and Feedback for Faster R-CNN [chapter]

Abhinav Shrivastava, Abhinav Gupta
2016 Lecture Notes in Computer Science  
The field of object detection has seen dramatic performance improvements in the last few years. Most of these gains are attributed to bottom-up, feedforward ConvNet frameworks. However, in the case of humans, top-down information, context, and feedback play an important role in object detection. This paper investigates how we can incorporate top-down information and feedback in the state-of-the-art Faster R-CNN framework. Specifically, we propose to: (a) augment Faster R-CNN with a semantic segmentation network; (b) use segmentation for top-down contextual priming; (c) use segmentation to provide top-down iterative feedback using two-stage training. Our results indicate that all three contributions improve the performance on object detection, semantic segmentation and region proposal generation.
doi:10.1007/978-3-319-46448-0_20 fatcat:jlb55gw3jjbqnfgegg5vzrqswm
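The contextual-priming part (b) can be sketched as resizing the segmentation output and concatenating it channel-wise with the backbone features before the proposal and detection heads. The exact fusion and the iterative-feedback stage (c) are not reproduced, and the tensor shapes below are made up.

```python
import torch
import torch.nn.functional as F

def prime_with_segmentation(backbone_feats, seg_logits):
    """Resize per-class segmentation logits to the backbone resolution and
    concatenate them as extra channels, so downstream heads see top-down context."""
    seg = F.interpolate(seg_logits, size=backbone_feats.shape[-2:],
                        mode="bilinear", align_corners=False)
    return torch.cat([backbone_feats, seg], dim=1)

feats = torch.randn(1, 512, 38, 50)    # backbone feature map
seg = torch.randn(1, 21, 300, 400)     # per-class segmentation logits
primed = prime_with_segmentation(feats, seg)   # shape (1, 533, 38, 50)
```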

Learning to Explore using Active Neural SLAM [article]

Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov
2020 arXiv   pre-print
This work presents a modular and hierarchical approach to learn policies for exploring 3D environments, called 'Active Neural SLAM'. Our approach leverages the strengths of both classical and learning-based methods, by using analytical path planners with a learned SLAM module, and global and local policies. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). Such use of learning within each module retains its benefits, while at the same time, hierarchical decomposition and modular training allow us to sidestep the high sample complexities associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our approach over past learning and geometry-based approaches. The proposed model can also be easily transferred to the PointGoal task and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.
arXiv:2004.05155v1 fatcat:6t7hhvlocfa4pel6y2v46tmusm
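The modular decomposition reads naturally as a single decision step, sketched below with placeholder callables for the learned SLAM module, global policy, analytical planner, and local policy; this outlines the data flow only, not the released system.

```python
def active_neural_slam_step(obs, pose, mapper, global_policy, planner, local_policy):
    """One step of a modular exploration pipeline: update the map estimate,
    pick a long-term goal on it, plan a short-term waypoint analytically,
    and let a local policy output the low-level action."""
    map_estimate, pose_estimate = mapper(obs, pose)          # learned SLAM module
    long_term_goal = global_policy(map_estimate, pose_estimate)
    waypoint = planner(map_estimate, pose_estimate, long_term_goal)  # e.g. shortest path
    action = local_policy(obs, waypoint)
    return action, map_estimate
```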

Watermarking of MPEG-4 Videos [chapter]

Abhinav Gupta, Phalguni Gupta
2004 Lecture Notes in Computer Science  
An MPEG-4 compressed-domain video watermarking method is proposed and its performance is studied at video bit rates ranging from 64 Kb/s to 900 Kb/s. The watermark is inserted by modifying Discrete Cosine Transform (DCT) coefficients. The strength of the watermark is adapted to local frame characteristics to reduce the impact on visual quality. The algorithm's performance is also studied for watermarking payloads ranging from 1 Kb/frame to 3 Kb/frame. The watermark is robust against attacks such as scaling, rotation and cropping, even if blind techniques are used.
doi:10.1007/978-3-540-25948-0_101 fatcat:poabbnqjtbcbvf6aucsq5bxqwq
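A pixel-domain toy version of DCT-coefficient watermarking is sketched below: embed one bit by nudging a mid-frequency coefficient of an 8x8 block. The paper operates on MPEG-4 compressed-domain coefficients and adapts the strength to local frame characteristics; the fixed coefficient position and strength here are assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bit(block8x8, bit, strength=4.0, coeff=(3, 2)):
    """Embed a single watermark bit by shifting one mid-frequency DCT
    coefficient up (bit=1) or down (bit=0), then inverting the transform."""
    c = dctn(block8x8, norm="ortho")
    c[coeff] += strength if bit else -strength
    return idctn(c, norm="ortho")

block = np.random.rand(8, 8) * 255
marked = embed_bit(block, bit=1)
```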

Interpretable Intuitive Physics Model [article]

Tian Ye, Xiaolong Wang, James Davidson, Abhinav Gupta
2018 arXiv   pre-print
Humans have a remarkable ability to use physical commonsense and predict the effect of collisions. But do they understand the underlying factors? Can they predict if the underlying factors have changed? Interestingly, in most cases humans can predict the effects of similar collisions with different conditions such as changes in mass, friction, etc. It is postulated this is primarily because we learn to model physics with meaningful latent variables. This does not imply we can estimate the precise values of these meaningful variables (e.g., the exact values of mass or friction). Inspired by this observation, we propose an interpretable intuitive physics model where specific dimensions in the bottleneck layers correspond to different physical properties. In order to demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cube, cone, cylinder, spheres etc.) and test on collisions of unseen combinations of shapes. Furthermore, we demonstrate our model generalizes well even when similar scenes are simulated with different underlying properties.
arXiv:1808.10002v1 fatcat:lb435reiuna53frbr4plr3hwtq
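If specific bottleneck dimensions really correspond to physical properties, then swapping those dimensions between two encoded scenes should swap only that factor in the decoded predictions. The sketch below assumes a hypothetical dimension layout; the training procedure that achieves this disentanglement in the paper is not shown.

```python
import numpy as np

# Hypothetical layout of the bottleneck: which dimensions are reserved for
# which physical property (the paper arrives at this via its training scheme).
PROPERTY_DIMS = {"mass": slice(0, 2), "friction": slice(2, 4), "speed": slice(4, 6)}

def swap_property(latent_a, latent_b, prop):
    """Exchange the designated dimensions of two bottleneck codes; decoding
    the swapped codes should swap only that physical factor in the outcomes."""
    a, b = latent_a.copy(), latent_b.copy()
    dims = PROPERTY_DIMS[prop]
    a[dims], b[dims] = latent_b[dims].copy(), latent_a[dims].copy()
    return a, b
```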
Showing results 1 — 15 out of 1,095 results