
Semantic Image Based Geolocation Given a Map [article]

Arsalan Mousavian, Jana Kosecka
2016 arXiv   pre-print
Arsalan Mousavian and Jana Košecká are with the Computer Science Department, Volgenau School of Engineering at George Mason University, Fairfax, VA 22030, USA. amousavi@gmu.edu, kosecka@gmu.edu.  ...  The obtained 3D model, along with the building identity obtained from appearance-based location recognition, is then used to estimate the pose likelihood of the image given the map of the area  ...
arXiv:1609.00278v1 fatcat:ermj7t5lorhulccpachqszbnba

Deep Convolutional Features for Image Based Retrieval and Scene Categorization [article]

Arsalan Mousavian, Jana Kosecka
2015 arXiv   pre-print
Several recent approaches showed how the representations learned by Convolutional Neural Networks can be repurposed for novel tasks. Most commonly it has been shown that the activation features of the last fully connected layers (fc7 or fc6) of the network, followed by a linear classifier, outperform the state-of-the-art on several recognition challenge datasets. Instead of recognition, this paper focuses on the image retrieval problem and proposes and examines alternative pooling strategies for CNN features. The presented scheme uses the feature maps from an earlier layer (layer 5) of the CNN architecture, which has been shown to preserve coarse spatial information and is semantically meaningful. We examine several pooling strategies and demonstrate superior performance on the image retrieval task (INRIA Holidays) at a fraction of the computational cost and with relatively small memory requirements. In addition to retrieval, we see similar efficiency gains on the SUN397 scene categorization dataset, demonstrating the wide applicability of this simple strategy. We also introduce and evaluate a novel GeoPlaces5K dataset of images from different geographical locations around the world for retrieval that stresses more dramatic changes in appearance and viewpoint.
arXiv:1509.06033v1 fatcat:q6jj7jwbz5fotercxumrk53tsy
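
The pooling idea above lends itself to a short illustration. Below is a minimal Python sketch of max-pooling an intermediate convolutional feature map into a global retrieval descriptor, assuming a torchvision AlexNet backbone; the paper's exact layer cut point, pooling variants, and evaluation protocol are not reproduced here.

import torch
import torch.nn.functional as F
from torchvision import models

# Truncate AlexNet to its convolutional stack; its output stands in for
# the "layer 5" feature maps discussed above (an assumption about the cut).
backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).features
backbone.eval()

def conv_descriptor(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224) -> L2-normalized (N, 256) descriptors."""
    with torch.no_grad():
        fmap = backbone(images)          # (N, 256, 6, 6) conv feature maps
        desc = fmap.amax(dim=(2, 3))     # global spatial max-pooling
    return F.normalize(desc, dim=1)      # unit length, ready for cosine ranking

# Retrieval then reduces to ranking database descriptors by dot product:
# scores = conv_descriptor(query_batch) @ conv_descriptor(db_batch).T

Such a descriptor avoids running the fully connected layers entirely, which is consistent with the cost savings the abstract reports.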

ACRONYM: A Large-Scale Grasp Dataset Based on Simulation [article]

Clemens Eppner, Arsalan Mousavian, Dieter Fox
2020 arXiv   pre-print
the set of ground-truth grasps G* for the object, ĝ is the predicted grasp given the object point cloud and the sampled latent, and d(·, ·) is the distance function between the grasps that was used in Mousavian  ...
arXiv:2011.09584v1 fatcat:ifkmxwosbfgihh5vutcdylgvei
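
The snippet above references a minimum distance between a predicted grasp and the ground-truth set. A hedged sketch of that evaluation pattern follows, with a simple translation-plus-rotation metric standing in for the distance function d(·, ·) from Mousavian et al., which the snippet does not define:

import numpy as np

def grasp_distance(g1: np.ndarray, g2: np.ndarray, rot_weight: float = 0.1) -> float:
    """Placeholder metric between two grasps given as 4x4 homogeneous poses:
    Euclidean translation distance plus weighted geodesic rotation angle."""
    trans = np.linalg.norm(g1[:3, 3] - g2[:3, 3])
    cos_angle = (np.trace(g1[:3, :3].T @ g2[:3, :3]) - 1.0) / 2.0
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return trans + rot_weight * angle

def min_distance_to_ground_truth(g_hat, G_star):
    """min over g in G* of d(g_hat, g): distance from a predicted grasp
    to its nearest ground-truth grasp for the object."""
    return min(grasp_distance(g_hat, g) for g in G_star)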

Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes [article]

Martin Sundermeyer, Arsalan Mousavian, Rudolph Triebel, Dieter Fox
2021 arXiv   pre-print
Grasping unseen objects in unconstrained, cluttered environments is an essential skill for autonomous robotic manipulation. Despite recent progress in full 6-DoF grasp learning, existing approaches often consist of complex sequential pipelines that possess several potential failure points and run-times unsuitable for closed-loop grasping. Therefore, we propose an end-to-end network that efficiently generates a distribution of 6-DoF parallel-jaw grasps directly from a depth recording of a scene.
Our novel grasp representation treats 3D points of the recorded point cloud as potential grasp contacts. By rooting the full 6-DoF grasp pose and width in the observed point cloud, we can reduce the dimensionality of our grasp representation to 4-DoF, which greatly facilitates the learning process. Our class-agnostic approach is trained on 17 million simulated grasps and generalizes well to real-world sensor data. In a robotic grasping study of unseen objects in structured clutter, we achieve a success rate of over 90%, cutting the failure rate in half compared to a recent state-of-the-art method.
arXiv:2103.14127v1 fatcat:ofjfk54pove3bglsajuan2v24a
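
The contact-based representation above can be made concrete with a small sketch: given a predicted contact point, approach direction, baseline direction, and grasp width, a full 6-DoF pose is reconstructed. The axis convention and the gripper base offset below are illustrative assumptions, not the paper's exact parameterization.

import numpy as np

def grasp_pose_from_contact(c, approach, baseline, width, base_offset=0.10):
    """Build a 4x4 parallel-jaw grasp pose from one contact point c (3,),
    an approach direction, a baseline direction toward the opposite finger,
    and the grasp width. base_offset is an assumed gripper depth."""
    a = approach / np.linalg.norm(approach)
    b = baseline - np.dot(baseline, a) * a        # orthogonalize against approach
    b = b / np.linalg.norm(b)
    R = np.column_stack([b, np.cross(a, b), a])   # assumed gripper frame axes
    t = c + 0.5 * width * b - base_offset * a     # between fingers, back along approach
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

Pinning the contact to an observed point is what removes degrees of freedom: only the grasp orientation and width remain to be predicted, matching the 4-DoF claim above.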

Multiview RGB-D Dataset for Object Instance Detection [article]

Georgios Georgakis, Md Alimoor Reza, Arsalan Mousavian, Phi-Hung Le, Jana Kosecka
2016 arXiv   pre-print
This paper presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments, including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled, and objects in the scenes are annotated both with bounding boxes and in the 3D point cloud. Also, an approach for detection and recognition is presented, which comprises two parts: i) a new multi-view 3D proposal generation method and ii) the development of several recognition baselines that use AlexNet, trained either on crops of the dataset or on synthetically composited training images, to score our proposals. Finally, we compare the performance of the object proposals and a detection baseline to the Washington RGB-D Scenes (WRGB-D) dataset and demonstrate that our Kitchen scenes dataset is more challenging for object detection and recognition. The dataset is available at: http://cs.gmu.edu/~robot/gmu-kitchens.html.
arXiv:1609.07826v1 fatcat:hcn6tpj5xvgdbabor5qwqh6som

RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [article]

Christopher Xie, Arsalan Mousavian, Yu Xiang, Dieter Fox
2021 arXiv   pre-print
Segmenting unseen object instances in cluttered environments is an important capability that robots need when functioning in unstructured environments. While previous methods have exhibited promising results, they still tend to provide incorrect results in highly cluttered scenes. We postulate that a network architecture that encodes relations between objects at a high level can be beneficial. Thus, in this work, we propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks. We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the perturbed segmentations. Our proposed method is orthogonal to previous works and achieves state-of-the-art performance when combined with them. We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes. Code, models, and video can be found at https://github.com/chrisdxie/rice .
arXiv:2106.15711v1 fatcat:5ty3udy3gbcadclxdjbgx5qvfe
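
The sample-then-score refinement described above can be outlined in a few lines. This is a hypothetical skeleton: perturbation_sampler and graph_scorer stand in for the paper's learned perturbation networks and graph neural network, which the abstract does not specify.

def refine_masks(masks, observation, perturbation_sampler, graph_scorer,
                 num_samples=8, num_rounds=3):
    """Iteratively perturb a segmentation and keep the candidate that the
    graph-based scorer prefers."""
    best, best_score = masks, graph_scorer(masks, observation)
    for _ in range(num_rounds):
        for _ in range(num_samples):
            candidate = perturbation_sampler(best, observation)  # e.g. split/merge a mask
            score = graph_scorer(candidate, observation)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score

The spread of scores over sampled candidates is one natural source of the uncertainty estimates the abstract mentions.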

Object Rearrangement Using Learned Implicit Collision Functions [article]

Michael Danielczuk, Arsalan Mousavian, Clemens Eppner, Dieter Fox
2021 arXiv   pre-print
Robotic object rearrangement combines the skills of picking and placing objects. When object models are unavailable, typical collision-checking models may be unable to predict collisions in partial point clouds with occlusions, making generation of collision-free grasping or placement trajectories challenging. We propose a learned collision model that accepts scene and query object point clouds and predicts collisions for 6DOF object poses within the scene. We train the model on a synthetic set of 1 million scene/object point cloud pairs and 2 billion collision queries. We leverage the learned collision model as part of a model predictive path integral (MPPI) policy in a tabletop rearrangement task and show that the policy can plan collision-free grasps and placements for objects unseen in training in both simulated and physical cluttered scenes with a Franka Panda robot. The learned model outperforms both traditional pipelines and learned ablations by 9.8% in accuracy on a dataset of simulated collision queries and is 75x faster than the best-performing baseline. Videos and supplementary material are available at https://research.nvidia.com/publication/2021-03_Object-Rearrangement-Using.
arXiv:2011.10726v2 fatcat:ctrdpy66kfblfketagfrcnjviy
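
To make the planning use concrete: a learned collision model of the kind described above exposes a batched query, and MPPI folds its output into the cost used to reweight sampled action sequences. The interface below is hypothetical; the paper's actual network and cost shaping may differ.

import torch

def collision_cost(collision_net, scene_points, object_points, poses):
    """poses: (K, 4, 4) candidate object placements -> (K,) collision
    probabilities from the learned model, used directly as a cost term."""
    with torch.no_grad():
        return collision_net(scene_points, object_points, poses)

def mppi_weights(costs: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Standard MPPI soft-min weighting of K sampled trajectories."""
    w = torch.exp(-(costs - costs.min()) / temperature)
    return w / w.sum()

Batching the pose queries is what makes such a model attractive against a traditional checker run pose by pose, consistent with the reported 75x speedup.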

Unseen Object Instance Segmentation for Robotic Environments [article]

Christopher Xie, Yu Xiang, Arsalan Mousavian, Dieter Fox
2021 arXiv   pre-print
Mousavian are with NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051 USA (email: yux@nvidia.com; amousavian@nvidia.com). D.  ...
arXiv:2007.08073v2 fatcat:urn5ojcgw5gfxjeuuoqwcq3ezm

Interpreting and Predicting Tactile Signals for the SynTouch BioTac [article]

Yashraj S. Narang and Balakumar Sundaralingam and Karl Van Wyk and Arsalan Mousavian and Dieter Fox
2021 arXiv   pre-print
In the human hand, high-density contact information provided by afferent neurons is essential for many human grasping and manipulation capabilities. In contrast, robotic tactile sensors, including the state-of-the-art SynTouch BioTac, are typically used to provide low-density contact information, such as contact location, center of pressure, and net force. Although useful, these data do not convey or leverage the rich information content that some tactile sensors naturally measure. This research extends robotic tactile sensing beyond reduced-order models through 1) the automated creation of a precise experimental tactile dataset for the BioTac over a diverse range of physical interactions, 2) a 3D finite element (FE) model of the BioTac, which complements the experimental dataset with high-density, distributed contact data, 3) neural-network-based mappings from raw BioTac signals not only to low-dimensional experimental data but also to high-density FE deformation fields, and 4) mappings from the FE deformation fields back to the raw signals themselves. The high-density data streams can provide a far greater quantity of interpretable information for grasping and manipulation algorithms than was previously accessible.
arXiv:2101.05452v1 fatcat:p4lzo2kpybcwnhedua5kqybvuq
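
A minimal sketch of mapping 3) from the abstract: a regression network from raw BioTac signals to a dense FE deformation field. The sizes are assumptions (the BioTac's 19 impedance electrodes in, an arbitrary node count out); the paper's architecture and full input set are not reproduced here.

import torch
import torch.nn as nn

N_ELECTRODES, N_NODES = 19, 4000   # assumed input/output dimensions

# Plain MLP regressor: electrode readings -> per-node 3D displacements.
signal_to_deformation = nn.Sequential(
    nn.Linear(N_ELECTRODES, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_NODES * 3),
)

raw = torch.randn(8, N_ELECTRODES)                       # batch of raw signals
field = signal_to_deformation(raw).view(8, N_NODES, 3)   # dense deformation field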

Interpreting and Predicting Tactile Signals via a Physics-Based and Data-Driven Framework [article]

Yashraj S. Narang, Karl Van Wyk, Arsalan Mousavian, Dieter Fox
2020 arXiv   pre-print
High-density afferents in the human hand have long been regarded as essential for human grasping and manipulation abilities. In contrast, robotic tactile sensors are typically used to provide low-density contact data, such as center-of-pressure and resultant force. Although useful, these data do not exploit the rich information content that some tactile sensors (e.g., the SynTouch BioTac) naturally provide. This research extends robotic tactile sensing beyond reduced-order models through 1) the automated creation of a precise tactile dataset for the BioTac over diverse physical interactions, 2) a 3D finite element (FE) model of the BioTac, which complements the experimental dataset with high-resolution, distributed contact data, and 3) neural-network-based mappings from raw BioTac signals to low-dimensional experimental data and, more importantly, high-density FE deformation fields. These data streams can provide a far greater quantity of interpretable information for grasping and manipulation algorithms than was previously accessible.
arXiv:2006.03777v1 fatcat:2isf44sgffdz7aj664ry3gvn3i

NeRP: Neural Rearrangement Planning for Unknown Objects [article]

Ahmed H. Qureshi, Arsalan Mousavian, Chris Paxton, Michael C. Yip, Dieter Fox
2021 arXiv   pre-print
Robots will be expected to manipulate a wide variety of objects in complex and arbitrary ways as they become more widely used in human environments. As such, the rearrangement of objects has been noted as an important benchmark for AI capabilities in recent years. We propose NeRP (Neural Rearrangement Planning), a deep-learning-based approach for multi-step object rearrangement planning that works with never-before-seen objects, is trained on simulation data, and generalizes to the real world. We compare NeRP to several naive and model-based baselines, demonstrating that our approach is measurably better and can efficiently arrange unseen objects in fewer steps and with less planning time. Finally, we demonstrate it on several challenging rearrangement problems in the real world.
arXiv:2106.01352v2 fatcat:j4q7mrzlhrgnxngzx5wodnpjie

Synthesizing Training Data for Object Detection in Indoor Scenes [article]

Georgios Georgakis, Arsalan Mousavian, Alexander C. Berg, Jana Kosecka
2017 arXiv   pre-print
Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training such models typically requires large amounts of annotated training data, which is time consuming and costly to obtain. In this work we explore the use of synthetically generated composite images for training state-of-the-art object detectors, especially for object instance detection. We superimpose 2D images of textured object models onto images of real environments at a variety of locations and scales. Our experiments evaluate different superimposition strategies, ranging from purely image-based blending all the way to depth- and semantics-informed positioning of the object models into real scenes. We demonstrate the effectiveness of these object detector training strategies on two publicly available datasets, the GMU-Kitchens and the Washington RGB-D Scenes v2. As one observation, augmenting some hand-labeled training data with synthetic examples carefully composed onto scenes yields object detectors with performance comparable to using much more hand-labeled data. Broadly, this work charts new opportunities for training detectors for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples.
arXiv:1702.07836v2 fatcat:fmr3opu7avfafh3x5pdzfkvwwm
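
The simplest superimposition strategy the abstract describes, purely image-based blending at a random location and scale, fits in one function. This is a sketch: the depth- and semantics-informed variants would constrain the placement rather than sample it uniformly, and the geometry handling here is deliberately naive.

import random
from PIL import Image

def composite(scene: Image.Image, obj_rgba: Image.Image,
              scale_range=(0.3, 1.0)):
    """Paste an alpha-masked object crop onto a background scene;
    returns the composite image and the induced bounding-box label.
    Assumes the scaled object fits inside the scene."""
    scale = random.uniform(*scale_range)
    w = max(1, int(obj_rgba.width * scale))
    h = max(1, int(obj_rgba.height * scale))
    obj = obj_rgba.resize((w, h))
    x = random.randint(0, scene.width - w)
    y = random.randint(0, scene.height - h)
    out = scene.copy()
    out.paste(obj, (x, y), mask=obj)       # alpha channel drives the blend
    return out, (x, y, x + w, y + h)

Each paste yields a free, pixel-accurate annotation, which is the economy over hand labeling that the abstract is after.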

RGB-D Local Implicit Function for Depth Completion of Transparent Objects [article]

Luyang Zhu, Arsalan Mousavian, Yu Xiang, Hammad Mazhar, Jozef van Eenbergen, Shoubhik Debnath, Dieter Fox
2021 arXiv   pre-print
The majority of perception methods in robotics require depth information provided by RGB-D cameras. However, standard 3D sensors fail to capture the depth of transparent objects due to refraction and absorption of light. In this paper, we introduce a new approach for depth completion of transparent objects from a single RGB-D image. Key to our approach is a local implicit neural representation built on ray-voxel pairs that allows our method to generalize to unseen objects and achieve fast inference speed. Based on this representation, we present a novel framework that can complete missing depth given noisy RGB-D input. We further improve the depth estimation iteratively using a self-correcting refinement model. To train the whole pipeline, we build a large-scale synthetic dataset with transparent objects. Experiments demonstrate that our method performs significantly better than the current state-of-the-art methods on both synthetic and real-world data. In addition, our approach improves the inference speed by a factor of 20 compared to the previous best method, ClearGrasp. Code and dataset will be released at https://research.nvidia.com/publication/2021-03_RGB-D-Local-Implicit.
arXiv:2104.00622v1 fatcat:rcjpmc5s45fylpylkhsbbtxfqe
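
The iterative self-correction described above reduces to a simple loop once the two models exist. Both completion_net and refine_net below are hypothetical stand-ins for the paper's implicit-function models; only the control flow is illustrated.

def complete_depth(rgb, raw_depth, completion_net, refine_net, iters=3):
    """Predict missing depth, then repeatedly refine the estimate
    conditioned on the original RGB-D input."""
    depth = completion_net(rgb, raw_depth)         # initial completion
    for _ in range(iters):
        depth = refine_net(rgb, raw_depth, depth)  # self-correcting update
    return depth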

Synthesizing Training Data for Object Detection in Indoor Scenes

Georgios Georgakis, Arsalan Mousavian, Alexander Berg, Jana Kosecka
2017 Robotics: Science and Systems XIII  
Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training such models typically requires large amounts of annotated training data, which is time consuming and costly to obtain. In this work we explore the use of synthetically generated composite images for training state-of-the-art object detectors, especially for object instance detection. We superimpose 2D images of textured object models onto images of real environments at a variety of locations and scales. Our experiments evaluate different superimposition strategies, ranging from purely image-based blending all the way to depth- and semantics-informed positioning of the object models into real scenes. We demonstrate the effectiveness of these object detector training strategies on two publicly available datasets, the GMU-Kitchens [5] and the Washington RGB-D Scenes v2 [11]. As one observation, augmenting some hand-labeled training data with synthetic examples carefully composed onto scenes yields object detectors with performance comparable to using much more hand-labeled data. Broadly, this work charts new opportunities for training detectors for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples.
doi:10.15607/rss.2017.xiii.043 dblp:conf/rss/GeorgakisMBK17 fatcat:hb42i6ej5veubdl3n3fkcwpv2i

3D Bounding Box Estimation Using Deep Learning and Geometry [article]

Arsalan Mousavian, Dragomir Anguelov, John Flynn, Jana Kosecka
2017 arXiv   pre-print
We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss. The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types. These estimates, combined with the geometric constraints on translation imposed by the 2D bounding box, enable us to recover a stable and accurate 3D object pose. We evaluate our method on the challenging KITTI object detection benchmark, both on the official metric of 3D orientation estimation and on the accuracy of the obtained 3D bounding boxes. Although conceptually simple, our method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance-level segmentation, flat ground priors, and sub-category detection. Our discrete-continuous loss also produces state-of-the-art results for 3D viewpoint estimation on the Pascal 3D+ dataset.
arXiv:1612.00496v2 fatcat:uhlm3n6xxfbk5pzen3msvmmzm4
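
The hybrid discrete-continuous orientation output can be illustrated by its decoding step: choose the highest-confidence angle bin, then add that bin's regressed residual (predicted as a cos/sin pair) to the bin center. The even bin layout below is an assumption for illustration.

import numpy as np

def decode_multibin(conf: np.ndarray, residuals: np.ndarray) -> float:
    """conf: (B,) bin confidences; residuals: (B, 2) cos/sin offsets.
    Returns the decoded orientation angle in [0, 2*pi)."""
    num_bins = conf.shape[0]
    centers = np.arange(num_bins) * (2.0 * np.pi / num_bins)  # assumed even bins
    i = int(np.argmax(conf))
    offset = np.arctan2(residuals[i, 1], residuals[i, 0])
    return float((centers[i] + offset) % (2.0 * np.pi))

Classifying the bin sidesteps the wrap-around discontinuity that makes direct L2 regression of an angle brittle, which is consistent with the abstract's report that the hybrid loss outperforms L2.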
Showing results 1 — 15 out of 41 results