2,002 Hits in 7.3 sec

Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [article]

Yu Xiang, Christopher Xie, Arsalan Mousavian, Dieter Fox
2021 arXiv   pre-print
In this work, we propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.  ...  Our method demonstrates that non-photorealistic synthetic RGB and depth images can be used to learn feature embeddings that transfer well to real-world images for unseen object instance segmentation.  ...  Learning from Synthetic Data We employ deep neural networks for unseen object instance segmentation.  ... 
arXiv:2007.15157v3 fatcat:oswra62xdngfzcckhdmqzlzrp4

Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [article]

Yu Xiang, Christopher Xie, Arsalan Mousavian, Dieter Fox
2020
In this work, we propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.  ...  Our method demonstrates that non-photorealistic synthetic RGB and depth images can be used to learn feature embeddings that transfer well to real-world images for unseen object instance segmentation.  ...  Learning from Synthetic Data We employ deep neural networks for unseen object instance segmentation.  ... 
doi:10.48550/arxiv.2007.15157 fatcat:l2vc7733arcx3nkxfp6gjfbxga

Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [article]

Lu Zhang, Siqi Zhang, Xu Yang, Zhiyong Liu
2022 arXiv   pre-print
Segmenting unseen objects is a crucial ability for a robot, since it may encounter new environments during operation.  ...  Recently, a popular solution is leveraging RGB-D features of large-scale synthetic data and directly applying the model to unseen real-world scenarios.  ...  By re-calibrating the channels of the RGB and depth feature maps, better-fused RGB-D embeddings for unseen object instance segmentation are obtained.  ...
arXiv:2204.09847v1 fatcat:wsskwn6ypfaczd4beoamni3sjq

Semantic Segmentation from Limited Training Data [article]

A. Milan, T. Pham, K. Vijay, D. Morrison, A.W. Tow, L. Liu, J. Erskine, R. Grinover, A. Gurman, T. Hunn, N. Kelly-Boxall, D. Lee (+13 others)
2017 arXiv   pre-print
Besides small objects with shiny and transparent surfaces, the biggest challenge of the 2017 competition was the introduction of unseen categories.  ...  One is a deep metric learning approach that works in three separate steps: semantic-agnostic boundary detection, patch classification, and pixel-wise voting.  ...  Our first approach is inspired by recent work on deep metric learning [18], [1], [19], [20], which shows that the feature embedding learned on seen categories generalises well to unseen object  ...
arXiv:1709.07665v1 fatcat:syfujvprnvbjrhob5tvy2syrpq

Semantic Segmentation from Limited Training Data

A. Milan, T. Pham, K. Vijay, D. Morrison, A.W. Tow, L. Liu, J. Erskine, R. Grinover, A. Gurman, T. Hunn, N. Kelly-Boxall, D. Lee (+13 others)
2018 IEEE International Conference on Robotics and Automation (ICRA)
Our first approach is inspired by recent work on deep metric learning [18], [1], [19], [20], which shows that the feature embedding learned on seen categories generalises well to unseen object  ...  RBO [8], the winning team of the first competition, approached the object segmentation problem using one RGB-D sensor and without employing any deep learning techniques.  ...
doi:10.1109/icra.2018.8461082 dblp:conf/icra/MilanPVMTLEGGHK18 fatcat:773po3dfmfek7lvulqacevinfy

Monocular Instance Motion Segmentation for Autonomous Driving: KITTI InstanceMotSeg Dataset and Multi-task Baseline [article]

Eslam Mohamed, Mahmoud Ewaisha, Mennatullah Siam, Hazem Rashed, Senthil Yogamani, Waleed Hamdy, Muhammad Helmi, Ahmad El-Sallab
2021 arXiv   pre-print
Moving object segmentation is a crucial task for autonomous vehicles, as it can be used to segment objects in a class-agnostic manner based on their motion cues.  ...  The model then learns separate prototype coefficients within the class-agnostic and semantic heads, providing two independent paths of object detection for redundant safety.  ...  ACKNOWLEDGEMENTS We would like to thank B Ravi Kiran (Navya), Letizia Mariotti and Lucie Yahiaoui for reviewing the paper and providing feedback.  ...
arXiv:2008.07008v4 fatcat:a54do7k7rrdhdj75dm6wyom5qy

A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios

Xinyu Zhang, Weijie Lv, Long Zeng
2021 Machines  
6DoF (6D) pose estimation is challenging, since some part objects from a known template may not have been seen before.  ...  In particular, the test set is further divided into a TEST-L dataset for learning evaluation and a TEST-G dataset for generalization evaluation.  ...  Acknowledgments: We acknowledge the support provided for this study by Tsinghua University. Conflicts of Interest: The authors declare no conflict of interest.  ...
doi:10.3390/machines9120321 fatcat:iaeiuudt6vfhjnbjcfxckptium

Robot Object Retrieval with Contextual Natural Language Queries [article]

Thao Nguyen, Nakul Gopalan, Roma Patel, Matt Corsaro, Ellie Pavlick, Stefanie Tellex
2020 arXiv   pre-print
The model takes in a language command containing a verb, for example "Hand me something to cut," and RGB images of candidate objects and selects the object that best satisfies the task specified by the  ...  and 53.0% on unseen object classes and unknown nouns.  ...  James Tompkin for advice on selecting the image dataset and encoder, and Eric Rosen for help with video editing.  ... 
arXiv:2006.13253v1 fatcat:44m3xnuwqfamvfxruo2fw7npzu

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [article]

Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai
2020 arXiv   pre-print
We construct a joint embedding space between the detected regions of an image corresponding to an object and 3D CAD models, enabling retrieval of CAD models for an input RGB image.  ...  We present Mask2CAD, which jointly detects objects in real-world images and for each detected object, optimizes for the most similar CAD model and its pose.  ...  Acknowledgments We would like to thank Georgia Gkioxari for her advice on Mesh R-CNN and the support of the ZD.B (Zentrum Digitalisierung.Bayern) for Angela Dai.  ... 
arXiv:2007.13034v1 fatcat:yabpz6uoczdhtopk2fmmfkwqq4

Dynamic Detection and Recognition of Objects Based on Sequential RGB Images

Shuai Dong, Zhihua Yang, Wensheng Li, Kun Zou
2021 Future Internet  
MVRFFNet is a generalized zero-shot learning (GZSL) framework based on the Wasserstein generative adversarial network for 3D object recognition.  ...  Based on the LSM, LSMNet can adopt a pix2pix architecture to segment instances.  ...  There are two types of deep learning models for instance segmentation: two-stage models and one-stage models.  ... 
doi:10.3390/fi13070176 fatcat:wm6tolhqyfahjnp6quhubulu6y

Zero-Shot Temporal Action Detection via Vision-Language Prompting [article]

Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang
2022 arXiv   pre-print
Zero-shot TAD (ZS-TAD) resolves this obstacle by enabling a pre-trained model to recognize any unseen action classes.  ...  Existing temporal action detection (TAD) methods rely on large training data including segment-level annotations, and are limited to recognizing previously seen classes alone during inference.  ...  Visual embedding: To extract features from the video snippets, we use a frozen pre-trained video encoder (e.g., I3D [6], CLIP [35]) to extract RGB features X r ∈ R d×T and optical flow features X o ∈ R d×T  ...
arXiv:2207.08184v1 fatcat:5op7uk6gqzcvzilrcbyczqvqr4

Few-Shot Visual Grounding for Natural Human-Robot Interaction [article]

Giorgos Tziafas, Hamidreza Kasaei
2021 arXiv   pre-print
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.  ...  Towards addressing this point, we propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.  ...  The image samples are collected from subsets of two RGB-D datasets: RGBD-Scenes [26], widely used for 3D vision learning, and the Objects Cluttered Indoors Dataset (OCID) [1], used for  ...
arXiv:2103.09720v2 fatcat:ehzhwrnjojfk3p4uegjlrlwhcq

Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation

Dengsheng Chen, Jun Li, Zheng Wang, Kai Xu
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
To tackle intra-class shape variations, we learn canonical shape space (CASS), a unified representation for a large variety of instances of a certain object category.  ...  Since the 3D point cloud is generated in normalized pose (with actual size), the encoder of the VAE learns view-factorized RGBD embedding.  ...  Acknowledgement We thank the anonymous reviewers for the valuable suggestions. We are grateful to Chen Wang, one of the authors of DenseFusion, for the help and discussion.  ... 
doi:10.1109/cvpr42600.2020.01199 dblp:conf/cvpr/ChenLWX20 fatcat:xwmysf6sezelrl3fxxv5uzyaj4

Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation [article]

Dengsheng Chen and Jun Li and Zheng Wang and Kai Xu
2021 arXiv   pre-print
To tackle intra-class shape variations, we learn canonical shape space (CASS), a unified representation for a large variety of instances of a certain object category.  ...  Since the 3D point cloud is generated in normalized pose (with actual size), the encoder of the VAE learns view-factorized RGBD embedding.  ...  Acknowledgement We thank the anonymous reviewers for the valuable suggestions. We are grateful to Chen Wang, one of the authors of DenseFusion, for the help and discussion.  ... 
arXiv:2001.09322v3 fatcat:pyehmt5pu5fklgkcb4bgdcz4k4

CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild [article]

Yang You, Ruoxi Shi, Weiming Wang, Cewu Lu
2022 arXiv   pre-print
In this paper, we tackle the problem of category-level 9D pose estimation in the wild, given a single RGB-D frame.  ...  Besides, category-level pose estimation requires a method to be able to generalize to unseen objects at test time, which is also challenging.  ...  This work was also supported by the Shanghai AI development project (2020-RGZN-02006) and "cross research fund for translational medicine" of Shanghai Jiao Tong University (zh2018qnb17, zh2018qna37, YG2022ZD018  ... 
arXiv:2203.03089v2 fatcat:4q6xcvkqfjh27cwun53edcdggq
Showing results 1 — 15 of 2,002