Towards Visual Affordance Learning: A Benchmark for Affordance Segmentation and Recognition [article]

Zeyad Osama Khalifa, Syed Afaq Ali Shah
2022 arXiv   pre-print
To the best of our knowledge, this is the first ever and the largest multi-view RGBD visual affordance learning dataset. We benchmark the proposed dataset for affordance recognition and segmentation.  ...  for affordance recognition, detection and segmentation.  ...  Dataset Construction and Annotation We present a large scale multi-view visual affordance learning dataset for affordance segmentation and recognition.  ... 
arXiv:2203.14092v1 fatcat:jhpsjl6z4nd4nk43m5ivonuqwy

Are standard Object Segmentation models sufficient for Learning Affordance Segmentation? [article]

Hugo Caselles-Dupré, Michael Garcia-Ortiz, David Filliat
2021 arXiv   pre-print
We conclude that the problem of supervised affordance segmentation is included in the problem of object segmentation and argue that better benchmarks for affordance learning should include action capacities  ...  Proposed benchmarks and state-of-the-art prediction models for supervised affordance segmentation are usually modifications of popular object segmentation models such as Mask R-CNN.  ...  Object segmentation and affordance segmentation are obviously related, being two visual segmentation tasks.  ... 
arXiv:2107.02095v1 fatcat:zwl2mkvslne4dpihpqheseczgy

A Deep Learning Approach to Object Affordance Segmentation

Spyridon Thermos, Petros Daras, Gerasimos Potamianos
2020 ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Learning to understand and infer object functionalities is an important step towards robust visual intelligence.  ...  In this paper, we propose a novel approach that exploits the spatio-temporal nature of human-object interaction for affordance segmentation.  ...  Further, we set λ1 = 0.2 and λ2 = 0.8 for the first 150 epochs, as action recognition is a critical step towards affordance segmentation and should converge faster than the total loss.  ... 
doi:10.1109/icassp40776.2020.9054167 dblp:conf/icassp/ThermosDP20 fatcat:vz3h4jzwj5a7daufhvik2elxjy

Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images [article]

Galadrielle Humblot-Renaux, Letizia Marchegiani, Thomas B. Moeslund, Rikke Gade
2021 arXiv   pre-print
In a zero-shot cross-dataset generalization experiment, we show that our affordance learning scheme can be applied across a diverse mix of datasets and improves driveability estimation in unseen environments  ...  However, such a representation is not directly interpretable for decision-making and constrains robot operation to a specific domain.  ...  In contrast, we approach visual affordance learning as a fully supervised image segmentation problem, leveraging the abundance of large-scale scene understanding datasets.  ... 
arXiv:2109.07245v1 fatcat:2nyh4uopgjfrnmeutyfzysmd54

Can machines learn to see without visual databases? [article]

Alessandro Betti, Marco Gori, Stefano Melacci, Marcello Pelillo, Fabio Roli
2021 arXiv   pre-print
This paper sustains the position that the time has come for thinking of learning machines that conquer visual skills in a truly human-like context, where a few human-like object supervisions are given  ...  This might open the doors to a truly orthogonal competitive track concerning deep learning technologies for vision which does not rely on the accumulation of huge visual databases.  ...  Acknowledgments and Disclosure of Funding This work was partly supported by the PRIN 2017 project RexLearn, funded by the Italian Ministry of Education, University and Research (grant no. 2017TWNMH2).  ... 
arXiv:2110.05973v2 fatcat:tnn23cc4d5dt3mjj2mbir3xkbu

Deep Learning for Assistive Computer Vision [chapter]

Marco Leo, Antonino Furnari, Gerard G. Medioni, Mohan Trivedi, Giovanni M. Farinella
2019 Lecture Notes in Computer Science  
The paper is concluded with a discussion and insights for future directions.  ...  achieved in five main areas, namely, object classification and localization, scene understanding, human pose estimation and tracking, action/event recognition and anticipation.  ...  For this reason, deep learning is currently considered the primary candidate for any visual recognition task [38] and it is being extended to visual reasoning [17].  ... 
doi:10.1007/978-3-030-11024-6_1 fatcat:ehmobmgjcba2tkopvfdfxhyb4i

A self-organizing neural network architecture for learning human-object interactions [article]

Luiza Mici, German I. Parisi, Stefan Wermter
2018 arXiv   pre-print
The visual recognition of transitive actions comprising human-object interactions is a key component for artificial systems operating in natural environments.  ...  Our model consists of a hierarchy of Grow-When-Required (GWR) networks that learn prototypical representations of body motion patterns and objects, accounting for the development of action-object mappings  ...  Acknowledgments The authors gratefully acknowledge partial support by the EU- and City of Hamburg-funded program Pro-Exzellenzia 4.0, the German Research Foundation DFG under project CML (TRR 169), and  ... 
arXiv:1710.01916v2 fatcat:eu7c7wn3anfx5hjzabufbrrdvq

3D Compositional Zero-shot Learning with DeCompositional Consensus [article]

Muhammad Ferjad Naeem, Evin Pınar Örnek, Yongqin Xian, Luc Van Gool, Federico Tombari
2022 arXiv   pre-print
Towards this, we present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes for semantic segmentation.  ...  We provide a structured study through benchmarking the task with the proposed Compositional-PartNet dataset.  ...  Towards this, one line of research aims to learn a transformation between objects and states [34, 37, 28].  ... 
arXiv:2111.14673v2 fatcat:67xaei74abbphmuq35svzde4zi

HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving [article]

Sirui Xie, Xiaojian Ma, Peiyu Yu, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
2021 arXiv   pre-print
This benchmark is centered around a novel task domain, HALMA, for visual concept development and rapid problem-solving.  ...  Uniquely, HALMA has a minimum yet complete concept space, upon which we introduce a novel paradigm to rigorously diagnose and dissect learning agents' capability in understanding and generalizing complex  ...  ACKNOWLEDGMENTS The authors thank Chi Zhang and Baoxiong Jia of UCLA Computer Science Department for useful discussions.  ... 
arXiv:2102.11344v1 fatcat:a62vpmwkurh4jomeonnrywvtfa

Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks [article]

Haibin Huang, Evangelos Kalogerakis, Siddhartha Chaudhuri, Duygu Ceylan, Vladimir G. Kim, Ersin Yumer
2017 arXiv   pre-print
We present a new local descriptor for 3D shapes, directly applicable to a wide range of shape analysis problems such as point correspondences, semantic segmentation, affordance prediction, and shape-to-scan  ...  We demonstrate through several experiments that our learned local descriptors are more discriminative compared to state of the art alternatives, and are effective in a variety of shape analysis applications  ...  ACKNOWLEDGEMENTS Kalogerakis acknowledges support from NSF (CHS-1422441, CHS-1617333), NVidia and Adobe. Chaudhuri acknowledges support from Adobe and Qualcomm.  ... 
arXiv:1706.04496v2 fatcat:26jeic2jb5gx7l4aonrdqak7di

Compositional Learning for Human Object Interaction [chapter]

Keizo Kato, Yin Li, Abhinav Gupta
2018 Lecture Notes in Computer Science  
We also provide benchmarks on several datasets for zero-shot learning including both image and video.  ...  To deal with this problem, in this paper, we propose a novel method using external knowledge graph and graph convolutional networks which learns how to compose classifiers for verb-noun pairs.  ...  The authors would like to thank Xiaolong Wang and Gunnar Sigurdsson for many helpful discussions.  ... 
doi:10.1007/978-3-030-01264-9_15 fatcat:vq7fuztagnd5ff46an4nd5lolu

Learning Depth-Aware Deep Representations for Robotic Perception

Lorenzo Porzi, Samuel Rota Bulo, Adrian Penate-Sanchez, Elisa Ricci, Francesc Moreno-Noguer
2017 IEEE Robotics and Automation Letters  
Exploiting RGB-D data by means of Convolutional Neural Networks (CNNs) is at the core of a number of robotics applications, including object detection, scene semantic segmentation and grasping.  ...  We demonstrate the benefits of DaConv on a variety of robotics oriented tasks, involving affordance detection, object coordinate regression and contour detection in RGB-D images.  ...  In other words, each affordance constitutes a class in the segmentation problem.  ... 
doi:10.1109/lra.2016.2637444 dblp:journals/ral/PorziBSRM17 fatcat:h2m2dok56jdxlmxcs4ja25mkvm

Object Affordance Driven Inverse Reinforcement Learning Through Conceptual Abstraction and Advice

Rupam Bhattacharyya, Shyamanta M. Hazarika
2018 Paladyn: Journal of Behavioral Robotics  
Within human Intent Recognition (IR), a popular approach to learning from demonstration is Inverse Reinforcement Learning (IRL).  ...  Object affordances have been used for IR. Existing literature on recognizing intents through object affordances falls short of utilizing their true potential.  ...  Acknowledgement: The authors would also like to thank Zubin Bhuyan, University of Massachusetts Lowell, for the discussion regarding IRL.  ... 
doi:10.1515/pjbr-2018-0021 fatcat:m6a2fm5ja5elnp25prgy6mnjp4

Deep Learning for Computer Vision (Dagstuhl Seminar 17391)

Daniel Cremers, Laura Leal-Taixé, René Vidal, Marc Herbstritt
2018 Dagstuhl Reports  
The field of computer vision engages in the goal to enable and enhance a machine's ability to infer knowledge and information from spatial and visual input data.  ...  Despite its high dimensional and complex input space, research in the field of computer vision was and still is one of the main driving forces for new development in machine and deep learning, and vice  ...  Towards this goal, we create a novel real-scene indoor benchmark composed of 4D light-field images obtained from a plenoptic camera and ground truth depth obtained from a registered RGB-D sensor.  ... 
doi:10.4230/dagrep.7.9.109 dblp:journals/dagstuhl-reports/CremersLV17 fatcat:qjskvbzrvvhmvhrwmuntihctby

Affordance Learning for End-to-End Visuomotor Robot Control [article]

Aleksi Hämäläinen, Karol Arndt, Ali Ghadirzadeh, Ville Kyrki
2019 arXiv   pre-print
We also introduce a method for affordance dataset generation, which is easily generalizable to new tasks, objects and environments, and requires no manual pixel labeling.  ...  The data is exchanged between parts of the system as low-dimensional latent representations of affordances and trajectories.  ...  In particular, we first evaluated if affordances can be represented in a latent space using the UMD dataset [27], a standard benchmark for the affordance detection task.  ... 
arXiv:1903.04053v1 fatcat:ju3xkwfz4jdbtoytw2adjwyifm