940 Hits in 6.7 sec

Geometry-Aware Recurrent Neural Networks for Active Visual Recognition [article]

Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki
2018 arXiv   pre-print
We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping between 3D  ...  physical locations in the world scene and latent feature locations.  ...  We propose geometry-aware recurrent networks and loss functions for active object detection, segmentation and 3D reconstruction in cluttered scenes, following the old active vision premise.  ...
arXiv:1811.01292v2 fatcat:3caf2xhu7ngyzbvh5jwdqw2ooe

Recurrent 3D Attentional Networks for End-to-End Active Object Recognition [article]

Min Liu, Yifei Shi, Lintao Zheng, Kai Xu, Hui Huang, Dinesh Manocha
2022 arXiv   pre-print
 ...  through developing an end-to-end recurrent 3D attentional network.  ...  Inspired by the recent success of attention-based models in 2D vision tasks based on single RGB images, we propose to address multi-view depth-based active object recognition using an attention mechanism  ...  Acknowledgements We thank the anonymous reviewers for their valuable comments. This work was supported, in part, by NSFC programs (61572507, 61622212, 61532003).  ...
arXiv:1610.04308v4 fatcat:cxoqgqes6nej7jfykhcxiip5lm

3D attention-driven depth acquisition for object identification

Kai Xu, Yifei Shi, Lintao Zheng, Junyu Zhang, Min Liu, Hui Huang, Hao Su, Daniel Cohen-Or, Baoquan Chen
2016 ACM Transactions on Graphics  
Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view  ...  The attention model, trained with the 3D shape collection, encodes the temporal dependencies among consecutive views with deep recurrent networks.  ...  Acknowledgements We thank the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1145/2980179.2980224 fatcat:wrvfjdz54bdnfmh5mui3itjnt4

Recurrent 3D attentional networks for end-to-end active object recognition

Min Liu, Yifei Shi, Lintao Zheng, Kai Xu, Hui Huang, Dinesh Manocha
2019 Computational Visual Media  
of an end-to-end recurrent 3D attentional network.  ...  Active vision is inherently attention-driven: an agent actively selects views to attend in order to rapidly perform a vision task while improving its internal representation of the scene being observed  ...  Acknowledgements We thank the anonymous reviewers for their valuable comments.  ...
doi:10.1007/s41095-019-0135-2 fatcat:yyqgi6ts5vh7nj7wvnlv4cithi

ASM-3D: An attentional search model fashioned after what and where/how pathways for target search in 3D environment [article]

Sweta Kumari, Shobha Amala V Y, Nivethithan M, V. Srinivasa Chakravarthy
2022 bioRxiv   pre-print
We propose a biologically inspired attention model for target search in a 3D environment, which has two separate channels for object classification, analogous to the what pathway in the human visual system  ...  We generated a 3D Cluttered Cube dataset that consists of an image on one vertical face, and clutter images on the other faces.  ...  ACKNOWLEDGMENT We acknowledge the support of Pavan Holla and Vigneswaran in the implementation of flip-flop neurons. We also acknowledge Sowmya Manojna for generating the RGB MNIST dataset.  ...
doi:10.1101/2022.08.01.502278 fatcat:bxqh434kbrh7vps3k2orrbjfny

H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

Bugra Tekin, Federica Bogo, Marc Pollefeys
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras.  ...  network: 3D hand pose estimation, object pose estimation, object recognition and activity classification.  ...
doi:10.1109/cvpr.2019.00464 dblp:conf/cvpr/TekinBP19 fatcat:xz7iop75wvdybmqbtyaprvhwru

PREMA: Part-based REcurrent Multi-view Aggregation Network for 3D Shape Retrieval [article]

Jiongchao Jin, Huanqiang Xu, Pengliang Ji, Zehao Tang, Zhang Xiong
2021 arXiv   pre-print
Comprehensively, we design a novel Regional Attention Unit (RAU) in PREMA to compute the confidence map for each view, and extract MCPs by applying those maps to view features.  ...  We propose the Part-based Recurrent Multi-view Aggregation network (PREMA) to eliminate the detrimental effects of practical view defects, such as insufficient view numbers, occlusions, or background  ...  This amounts to a sequential modeling of part-based recognition using recurrent neural networks.  ...
arXiv:2111.04945v1 fatcat:mlwo3jknkbhnbp2suea35vpvpa

A Review on Action Recognition and Action Prediction of Human(s) using Deep Learning Approaches

Syed Abdussami, Nagendraprasad S., Shivarajakumara K., Sanjeet Singh, A. Thyagarajamurthy
2019 International Journal of Computer Applications  
In the second research paper, the glimpse sequences in each frame correspond to interest points in the scene that are relevant to the classified activities.  ...  Compared to their counterparts for still images (the 2D CNNs for visual recognition), 3D CNNs are considered comparatively less efficient, due to limitations such as high training complexity  ...  The first paper referred to presents a deep architecture model to address the problems of 3D CNNs used for action recognition in videos and to improve the performance of 3D CNNs for action recognition  ...
doi:10.5120/ijca2019919605 fatcat:pww4cnfysffavg7f27vq5f25ra

Recurrent Scene Parsing with Perspective Understanding in the Loop

Shu Kong, Charless Fowlkes
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution.  ...  We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that  ...  in our recurrent model near object boundaries and in cluttered regions with many small objects.  ... 
doi:10.1109/cvpr.2018.00106 dblp:conf/cvpr/KongF18 fatcat:fnsxtabhvvg7fcnfwwadw5jd5a

LCD – Line Clustering and Description for Place Recognition [article]

Felix Taubner, Florian Tschopp, Tonci Novkovic, Roland Siegwart, Fadri Furrer
2020 arXiv   pre-print
In our work, line clusters are defined as lines that make up individual objects, hence our place recognition approach can be understood as object recognition. 3D line segments are detected in RGB-D images  ...  We present a neural network architecture based on the attention mechanism for frame-wise line clustering.  ...  Triplet loss We use the triplet loss as described in [57] for the object recognition learning task.  ... 
arXiv:2010.10867v1 fatcat:axamt2ufl5fstmmcwh7lmtqd44

Action Classification and Highlighting in Videos [article]

Atousa Torabi, Leonid Sigal
2017 arXiv   pre-print
Inspired by recent advances in neural machine translation that jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity  ...  We qualitatively show that soft-attention can learn to effectively attend to important objects and scene information correlated with specific human actions.  ...  -19 network fine-tuned on ActivityNet data for action recognition of 203 action classes; • VGG_sce: VGG-19 network fine-tuned on MIT-Scenes for scene recognition of 205 scene classes [53].  ...
arXiv:1708.09522v1 fatcat:4kmeonpinnbdtkula2pt4e6ufu

Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network

Le Wang, Jinliang Zang, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng
2018 Sensors  
Motivated by the popular recurrent attention models in the research area of natural language processing, we propose the Attention-aware Temporal Weighted CNN (ATW CNN) for action recognition in videos,  ...  Moreover, each stream in the proposed ATW CNN framework is capable of end-to-end training, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with back-propagation  ...  Therefore, it is difficult for early 3D convolutional neural networks (3D CNNs) [18] to achieve action recognition performance on par with the sophisticated hand-crafted improved Dense Trajectory (  ...
doi:10.3390/s18071979 pmid:29933555 pmcid:PMC6069475 fatcat:byyotu7o75amzpbtifmpkpyunm

Two-stream Flow-guided Convolutional Attention Networks for Action Recognition [article]

An Tran, Loong-Fah Cheong
2017 arXiv   pre-print
This paper proposes two-stream flow-guided convolutional attention networks for action recognition in videos.  ...  These cross-link layers guide the spatial stream to pay more attention to human foreground areas and be less affected by background clutter.  ...  The attention in our approach is modeled simply, but it shows good performance compared to recurrent attention models for action recognition.  ...
arXiv:1708.09268v1 fatcat:vhudw2u5xbh7lbtk6qtofpztre

Recent Advances in Vision-Based On-Road Behaviors Understanding: A Critical Survey

Rim Trabelsi, Redouane Khemmar, Benoit Decoux, Jean-Yves Ertaud, Rémi Butteau
2022 Sensors  
For this, five related topics have been covered in this review: situational awareness, driver-road interaction, road scene understanding, trajectory forecasting, and driving activities and status  ...  Several endeavors have been proposed to deal with the different related tasks, and the field has gained wide attention recently.  ...  [32] proposes an encoder-decoder architecture that provides end-to-end road scene understanding. The encoder is a CNN-based network similar to VGGNet [21].  ...
doi:10.3390/s22072654 pmid:35408269 pmcid:PMC9003377 fatcat:2vrmgz3b25eyxbijeurx5aijv4

Invariant visual object recognition: A model, with lighting invariance

Edmund T. Rolls, Simon M. Stringer
2006 Journal of Physiology - Paris  
The model has also been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene.  ...  The model has been extended to incorporate top-down feedback connections to model the control of attention by biased competition in, for example, spatial and object search tasks.  ...  For example, it was shown that when macaques used object-based attention to search for one of two objects to touch in a complex natural scene, between 94% and 99% of the information was present in the firing  ...
doi:10.1016/j.jphysparis.2006.09.004 pmid:17071062 fatcat:bipw34ihvbf7nc6edwtrlqfkfi