
Review of Visual Saliency Detection with Comprehensive Information [article]

Runmin Cong, Jianjun Lei, Huazhu Fu, Ming-Ming Cheng, Weisi Lin, and Qingming Huang
2018 arXiv   pre-print
The co-saliency detection model introduces an inter-image correspondence constraint to discover the common salient object in an image group.  ...  With the development of acquisition technology, more comprehensive information, such as depth cues, inter-image correspondence, or temporal relationships, is available to extend image saliency detection to  ...  the low-rank component corresponds to the background, and the sparse component represents the moving foreground object.  ...
arXiv:1803.03391v2
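The low-rank/sparse separation this snippet describes is usually posed as robust PCA; a minimal sketch of the standard objective (the constraint form and the common default for λ are assumptions, not necessarily the formulation any surveyed paper uses), where X ∈ R^{m×n} stacks vectorized video frames as columns:

```latex
\min_{L,\,S}\ \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad X = L + S,
\qquad \lambda = 1/\sqrt{\max(m,n)}
```

The nuclear norm pushes L toward a low-rank (static background) matrix, while the l1 norm pushes S toward a sparse matrix that absorbs the moving foreground.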

Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization

Jun-Tae Lee, Mihir Jain, Hyoungwoo Park, Sungrack Yun
2021 International Conference on Learning Representations  
First, we propose a multi-stage cross-attention mechanism to collaboratively fuse audio and visual features, which preserves the intra-modal characteristics.  ...  Third, for precise action localization, we design consistency losses to enforce temporal continuity for the action-class prediction and to improve foreground-prediction reliability.  ...  Our model learns to classify video snippets via two consistency losses that enforce continuity for foreground reliability and for the open-max probabilities over action classes and the background.  ...
dblp:conf/iclr/LeeJPY21
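A minimal PyTorch sketch of one generic cross-attention fusion stage of the kind the snippet describes (not the authors' exact architecture; the feature dimension, head count, and snippet-sequence shapes are assumptions):

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """One cross-attention stage: each modality attends to the other,
    and residual connections preserve intra-modal characteristics."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, audio, visual):  # both (B, T, dim) snippet features
        v_ctx, _ = self.a2v(visual, audio, audio)  # visual queries audio
        a_ctx, _ = self.v2a(audio, visual, visual)  # audio queries visual
        # Residuals keep each modality's own features intact.
        return audio + a_ctx, visual + v_ctx

fuse = CrossAttentionFusion()
a, v = torch.randn(2, 50, 256), torch.randn(2, 50, 256)
a_fused, v_fused = fuse(a, v)  # fused features, same shapes as inputs
```

Stacking several such stages gives the "multi-stage" variant; the residual path is what keeps intra-modal information from being washed out by the fusion.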

A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges [article]

Dingwen Zhang, Huazhu Fu, Junwei Han, Ali Borji, Xuelong Li
2017 arXiv   pre-print
As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and can be widely used in many computer vision tasks  ...  The existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors to characterize  ...  Intra-image contrast vs. inter-image consistency: these are the two most critical properties of co-salient object regions.  ...
arXiv:1604.07090v5
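A toy illustration of combining the two cues named in this snippet; the feature representation, the group-mean proxy for the common object, and the multiplicative fusion are all assumptions for illustration, not any surveyed model:

```python
import numpy as np

def cosaliency_scores(region_feats):
    """Score regions by intra-image contrast times inter-image consistency.
    region_feats: list (one entry per image) of (num_regions, D) arrays of
    L2-normalized region features."""
    # Group-wide mean feature approximates the common foreground appearance.
    group_mean = np.mean(np.vstack(region_feats), axis=0)
    scores = []
    for feats in region_feats:
        # Intra-image contrast: distance from this image's mean feature.
        contrast = np.linalg.norm(feats - feats.mean(axis=0), axis=1)
        # Inter-image consistency: similarity to the group mean.
        consistency = feats @ group_mean
        scores.append(contrast * consistency)  # both cues must agree
    return scores
```

The product enforces the survey's point: a region must stand out within its own image *and* recur across the image group to score as co-salient.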

A Fusion-Based Framework for Wireless Multimedia Sensor Networks in Surveillance Applications

Adnan Yazici, Murat Koyuncu, Seyyit Alper Sert, Turgay Yilmaz
2019 IEEE Access  
In order to reduce the amount of information to be transmitted to the base station and thereby extend the lifetime of a WMSN, a method for detecting and classifying objects on three different layers has  ...  With this motivation, this paper proposes a fusion-based WMSN framework that reduces the amount of data to be transmitted over the network by intra-node processing.  ...  Cihan Kucukkececi for their very valuable support and contributions to this study.  ...
doi:10.1109/access.2019.2926206
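A sketch of the intra-node filtering idea in the snippet: process frames locally and send only compact results. The `detector`, `classifier`, and `transmit` callables and the layer split are hypothetical, not the paper's actual API:

```python
def process_frame(frame, detector, classifier, transmit):
    """Intra-node processing in a WMSN node (illustrative only): raw
    frames never leave the node; only compact metadata is radioed out,
    which is what reduces traffic and extends network lifetime."""
    objects = detector(frame)                # layer 1: cheap local detection
    if not objects:
        return                               # nothing of interest: send nothing
    labels = [classifier(obj) for obj in objects]  # layer 2: classification
    transmit({"labels": labels, "count": len(labels)})  # layer 3: metadata only
```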

A Survey on Video Moment Localization

Meng Liu, Liqiang Nie, Yunxiao Wang, Meng Wang, Yong Rui
2022 ACM Computing Surveys  
We also review the datasets available for video moment localization and group the results of related work.  ...  In addition, we discuss promising future directions for this field, in particular large-scale datasets and interpretable video moment localization models.  ...  knowledge of inter- and intra-modal relations.  ...
doi:10.1145/3556537

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [article]

Yanan Zhang, Jiaxin Chen, Di Huang
2022 arXiv   pre-print
PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploring multi-modal information for detection.  ...  Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach via hierarchical contrastive learning at both the point and object levels, significantly improving the accuracy  ...  The integration of PT, IT, and CMT fully encodes intra-modal and inter-modal long-range dependencies as a powerful representation, thus benefiting detection performance.  ... 
arXiv:2204.00325v2
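A generic sketch of the intra-then-inter pattern the snippet describes: per-modality self-attention followed by cross-modal attention. This is not CAT-Det's actual PT/IT/CMT design; the dimensions, head counts, and one-directional cross-attention are assumptions:

```python
import torch
import torch.nn as nn

class IntraInterEncoder(nn.Module):
    """Intra-modal self-attention per modality, then cross-modal
    attention that lets point features query image context."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.point_enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.image_enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, points, pixels):  # (B, N, dim), (B, M, dim)
        p = self.point_enc(points)      # intra-modal long-range context
        i = self.image_enc(pixels)
        fused, _ = self.cross(p, i, i)  # inter-modal context for each point
        return p + fused                # enriched multi-modal point features
```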

Cross-Modal Pattern-Propagation for RGB-T Tracking

Chaoqun Wang, Chunyan Xu, Zhen Cui, Ling Zhou, Tong Zhang, Xiaoya Zhang, Jian Yang
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
To bridge RGB-T modalities, the cross-modal correlations on intra-modal paired pattern-affinities are derived to reveal the latent cues between heterogeneous modalities.  ...  Through these correlations, useful patterns may be mutually propagated between the RGB-T modalities so as to achieve inter-modal pattern-propagation.  ...  Acknowledgement This work was supported by the National Natural Science Foundation of China (Grants Nos. 61972204, 61772276, 61906094, U1713208) and the Natural Science Foundation of Jiangsu Province (  ...
doi:10.1109/cvpr42600.2020.00709 dblp:conf/cvpr/WangXCZZZY20
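An illustrative NumPy sketch of affinity-based cross-modal propagation (not the paper's exact formulation): each modality's intra-modal pattern affinities re-weight the other modality's features. It assumes both feature maps are flattened over the same aligned spatial grid, N locations by D channels:

```python
import numpy as np

def propagate(feat_rgb, feat_t):
    """Mutual pattern-propagation between RGB and thermal features,
    both of shape (N, D) over the same spatial grid."""
    def affinity(f):
        sim = f @ f.T                              # intra-modal pattern affinity
        e = np.exp(sim - sim.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)    # row-softmax normalization
    rgb_out = affinity(feat_t) @ feat_rgb  # thermal affinities steer RGB features
    t_out = affinity(feat_rgb) @ feat_t    # and vice versa
    return rgb_out, t_out
```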

Blind Audio-Visual Localization and Separation via Low-Rank and Sparsity

Jie Pu, Yannis Panagakis, Stavros Petridis, Jie Shen, Maja Pantic
2019 IEEE Transactions on Cybernetics  
To this end, we devise a novel structured matrix decomposition method that decomposes the data matrix of each modality as a superposition of three terms: 1) a low-rank matrix capturing the background information  ...  ACKNOWLEDGMENT This work has been funded by the European Community Horizon 2020 under grant agreement no. 645094 (SEWA) and no. 688835 (DE-ENIGMA).  ...
doi:10.1109/tcyb.2018.2883607 pmid:30561363
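The snippet names only the first of the three terms. A generic three-term structured decomposition of this kind, with X_m the data matrix of modality m, separates a low-rank background L_m, a sparse foreground/event term S_m, and a residual E_m; the specific regularizers shown are common choices and an assumption here, not necessarily the paper's:

```latex
X_m = L_m + S_m + E_m,
\qquad
\min_{L_m,\,S_m}\ \|L_m\|_{*}
\;+\; \lambda_1 \|S_m\|_{1}
\;+\; \lambda_2 \|X_m - L_m - S_m\|_F^2
```

Coupling the sparse terms across the audio and visual matrices is what lets such a model localize and separate the correlated audio-visual events.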

Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video

Jia Chen, Shizhe Chen, Qin Jin, Alexander G. Hauptmann, Po-Yao Huang, Junwei Liang, Vaibhav, Xiaojun Chang, Jiang Liu, Ting-Yao Hu, Wenhe Liu, Wei Ke (+7 others)
2018 TREC Video Retrieval Evaluation  
However, there are two main limitations of the most widely used cross-entropy (CE) function as the training target, namely exposure bias and mismatched targets in training and testing.  ...  We build up our models with multi-hop intra- and inter-modal attention and learn the joint embedding on multiple retrieval datasets such as Flickr30K, MS-COCO, and MSR-VTT, where image/video-text pairs are available  ...  Intra-modal Attention Network: We utilize an intra-modal attention network similar to the dual attention network (DAN) [13], which is a multi-hop, intra-modal attention model, to train joint embeddings for  ...
dblp:conf/trecvid/ChenCJH00VCLHLK18
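A minimal sketch of multi-hop intra-modal attention pooling in the spirit of the DAN-style model the snippet mentions; the hop count, dimensions, and the particular memory-update rule are assumptions:

```python
import torch
import torch.nn as nn

class MultiHopAttention(nn.Module):
    """Each hop re-attends over the same modality's features and refines
    a memory vector, yielding an embedding for cross-modal retrieval."""
    def __init__(self, dim=512, hops=2):
        super().__init__()
        self.hops = hops
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):             # (B, T, dim) per-snippet features
        mem = feats.mean(dim=1)           # hop 0: mean-pooled memory
        for _ in range(self.hops):
            # Attention weights conditioned on the current memory.
            w = torch.softmax(self.score(feats * mem.unsqueeze(1)), dim=1)
            mem = mem + (w * feats).sum(dim=1)  # attended residual update
        return mem                        # joint-embedding input
```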

2020 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 30

2020 IEEE transactions on circuits and systems for video technology (Print)  
[Annual author/title index entries for TCSVT Vol. 30; the matched entries include "Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks."]  ...
doi:10.1109/tcsvt.2020.3043861

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)  
More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values.  ...  Furthermore, on top of such multi-view image features, we propose a feature selection framework which is able to adaptively capture question-relevant information hierarchically at a fine-grained level.  ...  Acknowledgement This work is supported by the National Key Research and Development Program (Grant No. 2017YFB0803301).  ...
doi:10.1609/aaai.v34i07.6769
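A sketch of the gating idea in the snippet: a scalar gate blends visual and semantic evidence, and the gate value itself can be inspected to see which modality contributed more. This is an illustration of the mechanism, not DualVD's exact module; shapes and the question-conditioning scheme are assumptions:

```python
import torch
import torch.nn as nn

class GatedDualFusion(nn.Module):
    """Question-conditioned gate over visual vs. semantic features."""
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, visual, semantic, question):  # all (B, dim)
        g = torch.sigmoid(self.gate(torch.cat([visual * question,
                                               semantic * question], dim=-1)))
        fused = g * visual + (1 - g) * semantic
        return fused, g  # g near 1: visual dominates; near 0: semantic
```

Visualizing `g` over a dialogue gives exactly the per-question modality-contribution readout the abstract highlights.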

Learning Synergistic Attention for Light Field Salient Object Detection [article]

Yi Zhang, Geng Chen, Qian Chen, Yujia Sun, Yong Xia, Olivier Deforges, Wassim Hamidouche, Lu Zhang
2021 arXiv   pre-print
We propose a novel Synergistic Attention Network (SA-Net) to address light field salient object detection by establishing a synergistic effect between multi-modal features with advanced attention mechanisms  ...  Our SA-Net exploits the rich information of focal stacks via 3D convolutional neural networks and decodes the high-level features of multi-modal light field data with two cascaded synergistic attention modules  ...  We encode the FS with a stack of 3D convolutional blocks, which are able to jointly capture the rich intra- and inter-slice information for accurate SOD.  ...
arXiv:2104.13916v4
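A minimal sketch of encoding a focal stack with 3D convolutions, which mix intra-slice (spatial) and inter-slice (focal) information in a single pass; the slice count, channel widths, and pooling are assumptions, not SA-Net's actual configuration:

```python
import torch
import torch.nn as nn

# Focal stack as a 5D tensor: (batch, RGB channels, focal slices, H, W).
focal_stack = torch.randn(1, 3, 12, 256, 256)
encoder = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=3, padding=1),  # joint spatial/focal mixing
    nn.ReLU(inplace=True),
    nn.MaxPool3d(kernel_size=2),                 # downsample slices and space
    nn.Conv3d(32, 64, kernel_size=3, padding=1),
)
features = encoder(focal_stack)  # -> (1, 64, 6, 128, 128)
```

Because the 3x3x3 kernels span the slice axis as well as the spatial axes, every feature already aggregates evidence across focus depths, which is the inter-slice information the snippet refers to.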

Sparse Camera Network for Visual Surveillance -- A Comprehensive Survey [article]

Mingli Song, Dacheng Tao, Stephen J. Maybank
2013 arXiv   pre-print
The analysis of visual cues in multi-camera networks enables a wide range of applications, from smart home and office automation to large-area surveillance and traffic surveillance.  ...  In this review paper, we present a comprehensive survey of recent research results addressing the problems of intra-camera tracking, topological structure learning, target appearance modeling, and global  ...  In contrast with features based on vision alone, multiple modalities can provide more information.  ...
arXiv:1302.0446v1

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue [article]

Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu
2019 arXiv   pre-print
More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values.  ...  Furthermore, on top of such multi-view image features, we propose a feature selection framework which is able to adaptively capture question-relevant information hierarchically at a fine-grained level.  ...  Acknowledgement This work is supported by the National Key Research and Development Program (Grant No. 2017YFB0803301).  ...
arXiv:1911.07251v1

Research on Salient Object Detection using Deep Learning and Segmentation Methods

2019 International Journal of Recent Technology and Engineering  
Detecting and segmenting salient objects in natural scenes, often referred to as salient object detection, has attracted a lot of interest in computer vision, and recently various heuristic computational  ...  It not only focuses on methods to detect salient objects, but also reviews work related to spatio-temporal video attention detection techniques in video sequences.  ...  [22] proposed a video co-saliency approach that accounts for both inter-video foreground correspondences and intra-video saliency stimuli to emphasize the salient foreground regions of video frames and,  ...
doi:10.35940/ijrte.b1046.0982s1119
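A toy combination of the two cues named in this entry's snippet; the mean-feature proxy for the common object and the multiplicative re-weighting are assumptions for illustration only:

```python
import numpy as np

def video_cosaliency(intra_saliency, frame_feats):
    """intra_saliency: list (one per video) of lists of per-frame 2D
    saliency maps; frame_feats: list of (num_frames, D) feature arrays."""
    # Proxy for the common foreground: mean feature across all videos.
    common = np.mean([f.mean(axis=0) for f in frame_feats], axis=0)
    fused = []
    for maps, feats in zip(intra_saliency, frame_feats):
        agreement = feats @ common  # inter-video correspondence per frame
        # Re-weight each frame's intra-video map by its agreement score,
        # suppressing frames that do not contain the common object.
        fused.append([m * a for m, a in zip(maps, agreement)])
    return fused
```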
Showing results 1 — 15 out of 1,099 results