19,126 Hits in 6.0 sec

Sample-Specific Late Fusion for Visual Category Recognition

Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang
2013 2013 IEEE Conference on Computer Vision and Pattern Recognition  
In this paper, we propose a sample-specific late fusion method to address this issue.  ...  Extensive experiment results on various visual categorization tasks show that the proposed method consistently and significantly beats the state-of-the-art late fusion methods.  ...  For future work, we will pursue the sample-specific late fusion for multi-class and multi-label visual recognition tasks.  ... 
doi:10.1109/cvpr.2013.109 dblp:conf/cvpr/LiuLYCC13 fatcat:qpe3i62pavc5tfbxx25dc6kah4

SUPER: Towards Real-Time Event Recognition in Internet Videos

Yu-Gang Jiang
2012 Proceedings of the 2nd ACM International Conference on Multimedia Retrieval - ICMR '12  
We also observe that, different from the visual channel, the soundtracks contain little redundant information for video event recognition.  ...  In addition, we also evaluate how many visual and audio frames are needed for event recognition in Internet videos, a question left unanswered in the literature.  ...  For multimodal fusion, we observed that early/kernel fusion offers higher recognition accuracy than the popularly adopted late fusion.  ... 
doi:10.1145/2324796.2324805 dblp:conf/mir/Jiang12 fatcat:o3fwiusmxng6rfbmbmbwehmcdm

Modulating Shape Features by Color Attention for Object Recognition

Fahad Shahbaz Khan, Joost van de Weijer, Maria Vanrell
2011 International Journal of Computer Vision  
Bag-of-words based image representation is a successful approach for object recognition.  ...  Note that for some classes the early fusion scheme performs better, whereas for other categories late fusion outperforms early fusion methods.  ...  Consequently, a class-specific image histogram is constructed for each category. Experiments are conducted on standard object recognition data sets.  ... 
doi:10.1007/s11263-011-0495-2 fatcat:htdqd6valzcyrhmgqtxjm7uyym
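The early/late fusion distinction that recurs throughout these results can be illustrated with a minimal sketch (toy data and equal weights are assumptions for illustration, not taken from any of the papers above): early fusion concatenates per-modality features before a single classifier, while late fusion combines the scores of per-modality classifiers.

```python
import numpy as np

# Hypothetical shape and color features for 3 samples (2 dims each).
shape_feat = np.array([[0.2, 0.8], [0.6, 0.1], [0.5, 0.5]])
color_feat = np.array([[0.9, 0.1], [0.3, 0.4], [0.2, 0.7]])

# Early fusion: concatenate features into one joint representation,
# which a single classifier would then be trained on.
early = np.concatenate([shape_feat, color_feat], axis=1)  # shape (3, 4)

# Late fusion: assume each modality already has a trained classifier
# that emits a per-sample confidence score; combine the scores.
shape_scores = np.array([0.7, 0.2, 0.5])  # from the shape-only model
color_scores = np.array([0.9, 0.4, 0.3])  # from the color-only model
late = 0.5 * shape_scores + 0.5 * color_scores
```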

Learning Sample Specific Weights for Late Fusion

Kuan-Ting Lai, Dong Liu, Shih-Fu Chang, Ming-Syan Chen
2015 IEEE Transactions on Image Processing  
To the best of our knowledge, this is the first method that supports learning of sample specific fusion weights for late fusion.  ...  In order to address this issue, we propose a novel sample specific late fusion (SSLF) method.  ...  For future work, we will pursue the sample specific late fusion for multi-class and multi-label visual recognition tasks. Fig. 1. Illustration of the Sample Specific Late Fusion (SSLF) method.  ... 
doi:10.1109/tip.2015.2423560 pmid:25879948 fatcat:pmbu7epunvbtvh24iqtexlcbxq
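The core idea behind sample-specific late fusion can be sketched numerically (this is only the weighting idea, not the authors' learning algorithm; all numbers below are illustrative assumptions): standard late fusion applies one global weight per base classifier, whereas the sample-specific variant lets each sample carry its own weight vector.

```python
import numpy as np

# Scores from M=2 base classifiers for 2 samples (rows = samples).
scores = np.array([[0.9, 0.2],
                   [0.4, 0.8]])

# Standard late fusion: one global weight vector shared by all samples.
global_w = np.array([0.5, 0.5])
fused_global = scores @ global_w

# Sample-specific late fusion: each row gets its own weights, so the
# fusion can trust different classifiers for different samples.
per_sample_w = np.array([[0.9, 0.1],
                         [0.2, 0.8]])
fused_sample = (scores * per_sample_w).sum(axis=1)
```

In the SSLF paper these per-sample weights are learned jointly rather than fixed by hand as above.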

Top-down color attention for object recognition

Fahad Shahbaz Khan, Joost van de Weijer, Maria Vanrell
2009 2009 IEEE 12th International Conference on Computer Vision  
This procedure leads to a category-specific image histogram representation for each category. Furthermore, we argue that the method combines the advantages of both early and late fusion.  ...  Color is used to guide attention by means of a top-down category-specific attention map.  ...  As a result a class-specific image histogram is constructed for each category.  ... 
doi:10.1109/iccv.2009.5459362 dblp:conf/iccv/KhanWV09 fatcat:u747ztefbjaotcakof74zs4e3u

Top-Down Deep Appearance Attention for Action Recognition [chapter]

Rao Muhammad Anwer, Fahad Shahbaz Khan, Joost van de Weijer, Jorma Laaksonen
2017 Lecture Notes in Computer Science  
The results clearly demonstrate that our approach outperforms both standard approaches of early and late feature fusion.  ...  A category-specific appearance map is then learned to modulate the weights of the deep motion features.  ...  the project SymbiCloud, VR starting grant (2016-05543), through the Strategic Area for ICT research ELLIIT.  ... 
doi:10.1007/978-3-319-59126-1_25 fatcat:fcz7jxqy2vgpbbq5xh7eq3pn7a

Super Fast Event Recognition in Internet Videos

Yu-Gang Jiang, Qi Dai, Tao Mei, Yong Rui, Shih-Fu Chang
2015 IEEE transactions on multimedia  
We also find that, different from the visual frames, the soundtracks contain little redundant information and thus sampling is always harmful.  ...  In addition, we also provide a study on the following interesting question: for event recognition in Internet videos, what is the minimum number of visual and audio frames needed to obtain a comparable  ...  For classification and fusion, we use the most popular option of SVM classifier and late fusion.  ... 
doi:10.1109/tmm.2015.2436813 fatcat:t2plqlfhhjcx7aoat7h3da7jna

Viewpoint invariant semantic object and scene categorization with RGB-D sensors

Hasan F. M. Zaki, Faisal Shafait, Ajmal Mian
2018 Autonomous Robots  
We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons based on an Extreme Learning Machine classifier in a late fusion scheme which leads to a highly discriminative  ...  Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that our HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods for several recognition tasks by a significant  ...  Our method also outperforms other methods for channel-specific category recognition.  ... 
doi:10.1007/s10514-018-9776-8 fatcat:spg6rdzgbrhpxpbuedqr2th2pe

On Learning Semantic Representations for Million-Scale Free-Hand Sketches [article]

Peng Xu, Yongye Huang, Tongtong Yuan, Tao Xiang, Timothy M. Hospedales, Yi-Zhe Song, Liang Wang
2020 arXiv   pre-print
Specifically, we use our dual-branch architecture as a universal representation framework to design two sketch-specific deep models: (i) We propose a deep hashing model for sketch retrieval, where a novel  ...  (ii) We propose a deep embedding model for sketch zero-shot recognition, via collecting a large-scale edge-map dataset and proposing to extract a set of semantic vectors from edge-maps as the semantic  ...  This late-fusion layer will provide representations for various tasks.  ... 
arXiv:2007.04101v1 fatcat:cng2cw6r5fg43p5erfisj57tu4

Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification

Rao Muhammad Anwer, Fahad Shahbaz Khan, Joost van de Weijer, Matthieu Molinier, Jorma Laaksonen
2018 ISPRS journal of photogrammetry and remote sensing (Print)  
Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems.  ...  To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification  ...  We further evaluate two fusion strategies, early and late fusion, to combine RGB and texture streams for texture recognition and remote sensing scene classification.  ... 
doi:10.1016/j.isprsjprs.2018.01.023 fatcat:mcgkgf23zfeu7obtoijx3t5s5e

Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks [article]

Zhuolin Jiang, Viktor Rozgic, Sancar Adali
2017 arXiv   pre-print
Our objective is to exploit imaging data in this modality for the action recognition task.  ...  We conduct an elaborate analysis of different fusion schemes (weighted average, single and double-layer neural nets) applied to different 3D CNN outputs.  ...  Table 3: Late fusion 1: 74; Late fusion 2: 77.5; Single-layer NN fusion: 71.25; Two-layer NN fusion: 70.42.  ... 
arXiv:1705.06709v1 fatcat:oqyk3pvwvbbh3cs7gurio76lxa

Multimodal analysis of user behavior and browsed content under different image search intents

Mohammad Soleymani, Michael Riegler, Pål Halvorsen
2018 International Journal of Multimedia Information Retrieval  
We thank David Sander for his kind assistance for the ethical review of the experiment.  ...  We also thank "Fondation Campus Biotech Genève" for providing access and support at their experimental facilities.  ...  Still, late fusion reaches the best results for both approaches. In our experiments, we used the weighted sum of scores for late fusion for both methods.  ... 
doi:10.1007/s13735-018-0150-6 fatcat:qtjr4e6cundopa6sog2lihiq2q
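The "weighted sum of scores" used for late fusion in the entry above can be sketched as follows. One practical detail worth showing: when the base models produce scores on different scales (e.g. SVM margins vs. probabilities), the scores are commonly rescaled to a common range first. The min-max rescaling and the 0.6/0.4 weights below are generic illustrative choices, not taken from the paper.

```python
import numpy as np

def minmax(s):
    """Rescale a score vector to [0, 1] so heterogeneous models are comparable."""
    s = np.asarray(s, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

behavior_scores = [1.2, 3.4, 2.0]  # e.g. raw margins from one model
content_scores  = [0.1, 0.9, 0.5]  # e.g. probabilities from another model

# Weighted sum of rescaled scores (weights are illustrative).
fused = 0.6 * minmax(behavior_scores) + 0.4 * minmax(content_scores)
```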

Seeing and Hearing Egocentric Actions: How Much Can We Learn? [article]

Alejandro Cartas and Jordi Luque and Petia Radeva and Carlos Segura and Mariella Dimiccoli
2019 arXiv   pre-print
Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams.  ...  In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information.  ...  The three main model-agnostic approaches largely used in the literature for combining the audio-visual features are early, late, and hybrid fusion [4, 28] .  ... 
arXiv:1910.06693v1 fatcat:tmofzojtuvh23jdjwx4hvtgkbi

Seeing and Hearing Egocentric Actions: How Much Can We Learn?

Alejandro Cartas, Jordi Luque, Petia Radeva, Carlos Segura, Mariella Dimiccoli
2019 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)  
Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams.  ...  In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information.  ...  The three main model-agnostic approaches largely used in the literature for combining the audio-visual features are early, late, and hybrid fusion [4, 28] .  ... 
doi:10.1109/iccvw.2019.00548 dblp:conf/iccvw/CartasLRSD19 fatcat:5nriiccjqzeezkfqmovohehgee
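A common concrete form of the multi-stream late fusion described in the two entries above is to average the class posteriors of the individual streams; the sketch below assumes equal weights and made-up logits for three streams (the paper's exact fusion scheme may differ).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Illustrative per-stream logits over 3 action classes.
audio    = softmax(np.array([2.0, 0.5, 0.1]))
spatial  = softmax(np.array([1.0, 1.5, 0.2]))
temporal = softmax(np.array([0.3, 2.2, 0.4]))

# Late fusion: average the three posterior distributions, then predict.
fused = (audio + spatial + temporal) / 3.0
pred = int(np.argmax(fused))
```

Note that streams can disagree individually (here the audio stream alone favors class 0) while the fused posterior settles on the majority view.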

Multi-Modal Residual Perceptron Network for Audio–Video Emotion Recognition

Xin Chang, Władysław Skarbek
2021 Sensors  
impedes better performance in the existing late fusion and end-to-end multi-modal network training strategies.  ...  Audio–Visual Database of Emotional Speech and Song dataset and to 83.15% for the Crowd-Sourced Emotional Multi Modal Actors dataset.  ...  Performance on some specific categories shows a slight decrease for MRPN, especially for the categories of calm and neutral expressions because they are naturally close to each other in the RAVDESS dataset  ... 
doi:10.3390/s21165452 pmid:34450894 pmcid:PMC8399720 fatcat:bplgcb3lgbbkhl4b2tfeg25k5a