Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning [article]

Andrew Shin, Masataka Yamaguchi, Katsunori Ohnishi, Tatsuya Harada
2016 arXiv   pre-print
We propose to incorporate coding with vector of locally aggregated descriptors (VLAD) on spatial pyramid for CNN features of sub-regions in order to generate image representations that better reflect the  ...  Our results show that our method of compact VLAD coding can match CNN features with as little as 3% of dimensionality and, when combined with spatial pyramid, it results in image captions that more accurately  ...  Conclusion We introduced a novel method for image representation incorporating spatial pyramid VLAD to CNN features of sub-regions suggested by selective search, in order to generate more locally robust  ... 
arXiv:1603.09046v1 fatcat:sasstlz7jfcpdgfheasgelos2y

Multiple VLAD encoding of CNNs for image classification [article]

Qing Li, Qiang Peng, Chuan Yan
2017 arXiv   pre-print
Finally, we equip the spatial pyramid patch (SPM) on VLAD encoding to add the spatial information of CNNs feature.  ...  In this paper, we propose a special framework, which is the multiple VLAD encoding method with the CNNs features for image classification.  ...  ACKNOWLEDGMENTS This work is supported by the Fundamental Research Funds for Central Universities (No. 2062015YXZT11).  ... 
arXiv:1707.00058v1 fatcat:dzak3muak5eenif3xwgwehhy6q

2020 Index IEEE Transactions on Multimedia Vol. 22

2020 IEEE transactions on multimedia  
., +, TMM Oct. 2020 2698-2710 Exploring Discriminative Representations for Image Emotion Recognition With CNNs.  ...  ., +, TMM Nov. 2020 2914-2925 Exploring Discriminative Representations for Image Emotion Recognition With CNNs.  ...  Image watermarking Blind Watermarking for 3-D Printed Objects by Locally Modifying Layer Thickness. 2780-2791 Low-Light Image Enhancement With Semi-Decoupled Decomposition.  ... 
doi:10.1109/tmm.2020.3047236 fatcat:llha6qbaandfvkhrzpe5gek6mq

Deep Discriminative Representation Learning with Attention Map for Scene Classification [article]

Jun Li, Daoyu Lin, Yang Wang, Guangluan Xu, Chibiao Ding
2019 arXiv   pre-print
Learning powerful discriminative features for remote sensing image scene classification is a challenging computer vision problem.  ...  The de facto practice when learning these CNN models is only to use original RGB patches as input with training performed on large amounts of labeled data (ImageNet).  ...  Firstly, they extracted the dense scale-invariant feature transformation features from remote sensing image. And then used spatial pyramid maximum pooling with sparse coding to encode the features.  ... 
arXiv:1902.07967v1 fatcat:cquansbwubbklnskpyb3uejbiu

DCT Inspired Feature Transform for Image Retrieval and Reconstruction

Yunhe Wang, Miaojing Shi, Shan You, Chao Xu
2016 IEEE Transactions on Image Processing  
We test the accuracy and robustness of DIFT on real image matching.  ...  Scale invariant feature transform (SIFT) is effective for representing images in computer vision tasks, as one of the most resistant feature descriptions to common image deformations.  ...  BOW employs local features, i.e., SIFT and DIFT for image representation; VLAD aggregates local features into global representation. We compare DIFT and SIFT on both models. B.  ... 
doi:10.1109/tip.2016.2590323 pmid:27416596 fatcat:xcbmng5arfaq7lsq4wwu4eorem

[Invited Paper] Semantic Indexing for Large-Scale Video Retrieval

Nakamasa Inoue, Koichi Shinoda
2016 ITE Transactions on Media Technology and Applications  
They include extensions of deep learning techniques and image recognition techniques such as bag of visual words to video data.  ...  This paper reviews TRECVID activities with these techniques for semantic indexing.  ...  Recent works such as Regions with CNN (R-CNN) 85) and spatial pyramid pooling for CNN 86) 87) apply selective search to neural-network based frameworks.  ... 
doi:10.3169/mta.4.209 fatcat:pewjuvxonfgxnm5onv5e5xyvlu

ActivityNet Challenge 2017 Summary [article]

Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Khrisna, Victor Escorcia, Kenji Hata, Shyamal Buch
2017 arXiv   pre-print
We would like to thank the authors of the Kinetics dataset for their kind support; and Joao Carreira and Brian Zhang for helpful discussions.  ...  Moreover, by Experiment Results Dense-Captioning Events in Videos System The main goal of dense-captioning events in videos is to jointly localize temporal proposals of interest in videos and then  ...  To obtain a robust representation, a large amount of data and effective learning strategies are required.  ... 
arXiv:1710.08011v1 fatcat:bc5qhp2cungrdj4j3lebxeoane

Going Deeper into Action Recognition: A Survey [article]

Samitha Herath, Mehrtash Harandi, Fatih Porikli
2017 arXiv   pre-print
To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then, navigate into the realm of deep learning based approaches.  ...  for the reader.  ...  Basura Fernando for fruitful discussions and encouragement comments given for this work.  ... 
arXiv:1605.04988v2 fatcat:7727tjctgfffzlnig5rvicxjgq

Deep Learning for Action and Gesture Recognition in Image Sequences: A Survey [chapter]

Maryam Asadi-Aghbolaghi, Albert Clapés, Marco Bellantonio, Hugo Jair Escalante, Víctor Ponce-López, Xavier Baró, Isabelle Guyon, Shohreh Kasaei, Sergio Escalera
2017 Gesture Recognition  
This chapter is a survey of current deep learning based methodologies for action and gesture recognition in sequences of images.  ...  A survey on deep learning based approaches for action and gesture recognition in image sequences.  ...  Wang et al. (2017) use three representations of dynamic depth image (DDI), dynamic depth normal image (DDNI) and dynamic depth motion normal image (DDMNI) as the input data of 2D networks for gesture  ... 
doi:10.1007/978-3-319-57021-1_19 fatcat:d2m5oyomsjhkbfpunhefho6ayq

Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources

Xiao Xiang Zhu, Devis Tuia, Lichao Mou, Gui-Song Xia, Liangpei Zhang, Feng Xu, Friedrich Fraundorfer
2017 IEEE Geoscience and Remote Sensing Magazine  
simple to start with.  ...  In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously  ...  the CNN as a local feature extractor and combines it with feature coding techniques, such as BoVW [78] and vector of locally aggregated descriptors (VLAD), to generate the final image representation  ... 
doi:10.1109/mgrs.2017.2762307 fatcat:ec7b32lpdnhvzbdz2uoayw6anq

End-to-end Learning of Deep Visual Representations for Image Retrieval [article]

Albert Gordo and Jon Almazan and Jerome Revaud and Diane Larlus
2017 arXiv   pre-print
At the end of the training process, the proposed architecture produces a global image representation in a single forward pass that is well suited for image retrieval.  ...  Our representations can also be heavily compressed using product quantization with little loss in accuracy. For additional material, please see  ...  Even for very short image codes of Results for short image codes.  ... 
arXiv:1610.07940v2 fatcat:ecuxhrf6bffjfawbvoo7lgnmny

AI-Empowered Persuasive Video Generation: A Survey [article]

Chang Liu, Han Yu
2021 arXiv   pre-print
music generation and still image animation to enhance viewing experience.  ...  This field is interdisciplinary in nature, which makes it challenging for new researchers to grasp. Currently, there is no comprehensive survey of AIPVG available.  ...  One of the RNNs takes the spatial pyramid pooling on the histogram of dense optical flow (SPP-HOOF) as the input. The other RNN takes the C3D feature of current video clips as the input.  ... 
arXiv:2112.09401v1 fatcat:t5bsqo6shbcoleryphaawewevy

Instance search retrospective with focus on TRECVID

George Awad, Wessel Kraaij, Paul Over, Shin'ichi Satoh
2017 International Journal of Multimedia Information Retrieval  
The Instance Search (INS) benchmark worked with a variety of large collections of data including Sound & Vision, Flickr, BBC (British Broadcasting Corporation) Rushes for the first 3 pilot years and with  ...  the small world of the BBC Eastenders series for the last 3 years.  ...  feature types (local/global) and fusion of CNN, SIFT BOW (Bag Of Words) and text captions.  ... 
doi:10.1007/s13735-017-0121-3 pmid:28758054 pmcid:PMC5531298 fatcat:3khp2cscmbhohipfx246gspqlq

Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
2016 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at  ...  The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context.  ...  To classify regions, their method builds a four-level spatial pyramid and populates it with densely sampled SIFT, Extended OpponentSIFT, and RGB-SIFT descriptors, each vector quantized with 4000-word codebooks  ... 
doi:10.1109/tpami.2015.2437384 pmid:26656583 fatcat:ptlmv4awoner5iu23xmv6bonpa

Deep Learning Methods for Human Behavior Recognition

Jia Lu, Minh Nguyen, Wei Qi Yan
2020 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)  
In this thesis, we explore and exploit the state-of-the-art methods, which are utilized for human behavior recognition.  ...  2) The YOLOv3 + LSTM network, which relies on both spatiotemporal information with class score fusion, is able to achieve 97.58% accuracy based on our dataset for sign language processing.  ...  The Two-Stream network was used to extract feature maps, and the vector of locally aggregated descriptors (VLAD) is applied to get the video representation so as to achieve behavior recognition.  ... 
doi:10.1109/ivcnz51579.2020.9290640 fatcat:sq4fni6z2nfz5okecnsbmzum6e