34,025 Hits in 4.3 sec

Attention Transfer from Web Images for Video Recognition

Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
In this work, we propose a novel approach to transfer knowledge from image domain to video domain.  ...  Training deep learning based video classifiers for action recognition requires a large amount of labeled videos. The labeling process is labor-intensive and time-consuming.  ...  In this work, we explore the use of attention for cross-domain knowledge transfer from Web images to videos.  ... 
doi:10.1145/3123266.3123432 dblp:conf/mm/LiWZK17 fatcat:kb7o55xp5fhgpdmzzxvdoryp7u

Attention Transfer from Web Images for Video Recognition [article]

Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
2017 arXiv   pre-print
In this work, we propose a novel approach to transfer knowledge from image domain to video domain.  ...  Training deep learning based video classifiers for action recognition requires a large amount of labeled videos. The labeling process is labor-intensive and time-consuming.  ...  In this work, we explore the use of attention for cross-domain knowledge transfer from Web images to videos.  ... 
arXiv:1708.00973v1 fatcat:c3qvuqxzr5a6vcz7wub5yljqii
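
The two records above are the conference and arXiv versions of the same paper, which transfers spatial attention from a model trained on web images to a video recognition model. As a rough illustration of the generic attention-transfer idea (not necessarily this paper's exact formulation), one can penalize the distance between normalized spatial attention maps of a web-image teacher and a video student; all tensor shapes and names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def spatial_attention(feat: torch.Tensor) -> torch.Tensor:
    """Collapse a feature map (N, C, H, W) into an L2-normalized spatial
    attention map (N, H*W) by summing squared activations over channels."""
    att = feat.pow(2).sum(dim=1).flatten(1)
    return F.normalize(att, p=2, dim=1)

def attention_transfer_loss(student_feat: torch.Tensor,
                            teacher_feat: torch.Tensor) -> torch.Tensor:
    """Mean squared distance between student and teacher attention maps.
    Works across different channel counts, as long as spatial sizes match."""
    diff = spatial_attention(student_feat) - spatial_attention(teacher_feat)
    return diff.pow(2).sum(dim=1).mean()

# Illustrative usage: features of sampled video frames (student) vs. features
# from an image model trained on labeled web images (teacher).
student = torch.randn(8, 256, 14, 14)
teacher = torch.randn(8, 512, 14, 14)
loss = attention_transfer_loss(student, teacher)
```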

CycDA: Unsupervised Cycle Domain Adaptation from Image to Video [article]

Wei Lin, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne, Horst Bischof
2022 arXiv   pre-print
Therefore, image-to-video adaptation has been proposed to exploit labeling-free web image source for adapting on unlabeled target videos.  ...  This poses two major challenges: (1) spatial domain shift between web images and video frames; (2) modality gap between image and video data.  ...  Li et al. [22] use a spatial attention map for cross-domain knowledge transfer from web images to videos.  ... 
arXiv:2203.16244v2 fatcat:qoebr44epzafbfyi3zuv6us4x4

Expanding Language-Image Pretrained Models for General Video Recognition [article]

Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling
2022 arXiv   pre-print
Contrastive language-image pretraining has shown great success in learning visual-textual joint representation from web-scale data, demonstrating remarkable "zero-shot" generalization ability for various  ...  In this work, we present a simple yet effective approach that adapts the pretrained language-image models to video recognition directly, instead of pretraining a new model from scratch.  ...  However, the transfer and adaptation to video recognition is not well explored.  ... 
arXiv:2208.02816v1 fatcat:g56f2d3vb5hr3dgwlzfibg6hjq
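
The adaptation described above builds on CLIP-style language-image pretraining. A minimal zero-shot baseline for video recognition with such a model (not the paper's full method, which adds cross-frame attention and video-specific prompting) is to average per-frame CLIP embeddings and match them against text embeddings of class names; the class names, prompt template, and checkpoint below are illustrative assumptions.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["archery", "juggling", "playing guitar"]  # hypothetical labels
text_tokens = clip.tokenize([f"a video of {c}" for c in class_names]).to(device)

# frames: (T, 3, 224, 224) tensor of preprocessed frames from one clip;
# random values stand in for real, preprocessed video frames here.
frames = torch.randn(8, 3, 224, 224).to(device)

with torch.no_grad():
    frame_feats = model.encode_image(frames)               # (T, D) per-frame features
    video_feat = frame_feats.mean(dim=0, keepdim=True)     # temporal average pooling
    video_feat = video_feat / video_feat.norm(dim=-1, keepdim=True)
    text_feats = model.encode_text(text_tokens)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    probs = (100.0 * video_feat @ text_feats.T).softmax(dim=-1)

print(dict(zip(class_names, probs.squeeze(0).tolist())))
```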

Guest editorial: web multimedia semantic inference using multi-cues

Yahong Han, Yi Yang, Xiaofang Zhou
2015 World Wide Web (Bussum)
...copy detection and video recognition.  ...  This special issue has gained overwhelming attention and received 23 submissions from researchers and practitioners working on Web multimedia semantic analysis.  ... 
doi:10.1007/s11280-015-0360-2 fatcat:vc4plge5qvg7hfmza3dffmawki

Localizing web videos using social images

Liujuan Cao, Xian-Ming Liu, Wei Liu, Rongrong Ji, Thomas Huang
2015 Information Sciences  
In this paper, we address the problem of localizing web videos through transferring large-scale web images with geographic tags to web videos, where near-duplicate detection between images and video frames  ...  A group of experiments are carried out on two datasets which collect Flickr images and YouTube videos crawled from the Web.  ...  Nowadays, there is an increasing amount of geo-tagged images available on the Web. Such massive image data prompts us to ''transfer'' the geo-tags from web images to web videos.  ... 
doi:10.1016/j.ins.2014.08.017 fatcat:znad7of3gjaqpnw26kowtltxni
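
The core transfer step hinted at above is matching video frames against geo-tagged web images. A simplified sketch of that idea (nearest-neighbor matching on visual features, with a similarity threshold standing in for the paper's near-duplicate detection; the features, threshold, and tags are all assumptions) could look like this:

```python
import numpy as np

def transfer_geotags(frame_feats, image_feats, image_geotags, sim_threshold=0.8):
    """For each video frame, find the most similar geo-tagged web image
    (cosine similarity on some visual feature) and, if it is similar enough
    to count as a near-duplicate, borrow its (lat, lon) tag. Returns one
    geo-tag per frame, or None when no match is confident."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    g = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = f @ g.T                                   # (num_frames, num_images)
    best = sims.argmax(axis=1)
    return [image_geotags[j] if sims[i, j] >= sim_threshold else None
            for i, j in enumerate(best)]

# Illustrative data: 5 frames, 100 geo-tagged Flickr images, 128-d features.
frames = np.random.rand(5, 128)
images = np.random.rand(100, 128)
geotags = [(40.0 + 0.01 * k, -74.0) for k in range(100)]   # hypothetical (lat, lon)
print(transfer_geotags(frames, images, geotags))
```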

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action Recognition [article]

Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li
2022 arXiv   pre-print
In this work, we investigate such a novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning.  ...  However, existing attempts typically focus on downstream tasks from the same modality (e.g., image understanding) of the pre-trained model.  ...  For example, the CLIP model [57], trained with 400 million web image-text pairs, achieves promising performances on a variety of image recognition and generation tasks.  ... 
arXiv:2206.13559v2 fatcat:6vl7zv2ezfhzdmfa4yw6x7i6ra
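
The snippet above describes parameter-efficient image-to-video transfer via small trainable adapters inside a frozen image model. A rough sketch of such a bottleneck adapter with a depthwise temporal convolution follows; the module layout, dimensions, and token ordering are assumptions for illustration rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class SpatioTemporalAdapter(nn.Module):
    """Rough sketch of a parameter-efficient adapter: project tokens down,
    mix information across frames with a depthwise 3D convolution, project
    back up, and add a residual connection. Only these few parameters would
    be trained; the image backbone stays frozen."""
    def __init__(self, dim: int, bottleneck: int = 64, kernel_t: int = 3):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.dwconv = nn.Conv3d(bottleneck, bottleneck,
                                kernel_size=(kernel_t, 1, 1),
                                padding=(kernel_t // 2, 0, 0),
                                groups=bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, t: int, h: int, w: int) -> torch.Tensor:
        # x: (B*T, H*W, dim) patch tokens from a frozen image transformer block,
        # assumed to be ordered batch-major (all frames of video 0, then video 1, ...).
        bt, n, d = x.shape
        b = bt // t
        z = self.down(x)                                   # (B*T, H*W, bottleneck)
        z = z.view(b, t, h, w, -1).permute(0, 4, 1, 2, 3)  # (B, C, T, H, W)
        z = self.act(self.dwconv(z))                       # temporal mixing per channel
        z = z.permute(0, 2, 3, 4, 1).reshape(bt, n, -1)    # back to token layout
        return x + self.up(z)                              # residual connection

# Illustrative shapes: 2 videos, 8 frames, 14x14 patch tokens, width 768.
tokens = torch.randn(2 * 8, 14 * 14, 768)
adapter = SpatioTemporalAdapter(dim=768)
out = adapter(tokens, t=8, h=14, w=14)   # same shape as the input tokens
```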

Visual experience recognition using adaptive support vector machine

Santhoshkumar SP, Kumar M Praveen, Beaulah H Lilly
2021 Trends in Computer Science and Information Technology  
A visual event recognition framework for consumer videos is framed by leveraging a large amount of loosely labeled web videos. The videos are divided into training and testing sets manually.  ...  between videos from two domains, the web video domain and the consumer video domain. With the help of MATLAB Simulink, videos are divided and compared with web-domain videos.  ...  Therefore, the feature distributions of samples from the two domains (web video domain and consumer video domain) may  ... 
doi:10.17352/tcsit.000043 fatcat:jpynwp6tbvclhbvwvi6yhgrgtu

Extreme Low Resolution Activity Recognition with Confident Spatial-Temporal Attention Transfer [article]

Yucai Bai, Qin Zou, Xieyuanli Chen, Lingxi Li, Zhengming Ding, Long Chen
2021 arXiv   pre-print
In this work, we propose a novel Confident Spatial-Temporal Attention Transfer (CSTAT) for eLR activity recognition.  ...  CSTAT can acquire information from HR data by reducing the attention differences with a transfer-learning strategy.  ...  Benefit from image sequential information, action recognition is still feasible even with eLR videos.  ... 
arXiv:1909.03580v4 fatcat:22w3cx46ynftfdrht3x6was3ge

Learning from Weakly-Labeled Web Videos via Exploring Sub-concepts

Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Zhichao Lu, Yun Fu, Tomas Pfister
2022 Proceedings of the AAAI Conference on Artificial Intelligence
To address this challenge, we introduce a new method for pre-training video action recognition models using queried web videos.  ...  However, for video action recognition, the action of interest might only exist in arbitrary clips of untrimmed web videos, resulting in high label noises in the temporal space.  ...  For video classification, early works (Gan et al. 2016a) focus on utilizing web action images to boost action recognition models.  ... 
doi:10.1609/aaai.v36i2.20022 fatcat:m43wkp7a2fdgde6xbwpuo43vom

CoCa: Contrastive Captioners are Image-Text Foundation Models [article]

Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu
2022 arXiv   pre-print
CoCa is pretrained end-to-end and from scratch on both web-scale alt-text data and annotated images by treating all labels simply as text, seamlessly unifying natural language supervision for representation  ...  and cascades the remaining decoder layers which cross-attend to the image encoder for multimodal image-text representations.  ...  helpful discussions, Andrew Dai for help with contrastive models, Christopher Fifty and Bowen Zhang for help with video models, Yuanzhong Xu for help with model scaling, Lucas Beyer for help with data  ... 
arXiv:2205.01917v2 fatcat:o3mtqyvxe5gy5gyb5t7kmrnzsa
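
CoCa's training objective combines a contrastive image-text loss with a captioning loss. A minimal sketch of that combined objective is shown below; the loss weighting and tensor shapes are illustrative assumptions, and the encoder/decoder that would produce these tensors is omitted.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over matched image/text pairs within a batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature           # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

def captioning_loss(token_logits, token_targets, pad_id=0):
    """Standard next-token cross-entropy for the caption decoder."""
    return F.cross_entropy(token_logits.flatten(0, 1),
                           token_targets.flatten(),
                           ignore_index=pad_id)

# Illustrative tensors: batch of 4, embedding dim 512, caption length 16, vocab 1000.
img_emb = torch.randn(4, 512)
txt_emb = torch.randn(4, 512)
logits = torch.randn(4, 16, 1000)
targets = torch.randint(1, 1000, (4, 16))

# The 2.0 weight on the captioning term is an assumed hyperparameter.
loss = contrastive_loss(img_emb, txt_emb) + 2.0 * captioning_loss(logits, targets)
```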

Transferring Cross-Domain Knowledge for Video Sign Language Recognition

Dongxu Li, Xin Yu, Chenchen Xu, Lars Petersson, Hongdong Li
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Since these videos have no word-level annotation and exhibit a large domain gap from isolated signs, they cannot be directly used for training WSLR models.  ...  Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. It requires models to recognize isolated sign words from videos.  ...  HL's research is funded in part by the ARC Centre of Excellence for Robotics Vision (CE140100016), ARC-Discovery (DP 190102261) and ARC-LIEF (190100080) grants, as well as a research grant from Baidu on  ... 
doi:10.1109/cvpr42600.2020.00624 dblp:conf/cvpr/LiYXPL20 fatcat:sndlr6kwwbayzkn6yzmfj2ypsu

Weakly Supervised Semantic Segmentation Using Web-Crawled Videos

Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, Bohyung Han
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
Our goal is to overcome this limitation with no additional human intervention by retrieving videos relevant to target class labels from web repository, and generating segmentation labels from the retrieved  ...  We propose a novel algorithm for weakly supervised semantic segmentation based on image-level class labels only.  ...  For reliable object segmentation in video, our framework first learns an encoder from weakly annotated images to predict attention map, and incorporates the attention with motion cues in videos to capture  ... 
doi:10.1109/cvpr.2017.239 dblp:conf/cvpr/HongYKLH17 fatcat:723ug43aknhw7gpm36spes4ose
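
The snippet above mentions fusing an image-derived attention map with motion cues from video to localize objects. A toy sketch of one such fusion (a weighted combination of the attention map and normalized optical-flow magnitude, thresholded into a mask) is given below; the fusion rule and weights are assumptions, not the paper's algorithm.

```python
import numpy as np

def combine_attention_and_motion(attention, flow, alpha=0.5, threshold=0.5):
    """Fuse a class attention map (H, W), values in [0, 1], with a motion cue
    derived from a dense optical-flow field (H, W, 2) to get a binary
    foreground mask for a video frame."""
    motion = np.linalg.norm(flow, axis=-1)
    motion = motion / (motion.max() + 1e-8)          # normalize magnitude to [0, 1]
    fused = alpha * attention + (1 - alpha) * motion
    return fused >= threshold

# Illustrative inputs: a 64x64 attention map and a dense flow field.
attention = np.random.rand(64, 64)
flow = np.random.randn(64, 64, 2)
mask = combine_attention_and_motion(attention, flow)
print(mask.mean())   # fraction of pixels marked as foreground
```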

Cloud-Based Facial Expression Recognition System for Customer Satisfaction in Distribution Sectors

Jiyoon Lee, Wonil Hwang
2020 Innovative Computing Information and Control Express Letters, Part B: Applications  
In conclusion, this study showed that we could successfully apply the method of facial expression recognition for evaluating customer's satisfaction, and the proposed system architecture worked well in  ...  external data for showing how the proposed system worked in real-world situations.  ...  This research was supported by the Ministry of Trade, Industry and Energy (MOTIE), Korea, through the Education Program for Creative and Industrial Convergence (Grant Number N0000717).  ... 
doi:10.24507/icicelb.11.02.173 fatcat:lj6aibkfc5dnjh2ios3tb23xyu

A Natural and Immersive Virtual Interface for the Surgical Safety Checklist Training

Andrea Ferracani, Daniele Pezzatini, Alberto Del Bimbo
2014 Proceedings of the 2014 ACM International Workshop on Serious Games - SeriousGames '14  
By leveraging big data from billions of search queries, billions of images on the web and from the social networks, and billions of user clicks, we have designed massive machine learning systems to continuously  ...  Since the launch of Bing (www.bing.com) in June 2009, we have seen Bing web search market share in the US more than doubled and Bing image search query share quadrupled.  ...  Understanding User Journeys in Online TV; Inductive Transfer Deep Hashing for Image Retrieval; What Can We Learn about Motion Videos from Still Images?  ... 
doi:10.1145/2656719.2656725 dblp:conf/mm/FerracaniPB14a fatcat:obsb2i4iybhu3dq77hujvjtbze
Showing results 1 — 15 out of 34,025 results