Attention Transfer from Web Images for Video Recognition
2017
Proceedings of the 2017 ACM on Multimedia Conference - MM '17
In this work, we propose a novel approach to transfer knowledge from image domain to video domain. ...
Training deep learning based video classifiers for action recognition requires a large amount of labeled videos. The labeling process is labor-intensive and time-consuming. ...
In this work, we explore the use of attention for cross-domain knowledge transfer from Web images to videos. ...
doi:10.1145/3123266.3123432
dblp:conf/mm/LiWZK17
fatcat:kb7o55xp5fhgpdmzzxvdoryp7u
Attention Transfer from Web Images for Video Recognition
[article]
2017
arXiv
pre-print
In this work, we propose a novel approach to transfer knowledge from image domain to video domain. ...
Training deep learning based video classifiers for action recognition requires a large amount of labeled videos. The labeling process is labor-intensive and time-consuming. ...
In this work, we explore the use of attention for cross-domain knowledge transfer from Web images to videos. ...
arXiv:1708.00973v1
fatcat:c3qvuqxzr5a6vcz7wub5yljqii
CycDA: Unsupervised Cycle Domain Adaptation from Image to Video
[article]
2022
arXiv
pre-print
Therefore, image-to-video adaptation has been proposed to exploit labeling-free web image source for adapting on unlabeled target videos. ...
This poses two major challenges: (1) spatial domain shift between web images and video frames; (2) modality gap between image and video data. ...
Li et al. [22] use a spatial attention map for cross-domain knowledge transfer from web images to videos. ...
arXiv:2203.16244v2
fatcat:qoebr44epzafbfyi3zuv6us4x4
Expanding Language-Image Pretrained Models for General Video Recognition
[article]
2022
arXiv
pre-print
Contrastive language-image pretraining has shown great success in learning visual-textual joint representation from web-scale data, demonstrating remarkable "zero-shot" generalization ability for various ...
In this work, we present a simple yet effective approach that adapts the pretrained language-image models to video recognition directly, instead of pretraining a new model from scratch. ...
However, the transfer and adaptation to video recognition is not well explored. ...
arXiv:2208.02816v1
fatcat:g56f2d3vb5hr3dgwlzfibg6hjq
Guest editorial: web multimedia semantic inference using multi-cues
2015
World Wide Web (Bussum)
copy detection and video recognition. ...
This special issue has gained overwhelming attention and received 23 submissions from researchers and practitioners working on Web multimedia semantic analysis. ...
doi:10.1007/s11280-015-0360-2
fatcat:vc4plge5qvg7hfmza3dffmawki
Localizing web videos using social images
2015
Information Sciences
In this paper, we address the problem of localizing web videos through transferring large-scale web images with geographic tags to web videos, where near-duplicate detection between images and video frames ...
A group of experiments are carried out on two datasets which collect Flickr images and YouTube videos crawled from the Web. ...
Nowadays, there is an increasing amount of geo-tagged images available on the Web. Such massive image data prompts us to ''transfer'' the geo-tags from web images to web videos. ...
doi:10.1016/j.ins.2014.08.017
fatcat:znad7of3gjaqpnw26kowtltxni
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action Recognition
[article]
2022
arXiv
pre-print
In this work, we investigate such a novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. ...
However, existing attempts typically focus on downstream tasks from the same modality (e.g., image understanding) of the pre-trained model. ...
For example, the CLIP model [57] , trained with 400 million web image-text pairs, achieves promising performances on a variety of image recognition and generation tasks. ...
arXiv:2206.13559v2
fatcat:6vl7zv2ezfhzdmfa4yw6x7i6ra
Visual experience recognition using adaptive support vector machine
2021
Trends in Computer Science and Information Technology
A visual event recognition framework for consumer videos is framed by leveraging a large amount of loosely labeled web videos. The videos are divided into training and testing sets manually. ...
between videos from two domains, the web video domain and the consumer video domain. With the help of MATLAB Simulink, videos are divided and compared with web domain videos. ...
Therefore, the feature distributions of samples from the two domains, the web video domain and the consumer video domain, may ...
doi:10.17352/tcsit.000043
fatcat:jpynwp6tbvclhbvwvi6yhgrgtu
Extreme Low Resolution Activity Recognition with Confident Spatial-Temporal Attention Transfer
[article]
2021
arXiv
pre-print
In this work, we propose a novel Confident Spatial-Temporal Attention Transfer (CSTAT) for eLR activity recognition. ...
CSTAT can acquire information from HR data by reducing the attention differences with a transfer-learning strategy. ...
Benefiting from image sequential information, action recognition is still feasible even with eLR videos. ...
arXiv:1909.03580v4
fatcat:22w3cx46ynftfdrht3x6was3ge
Learning from Weakly-Labeled Web Videos via Exploring Sub-concepts
2022
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
To address this challenge, we introduce a new method for pre-training video action recognition models using queried web videos. ...
However, for video action recognition, the action of interest might only exist in arbitrary clips of untrimmed web videos, resulting in high label noises in the temporal space. ...
For video classification, early works (Gan et al. 2016a ) focus on utilizing web action images to boost action recognition models. ...
doi:10.1609/aaai.v36i2.20022
fatcat:m43wkp7a2fdgde6xbwpuo43vom
CoCa: Contrastive Captioners are Image-Text Foundation Models
[article]
2022
arXiv
pre-print
CoCa is pretrained end-to-end and from scratch on both web-scale alt-text data and annotated images by treating all labels simply as text, seamlessly unifying natural language supervision for representation ...
, and cascades the remaining decoder layers which cross-attend to the image encoder for multimodal image-text representations. ...
helpful discussions, Andrew Dai for help with contrastive models, Christopher Fifty and Bowen Zhang for help with video models, Yuanzhong Xu for help with model scaling, Lucas Beyer for help with data ...
arXiv:2205.01917v2
fatcat:o3mtqyvxe5gy5gyb5t7kmrnzsa
Transferring Cross-Domain Knowledge for Video Sign Language Recognition
2020
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Since these videos have no word-level annotation and exhibit a large domain gap from isolated signs, they cannot be directly used for training WSLR models. ...
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. It requires models to recognize isolated sign words from videos. ...
HL's research is funded in part by the ARC Centre of Excellence for Robotics Vision (CE140100016), ARC-Discovery (DP 190102261) and ARC-LIEF (190100080) grants, as well as a research grant from Baidu on ...
doi:10.1109/cvpr42600.2020.00624
dblp:conf/cvpr/LiYXPL20
fatcat:sndlr6kwwbayzkn6yzmfj2ypsu
Weakly Supervised Semantic Segmentation Using Web-Crawled Videos
2017
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Our goal is to overcome this limitation with no additional human intervention by retrieving videos relevant to target class labels from web repository, and generating segmentation labels from the retrieved ...
We propose a novel algorithm for weakly supervised semantic segmentation based on image-level class labels only. ...
For reliable object segmentation in video, our framework first learns an encoder from weakly annotated images to predict attention map, and incorporates the attention with motion cues in videos to capture ...
doi:10.1109/cvpr.2017.239
dblp:conf/cvpr/HongYKLH17
fatcat:723ug43aknhw7gpm36spes4ose
Cloud-Based Facial Expression Recognition System for Customer Satisfaction in Distribution Sectors
2020
Innovative Computing Information and Control Express Letters, Part B: Applications
In conclusion, this study showed that we could successfully apply the method of facial expression recognition for evaluating customer's satisfaction, and the proposed system architecture worked well in ...
external data for showing how the proposed system worked in real-world situations. ...
This research was supported by the Ministry of Trade, Industry and Energy (MOTIE), Korea, through the Education Program for Creative and Industrial Convergence (Grant Number N0000717). ...
doi:10.24507/icicelb.11.02.173
fatcat:lj6aibkfc5dnjh2ios3tb23xyu
A Natural and Immersive Virtual Interface for the Surgical Safety Checklist Training
2014
Proceedings of the 2014 ACM International Workshop on Serious Games - SeriousGames '14
By leveraging big data from billions of search queries, billions of images on the web and from the social networks, and billions of user clicks, we have designed massive machine learning systems to continuously ...
Since the launch of Bing (www.bing.com) in June 2009, we have seen Bing web search market share in the US more than doubled and Bing image search query share quadrupled. ...
Understanding User Journeys in Online TV
Inductive Transfer Deep Hashing for Image Retrieval
What Can We Learn about Motion Videos from Still Images? ...
doi:10.1145/2656719.2656725
dblp:conf/mm/FerracaniPB14a
fatcat:obsb2i4iybhu3dq77hujvjtbze