126,890 Hits in 7.2 sec

YouTubeCat: Learning to categorize wild web videos

Zheshen Wang, Ming Zhao, Yang Song, Sanjiv Kumar, Baoxin Li
2010 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  
We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages.  ...  Then, using the hierarchical taxonomy of the categories, a Conditional Random Field (CRF) based fusion strategy is designed.  ...  Results and analysis The objective of the proposed approach is to improve video classification performance by making use of data from multiple sources of varied quality.  ... 
doi:10.1109/cvpr.2010.5540125 dblp:conf/cvpr/WangZSKL10 fatcat:e6ieenc53nhcliadzps6z4p3eu

Improving video classification via youtube video co-watch data

John R. Zhang, Yang Song, Thomas Leung
2011 Proceedings of the 2011 ACM workshop on Social and behavioural networked media access - SBNMA '11  
In this paper, we explore an approach which exploits YouTube video co-watch data to improve the performance of a video taxonomic classification system.  ...  Evaluation is performed by comparing against classifiers trained using manually labeled web documents and videos.  ...  CONCLUSION We have presented a method to use YouTube video cowatch data to generate training data and to improve the performance of a video taxonomic classification system.  ... 
doi:10.1145/2072627.2072635 fatcat:352xh46awvgcdafqpfwyhqrngi

Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text [article]

Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng (+4 others)
2017 arXiv   pre-print
The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes.  ...  In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features.  ...  The presence of an additional mode of data (text) can greatly improve classification performance by providing semantic information regarding the video.  ... 
arXiv:1706.05461v3 fatcat:utlwwc72qneb5hihwxytvig6fy

Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [article]

Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata
2021 arXiv   pre-print
To circumvent the need of labeled examples, we present two novel approaches that yield further improvement.  ...  Second, we learn generative adversarial networks that generate video features of novel classes from their semantic embeddings.  ...  Although none of these videos are annotated with a class label, half of them (400k) have at least one user tag. We use the tag-labeled videos of YFCC100M to improve the few-shot video classification.  ... 
arXiv:2007.04755v2 fatcat:gt4lhkxwefa2bfppqdpdppysle

Self-Learning for Player Localization in Sports Video [article]

Kenji Okuma, David G. Lowe, James J. Little
2013 arXiv   pre-print
In our experiments, our approach exploits both labelled and unlabelled data in sparsely labelled videos of sports games, providing a mean performance improvement of over 20% in the average precision for  ...  Unlike most previous self-learning approaches for improving appearance-based object detectors from videos, we allow an unknown, unconstrained number of target objects in a more generalized video sequence  ...  These tracklets are used as a pool of candidate data C from which we collect a set of training labels for improving performance of classification models.  ... 
arXiv:1307.7198v1 fatcat:yyryezpb6relveca4xdfjkfhqq

Semi-supervised Learning of Facial Attributes in Video [chapter]

Neva Cherniavsky, Ivan Laptev, Josef Sivic, Andrew Zisserman
2012 Lecture Notes in Computer Science  
First, we show that training on video data improves classification performance over training on images alone.  ...  Given a small set of images labeled with attributes and a much larger unlabeled set of video tracks, we train a classifier to recognize these attributes in video data. We make two contributions.  ...  We are grateful for financial support from the Royal Academy of Engineering, Microsoft, ERC grant VisRec no. 228180, the MSR-INRIA laboratory and the Quaero Programme, funded by OSEO.  ... 
doi:10.1007/978-3-642-35749-7_4 fatcat:cf23kzsyirgtpot33nnwsaq5y4

Taxonomic classification for web-based videos

Yang Song, Ming Zhao, Jay Yagnik, Xiaoyun Wu
2010 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  
Evaluation on videos from hundreds of categories shows that the proposed algorithms generate significant performance improvement over text classifiers or classifiers trained using only video content based  ...  Categorizing web-based videos is an important yet challenging task. The difficulties arise from large data diversity within a category, lack of labeled data, and degradation of video quality.  ...  Integration of content-based features We want to utilize video content-based features to improve classification performance.  ... 
doi:10.1109/cvpr.2010.5540124 dblp:conf/cvpr/SongZYW10 fatcat:5e7v3b2yj5c3dpltltvejovnmi

LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning [article]

Ernest Cheung, Tsan Kwong Wong, Aniket Bera, Xiaogang Wang, Dinesh Manocha
2016 arXiv   pre-print
We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV).  ...  We demonstrate the benefits of LCrowdV over prior labeled crowd datasets by improving the accuracy of pedestrian detection and crowd behavior classification algorithms.  ...  In this case, the use of LCrowdV labeled data can significantly improve detectors' accuracy over prior datasets shown in Table 1.  ... 
arXiv:1606.08998v2 fatcat:2xcyvku7y5asti3n4sjl7bk42i

Learning Representational Invariances for Data-Efficient Action Recognition [article]

Yuliang Zou, Jinwoo Choi, Qitong Wang, Jia-Bin Huang
2022 arXiv   pre-print
Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce.  ...  We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.  ...  labeled video data is used during training (labeled branch only in Figure 2).  ... 
arXiv:2103.16565v2 fatcat:2kz2f6yc3jb43gpdfg6ao7jo6a

A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [article]

Zhenxi Zhu, Limin Wang, Sheng Guo, Gangshan Wu
2021 arXiv   pre-print
Our results show that the performance of training from scratch drops significantly, which implies that the existing benchmarks cannot provide enough base data.  ...  Finally, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training.  ...  This work is supported by the National Natural Science Foundation of China (No. 62076119), Program for Innovative Talents and Entrepreneur in Jiangsu Province, and Collaborative Innovation Center of Novel  ... 
arXiv:2110.12358v1 fatcat:7fwo6wbv6bbrjjwzz4wf2ssb7y

YouTubeEvent: On large-scale video event classification

Bingbing Ni, Yang Song, Ming Zhao
2011 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops)  
To improve classification performance, video content-based features are complemented with scores from a set of classifiers, which can be regarded as a type of high-level features.  ...  In this work, we investigate the problem of general event classification from uncontrolled YouTube videos.  ...  One major difficulty in general video event classification is the lack of labeled training data.  ... 
doi:10.1109/iccvw.2011.6130430 dblp:conf/iccvw/NiSZ11 fatcat:763dv7dbjffbdg5iydhofnyd6e

Boosting web video categorization with contextual information from social web

Xiao Wu, Chong-Wah Ngo, Yi-Ming Zhu, Qiang Peng
2011 World wide web (Bussum)  
The other improvement is derived from the integration of model-based classification and data-driven majority voting from related videos and user videos.  ...  Query expansion is adopted to reinforce the classification performance of text features through related videos and user videos.  ...  They can complement Generally, fusion of three components improves the performance. However, this is not always true.  ... 
doi:10.1007/s11280-011-0129-1 fatcat:b754uyfdkrezvmjtnjo2uextra

AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings [article]

Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri
2020 arXiv   pre-print
The cross-modal decoder enforces a constraint that the class label text features can be reconstructed from the audio and video embeddings of data points.  ...  This helps the audio and video embeddings to move closer to the class label text embedding. The composite triplet loss makes use of the audio, video, and text embeddings.  ...  The authors in [21] propose CJME and experimentally show how adding audio information to video data can improve the model performance for zero-shot classification and retrieval.  ... 
arXiv:2005.13402v3 fatcat:me3aoglrbrdndpnofz2o4syrgu

A survey on video classification using action recognition

Caleb Andrew, Rex Fiona
2018 International Journal of Engineering & Technology  
The video classification problem can be framed as a probabilistic data classification problem, which falls under machine learning.  ...  Classification helps in indexing, analyzing, searching etc. A survey has been made on the present technologies that are used for video classification.  ...  Both forces improve the performance of classification.  ... 
doi:10.14419/ijet.v7i2.31.13404 fatcat:p6ursb46fjd33dwpcwpjr524eu

Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification [article]

Po-Yao Huang, Ye Yuan, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann
2017 arXiv   pre-print
In this multi-label video classification task, our pipeline achieved 84.675% and 84.662% GAP on our evaluation split and the official test set.  ...  We report on CMU Informedia Lab's system used in Google's YouTube 8 Million Video Understanding Challenge.  ...  We also show that incorporating these latent concepts would improve the multi-label classification performance.  ... 
arXiv:1707.01408v3 fatcat:xyc4rtvrz5e7pjg5luxuux7ddi