Harnessing Object and Scene Semantics for Large-Scale Video Understanding

Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, Leonid Sigal
2016 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We illustrate effectiveness of this semantic representation through experiments on zero-shot action/video classification and clustering.  ...  These video class-object/video class-scene relationships can in turn be used as semantic representation for the video classes themselves.  ...  Finally, we show how discovered OSR can be utilized for zero-shot video classification (Sec. 3.3).  ... 
doi:10.1109/cvpr.2016.339 dblp:conf/cvpr/WuFJS16 fatcat:6zpkaj64c5hx7g6hsttamlj5yq

A fusion scheme of visual and auditory modalities for event detection in sports video

Min Xu, Ling-Yu Duan, Chang-Sheng Xu, Qi Tian
2003 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)  
Since we have developed a unified framework for semantic shot classification in sports videos and a set of audio mid-level representation with supervised learning methods, the proposed fusion scheme can  ...  The proposed scheme is built upon semantic shot classification, where we classify video shots into several major or interesting classes, each of which has clear semantic meanings.  ...  According to the domain model, we try to learn the rules for shot classes identification.  ... 
doi:10.1109/icme.2003.1220922 dblp:conf/icmcs/XuDXT03 fatcat:6gxfz64ayzb7nfg65445zjtwbm

Recent Advances in Zero-shot Recognition [article]

Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal, and Shaogang Gong
2017 arXiv   pre-print
One approach to scaling up the recognition is to develop models capable of recognizing unseen categories without any training instances, or zero-shot recognition/learning.  ...  However, to scale the recognition to a large number of classes with few or no training samples for each class remains an unsolved problem.  ...  Yanwei Fu is supported by The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.  ... 
arXiv:1710.04837v1 fatcat:u3mp6dgj2rgqrarjm4dcywegmy

Generalized Zero-Shot Learning for Action Recognition with Web-Scale Video Data [article]

Kun Liu, Wu Liu, Huadong Ma, Wenbing Huang, Xiongxiong Dong
2017 arXiv   pre-print
Zero-shot learning has the potential to solve these issues since it can perform classification without positive examples.  ...  Then, we propose a method for action recognition by deploying generalized zero-shot learning, which transfers the knowledge of web video to detect the anomalous actions in surveillance videos.  ...  Hence it is hard to model the relationship between seen and unseen ones.  ... 
arXiv:1710.07455v1 fatcat:datwl63c5jd2hiylkz7636lra4

Zero-Shot Activity Recognition with Videos [article]

Evin Pinar Ornek
2020 arXiv   pre-print
In this paper, we examined the zero-shot activity recognition task with the usage of videos.  ...  The zero-shot recognition results are evaluated by top-n accuracy. Then, the manifold learning ability is measured by mean Nearest Neighbor Overlap.  ...  The existing zero-shot object classification problem is shown to have higher accuracy with compatibility learning models that learn the mapping between the distributions rather than the attribute classifiers  ... 
arXiv:2002.02265v1 fatcat:umbgctxyzzbvhcfrkg7dgfnciq

All About Knowledge Graphs for Actions [article]

Pallabi Ghosh, Nirat Saini, Larry S. Davis, Abhinav Shrivastava
2020 arXiv   pre-print
Recent works have explored the paradigm of zero-shot and few-shot learning to learn classifiers for unseen categories or categories with few labels.  ...  Finally, to enable a systematic study of zero-shot and few-shot approaches, we propose an improved evaluation paradigm based on UCF101, HMDB51, and Charades datasets for knowledge transfer from models  ...  Both zero-shot and few-shot learning methods have been studied widely for image classification.  ... 
arXiv:2008.12432v1 fatcat:3mnx3orvhreg3d43a2pmr7b5wq

I Know the Relationships: Zero-Shot Action Recognition via Two-Stream Graph Convolutional Networks and Knowledge Graphs

Junyu Gao, Tianzhu Zhang, Changsheng Xu
Recently, with the ever-growing action categories, zero-shot action recognition (ZSAR) has been achieved by automatically mining the underlying concepts (e.g., actions, attributes) in videos.  ...  In addition, a self-attention module is utilized to model the temporal information of videos.  ...  The proposed two-stream framework is suitable for zero-shot learning problem with attributes. We utilize GCNs to transfer information between different concepts.  ... 
doi:10.1609/aaai.v33i01.33018303 fatcat:sg5ock7ipzcqvgmnnxtup7qxlq

Tell me what you see: A zero-shot action recognition method based on natural language descriptions [article]

Valter Estevam and Rayson Laroca and David Menotti and Helio Pedrini
2021 arXiv   pre-print
Recently, several approaches have explored the detection and classification of objects in videos to perform Zero-Shot Action Recognition with remarkable results.  ...  Inspired by these methods and by video captioning's ability to describe events not only with a set of objects but with contextual information, we propose a method in which video captioning models, called  ...  Rethinking zero-shot video classification: End-to-end training for realistic applications, in: IEEE Conference on Com- 5.  ... 
arXiv:2112.09976v1 fatcat:5bvci2dyjnbyjbvo7qagnuzbpy

Disentangled Action Recognition with Knowledge Bases [article]

Zhekun Luo, Shalini Ghosh, Devin Guillory, Keizo Kato, Trevor Darrell, Huijuan Xu
2022 arXiv   pre-print
DARK trains a factorized model by first extracting disentangled feature representations for verbs and nouns, and then predicting classification weights using relations in external knowledge graphs.  ...  Action in video usually involves the interaction of human with objects.  ...  Ablations on different components Zero-shot learning in verb/noun classifier: In DARK model, we do separate verb and noun classification in two branches.  ... 
arXiv:2207.01708v1 fatcat:qywim6ipxvasvlwqqu7atfugny

GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition [article]

Bin Sun, Dehui Kong, Shaofan Wang, Jinghua Li, Baocai Yin, Xiaonan Luo
2021 arXiv   pre-print
on attention mechanism, which dynamically updates the relationship between action classes and objects, and enhances the generalization ability of zero-shot learning.  ...  Furthermore, the learned classifier inclines to predict the samples of seen classes, which leads to poor classification performance.  ...  The aim of zero-shot learning (ZSL) is to explore common latent semantic representation, and produce a trained model that can generalize to unseen classes.  ... 
arXiv:2105.11789v1 fatcat:brik2cncqbd3pm2mrhozkf5fla

Skeleton based Zero Shot Action Recognition in Joint Pose-Language Semantic Space [article]

Bhavan Jasani, Afshaan Mazagonwalla
2019 arXiv   pre-print
Such questions are addressed by the Zero Shot Learning paradigm, where a model is trained on only a subset of classes and is evaluated on its ability to correctly classify an example from a class it has  ...  Our model learns to jointly encapsulate visual similarities based on pose features of the action performer as well as similarities in the natural language descriptions of the unseen action class names.  ...  Zero shot learning - Relation Networks Learning to Compare: Relation Network for Few-Shot Learning [15] overcomes some of the limitations of DeViSE [5].  ... 
arXiv:1911.11344v1 fatcat:5adeuam35vd5fclts2rzis3wby

Objects2action: Classifying and localizing actions without any video example [article]

Mihir Jain, Jan C. van Gemert, Thomas Mensink, Cees G. M. Snoek
2015 arXiv   pre-print
Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen  ...  And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach.  ...  be beneficial for zero-shot classification.  ... 
arXiv:1510.06939v1 fatcat:gk5yn3ywpfdoxdxmbv5rjsghmi

Objects2action: Classifying and Localizing Actions without Any Video Example

Mihir Jain, Jan C. van Gemert, Thomas Mensink, Cees G. M. Snoek
2015 2015 IEEE International Conference on Computer Vision (ICCV)  
Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen  ...  And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach.  ...  be beneficial for zero-shot classification.  ... 
doi:10.1109/iccv.2015.521 dblp:conf/iccv/JainGMS15 fatcat:gi2c26ooijcnfnsn2ki6dspcxi

Joint Learning of Object and Action Detectors

Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid
2017 2017 IEEE International Conference on Computer Vision (ICCV)  
Moreover, the proposed architecture can be used for zero-shot learning of actions: our multitask objective leverages the commonalities of an action performed by different objects, e.g. dog and cat jumping  ...  We introduce an end-to-end multitask objective that jointly learns object-action relationships.  ...  We gratefully acknowledge the support of NVIDIA with the donation of GPUs used for this research.  ... 
doi:10.1109/iccv.2017.219 dblp:conf/iccv/KalogeitonWFS17 fatcat:zhjxvt6unnfedipyg4b75yjfla

Prompting Visual-Language Models for Efficient Video Understanding [article]

Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie
2022 arXiv   pre-print
Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation  ...  This paper presents a simple but strong baseline to efficiently adapt the pre-trained I-VL model, and exploit its powerful ability for resource-hungry video understanding tasks, with minimal training.  ...  As a result, these pre-trained I-VL models have demonstrated remarkable "zero-shot" generalisation for various image classification tasks.  ... 
arXiv:2112.04478v2 fatcat:3dzenc4jkbct5jlrfg255qutty