3,128 Hits in 9.3 sec

Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization [article]

Congqi Cao, Yajuan Li, Qinyi Lv, Peng Wang, Yanning Zhang
2020 arXiv   pre-print
similarity comparison; 3) an advanced loss for few-shot learning to optimize pair similarity with limited data.  ...  To solve these problems, this paper presents 1) a specific setting to evaluate the performance of few-shot action recognition algorithms; 2) an implicit sequence-alignment algorithm for better video-level  ...  Thus, in this paper, we propose to align video sequences implicitly and optimize video-level pair similarity directly for few-shot action recognition.  ... 
arXiv:2010.06215v1 fatcat:hhchjwegsjfkxnpv4xoykxy5vu

Learning Implicit Temporal Alignment for Few-shot Video Classification [article]

Songyang Zhang, Jiale Zhou, Xuming He
2021 arXiv   pre-print
Our main idea is to introduce an implicit temporal alignment for a video pair, capable of estimating the similarity between them in an accurate and robust manner.  ...  Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications.  ...  Few-shot Video Action Classification: Few-shot video classification aims to classify new video classes with only a few annotated examples, and has attracted much attention in the community recently.  ...
arXiv:2105.04823v1 fatcat:tthebmigibf7rirkt52uvna4ee

Few-Shot Action Recognition with Compromised Metric via Optimal Transport [article]

Su Lu, Han-Jia Ye, De-Chuan Zhan
2021 arXiv   pre-print
Although vital to computer vision systems, few-shot action recognition is still not mature despite the wide research of few-shot image classification.  ...  To preserve the inherent temporal ordering information, we additionally amend the ground cost matrix by penalizing it with the positional distance between a pair of segments.  ...  Actually, both semantic and temporal information should be considered in few-shot action recognition.  ... 
arXiv:2104.03737v1 fatcat:7g7fmzbktnhxpe3x2gj2crnuom

A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [article]

Zhenxi Zhu, Limin Wang, Sheng Guo, Gangshan Wu
2021 arXiv   pre-print
The existing few-shot video classification methods often employ a meta-learning paradigm by designing customized temporal alignment module for similarity calculation.  ...  Second, we discover that there is a high correlation between the novel action class and the ImageNet object class, which is problematic in the few-shot recognition setting.  ...  Software Technology and Industrialization.  ... 
arXiv:2110.12358v1 fatcat:7fwo6wbv6bbrjjwzz4wf2ssb7y

Hybrid Relation Guided Set Matching for Few-shot Action Recognition [article]

Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang
2022 arXiv   pre-print
Current few-shot action recognition methods reach impressive performance by learning discriminative features for each video via episodic training and designing various temporal alignment strategies.  ...  By this means, the proposed HyRSM can be highly informative and flexible to predict query categories under the few-shot settings.  ...  Intelligent Control under Grant B18024, and Alibaba Group through Alibaba Research Intern Program.  ... 
arXiv:2204.13423v1 fatcat:dpgf2ntdmfgilpjtfoukwebsmi

Cross-Modal and Hierarchical Modeling of Video and Text [chapter]

Bowen Zhang, Hexiang Hu, Fei Sha
2018 Lecture Notes in Computer Science  
We show its utility in zero-shot action recognition and video captioning.  ...  A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action.  ...  Sloan Research Fellowship, gifts from Facebook and Netflix, and ARO# W911NF-12-1-0241 and W911NF-15-1-0484.  ... 
doi:10.1007/978-3-030-01261-8_23 fatcat:e3t7mxggobgtzcnfso42bvwop4

Intra- and Inter-Action Understanding via Temporal Action Parsing [article]

Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
2020 arXiv   pre-print
Towards this goal, we construct TAPOS, a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.  ...  Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.  ...  Figure 4: Similar sub-actions are shared by irrelevant actions, e.g., jump in beam and triple jump (the first pair), somersault in uneven bars and diving (the second pair).  ...
arXiv:2005.10229v1 fatcat:n4zcmoo7mfc2xhje3pm32tb5ka

Cross-Modal and Hierarchical Modeling of Video and Text [article]

Bowen Zhang, Hexiang Hu, Fei Sha
2018 arXiv   pre-print
We show its utility in zero-shot action recognition and video captioning.  ...  A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action.  ...  Sloan Research Fellowship, gifts from Facebook and Netflix, and ARO# W911NF-12-1-0241 and W911NF-15-1-0484.  ... 
arXiv:1810.07212v1 fatcat:nokqt3vrkbf37paknjzrvrwzci

2021 Index IEEE Transactions on Image Processing Vol. 30

2021 IEEE Transactions on Image Processing  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages.  ...  ., +, TIP 2021 9220-9230 Image recognition A Pairwise Attentive Adversarial Spatiotemporal Network for Cross-Domain Few-Shot Action Recognition-R2.  ... 
doi:10.1109/tip.2022.3142569 fatcat:z26yhwuecbgrnb2czhwjlf73qu

Multimodal Machine Learning: A Survey and Taxonomy [article]

Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency
2017 arXiv   pre-print
It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential.  ...  We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and  ...  , action recognition, audio-visual speech recognition, and semantic similarity estimation.  ... 
arXiv:1705.09406v2 fatcat:262fo4sihffvxecg4nwsifoddm

Weakly Supervised Action Labeling in Videos under Ordering Constraints [chapter]

Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic
2014 Lecture Notes in Computer Science  
We formulate the problem as a weakly supervised temporal assignment with ordering constraints.  ...  We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies.  ...  Introduction: Significant progress towards action recognition in realistic video settings has been achieved in the past few years [22, 24, 26, 30, 35].  ...
doi:10.1007/978-3-319-10602-1_41 fatcat:4amyqqud6ncm3fdwnkwuksiho4

2020 Index IEEE Transactions on Image Processing Vol. 29

2020 IEEE Transactions on Image Processing  
., +, TIP 2020 2150-2165 FAMED-Net: A Fast and Accurate Multi-Scale End-to-End Dehazing Network. Zhang, J., +, TIP 2020 72-84 Few-Shot Text Style Transfer via Deep Feature Similarity.  ...  ., +, TIP 2020 8028-8042 A Two-Stage Approach to Few-Shot Learning for Image Recognition. An Unordered Image Stitching Method Based on Binary Tree and Estimated Overlapping Area.  ...
doi:10.1109/tip.2020.3046056 fatcat:24m6k2elprf2nfmucbjzhvzk3m

Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [article]

Sicheng Zhao, Guoli Jia, Jufeng Yang, Guiguang Ding, Kurt Keutzer
2021 arXiv   pre-print
We begin with a brief introduction on widely used emotion representation models and affective modalities.  ...  ., recognizing, interpreting, processing, and simulating emotions, is becoming increasingly important. In this tutorial, we discuss several key aspects of multi-modal emotion recognition (MER).  ...  As such, designing effective algorithms for unsupervised/weakly-supervised learning and few/zero-shot learning can provide potential solutions.  ...
arXiv:2108.10152v1 fatcat:hwnq7hoiqba3pdf6aakcxjq33i

Weakly Supervised Action Labeling in Videos Under Ordering Constraints [article]

Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic
2014 arXiv   pre-print
We formulate the problem as a weakly supervised temporal assignment with ordering constraints.  ...  We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies.  ...  ERC grants ALLEGRO, VideoWorld, Activia and Sierra.  ... 
arXiv:1407.1208v1 fatcat:gcebvzflhbdytfj37txqjy42bi

A Survey of Content-Based Video Retrieval

P. Geetha, Vasumathi Narayanan
2008 Journal of Computer Science  
The major themes covered by the study include shot segmentation, key frame extraction, feature extraction, clustering, indexing and video retrieval-by similarity, probabilistic, transformational, refinement  ...  and relevance feedback.  ...  [69] employed dynamic programming to align two video sequences of different temporal length.  ... 
doi:10.3844/jcssp.2008.474.486 fatcat:ntongqzelvdbtanlz4tuerzreq