Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization
[article]
2020
arXiv
pre-print
similarity comparison; 3) an advanced loss for few-shot learning to optimize pair similarity with limited data. ...
To solve these problems, this paper presents 1) a specific setting to evaluate the performance of few-shot action recognition algorithms; 2) an implicit sequence-alignment algorithm for better video-level ...
Thus, in this paper, we propose to align video sequences implicitly and optimize video-level pair similarity directly for few-shot action recognition. ...
arXiv:2010.06215v1
fatcat:hhchjwegsjfkxnpv4xoykxy5vu
Learning Implicit Temporal Alignment for Few-shot Video Classification
[article]
2021
arXiv
pre-print
Our main idea is to introduce an implicit temporal alignment for a video pair, capable of estimating the similarity between them in an accurate and robust manner. ...
Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications. ...
Few-shot Video Action Classification Few-shot video classification aims to classify new video classes with only a few annotated examples, and attracted much attention in community recently. ...
arXiv:2105.04823v1
fatcat:tthebmigibf7rirkt52uvna4ee
Few-Shot Action Recognition with Compromised Metric via Optimal Transport
[article]
2021
arXiv
pre-print
Although vital to computer vision systems, few-shot action recognition is still not mature despite the wide research of few-shot image classification. ...
To preserve the inherent temporal ordering information, we additionally amend the ground cost matrix by penalizing it with the positional distance between a pair of segments. ...
Actually, both semantic and temporal information should be considered in few-shot action recognition. ...
arXiv:2104.03737v1
fatcat:7g7fmzbktnhxpe3x2gj2crnuom
A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark
[article]
2021
arXiv
pre-print
The existing few-shot video classification methods often employ a meta-learning paradigm by designing customized temporal alignment module for similarity calculation. ...
Second, we discover that there is a high correlation between the novel action class and the ImageNet object class, which is problematic in the few-shot recognition setting. ...
Software Technology and Industrialization. ...
arXiv:2110.12358v1
fatcat:7fwo6wbv6bbrjjwzz4wf2ssb7y
Hybrid Relation Guided Set Matching for Few-shot Action Recognition
[article]
2022
arXiv
pre-print
Current few-shot action recognition methods reach impressive performance by learning discriminative features for each video via episodic training and designing various temporal alignment strategies. ...
By this means, the proposed HyRSM can be highly informative and flexible to predict query categories under the few-shot settings. ...
Intelligent Control under Grant B18024, and Alibaba Group through Alibaba Research Intern Program. ...
arXiv:2204.13423v1
fatcat:dpgf2ntdmfgilpjtfoukwebsmi
Cross-Modal and Hierarchical Modeling of Video and Text
[chapter]
2018
Lecture Notes in Computer Science
We show its utility in zero-shot action recognition and video captioning. ...
A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action. ...
Sloan Research Fellowship, gifts from Facebook and Netflix, and ARO# W911NF-12-1-0241 and W911NF-15-1-0484. ...
doi:10.1007/978-3-030-01261-8_23
fatcat:e3t7mxggobgtzcnfso42bvwop4
Intra- and Inter-Action Understanding via Temporal Action Parsing
[article]
2020
arXiv
pre-print
Towards this goal, we construct TAPOS, a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top. ...
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition. ...
Figure 4: Similar sub-actions are shared by irrelevant actions, e.g., jump in beam and triple jump (the first pair), somersault in uneven bars and diving (the second pair). ...
arXiv:2005.10229v1
fatcat:n4zcmoo7mfc2xhje3pm32tb5ka
Cross-Modal and Hierarchical Modeling of Video and Text
[article]
2018
arXiv
pre-print
We show its utility in zero-shot action recognition and video captioning. ...
A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action. ...
Sloan Research Fellowship, gifts from Facebook and Netflix, and ARO# W911NF-12-1-0241 and W911NF-15-1-0484. ...
arXiv:1810.07212v1
fatcat:nokqt3vrkbf37paknjzrvrwzci
2021 Index IEEE Transactions on Image Processing Vol. 30
2021
IEEE Transactions on Image Processing
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination. ...
The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages. ...
., +, TIP 2021 9220-9230 Image recognition A Pairwise Attentive Adversarial Spatiotemporal Network for Cross-Domain Few-Shot Action Recognition-R2. ...
doi:10.1109/tip.2022.3142569
fatcat:z26yhwuecbgrnb2czhwjlf73qu
Multimodal Machine Learning: A Survey and Taxonomy
[article]
2017
arXiv
pre-print
It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. ...
We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and ...
, action recognition, audio-visual speech recognition, and semantic similarity estimation. ...
arXiv:1705.09406v2
fatcat:262fo4sihffvxecg4nwsifoddm
Weakly Supervised Action Labeling in Videos under Ordering Constraints
[chapter]
2014
Lecture Notes in Computer Science
We formulate the problem as a weakly supervised temporal assignment with ordering constraints. ...
We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies. ...
Introduction Significant progress towards action recognition in realistic video settings has been achieved in the past few years [22, 24, 26, 30, 35]. ...
doi:10.1007/978-3-319-10602-1_41
fatcat:4amyqqud6ncm3fdwnkwuksiho4
2020 Index IEEE Transactions on Image Processing Vol. 29
2020
IEEE Transactions on Image Processing
., +, TIP 2020 2150-2165
FAMED-Net: A Fast and Accurate Multi-Scale End-to-End Dehazing Network. Zhang, J., +, TIP 2020 72-84
Few-Shot Text Style Transfer via Deep Feature Similarity. ...
., +, TIP 2020 8028-8042 A Two-Stage Approach to Few-Shot Learning for Image Recognition. An Unordered Image Stitching Method Based on Binary Tree and Estimated Overlapping Area. ...
doi:10.1109/tip.2020.3046056
fatcat:24m6k2elprf2nfmucbjzhvzk3m
Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies
[article]
2021
arXiv
pre-print
We begin with a brief introduction on widely used emotion representation models and affective modalities. ...
., recognizing, interpreting, processing, and simulating emotions, is becoming increasingly important. In this tutorial, we discuss several key aspects of multi-modal emotion recognition (MER). ...
As such, designing effective algorithms for unsupervised/weakly-supervised learning and few/zero shot learning can provide potential solutions. ...
arXiv:2108.10152v1
fatcat:hwnq7hoiqba3pdf6aakcxjq33i
Weakly Supervised Action Labeling in Videos Under Ordering Constraints
[article]
2014
arXiv
pre-print
We formulate the problem as a weakly supervised temporal assignment with ordering constraints. ...
We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies. ...
ERC grants ALLEGRO, VideoWorld, Activia and Sierra. ...
arXiv:1407.1208v1
fatcat:gcebvzflhbdytfj37txqjy42bi
A Survey of Content-Based Video Retrieval
2008
Journal of Computer Science
The major themes covered by the study include shot segmentation, key frame extraction, feature extraction, clustering, indexing and video retrieval-by similarity, probabilistic, transformational, refinement ...
and relevance feedback. ...
[69] employed dynamic programming to align two video sequences of different temporal length. ...
doi:10.3844/jcssp.2008.474.486
fatcat:ntongqzelvdbtanlz4tuerzreq