Weakly Supervised Temporal Action Localization Using Deep Metric Learning [article]

Ashraful Islam, Richard J. Radke
2020 arXiv   pre-print
To this end, we propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training.  ...  We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances.  ...  Most of these techniques use temporal annotations during training, while we aim to use only video-level labels for action localization. Deep Metric Learning.  ... 
arXiv:2001.07793v1 fatcat:qzhzxvtzkva5ziqxyfzgvg36pi

A Survey on Temporal Action Localization

Huifen Xia, Yongzhao Zhan
2020 IEEE Access  
In addition, we summarize temporal action localization from two aspects: fully-supervised learning and weakly-supervised learning.  ...  Temporal action localization has made some significant progress, especially with the development of deep learning recently. And more demand is for temporal action localization in untrimmed videos.  ...  Then we review the recent developments of temporal action localization from fully-supervised learning to weakly-supervised learning methods.  ...
doi:10.1109/access.2020.2986861 fatcat:rsndgkzhi5fm5l6nmfmgfvugby

Audio-Visual Event Localization in Unconstrained Videos [chapter]

Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu
2018 Lecture Notes in Computer Science  
We collect an Audio-Visual Event (AVE) dataset to systemically investigate three temporal localization tasks: supervised and weakly-supervised audio-visual event localization, and cross-modality localization  ...  Our experiments support the following findings: joint modeling of auditory and visual modalities outperforms independent modeling, the learned attention can capture semantics of sounding objects, temporal  ...  ., Tencent and the support of NVIDIA Corporation with the donation of the GPUs used for this research.  ... 
doi:10.1007/978-3-030-01216-8_16 fatcat:t4dbgoypsnaixmrtrrtrwcv2sy

Abnormal event detection by a weakly supervised temporal attention network

Xiangtao Zheng, Yichao Zhang, Yunpeng Zheng, Fulin Luo, Xiaoqiang Lu
2021 CAAI Transactions on Intelligence Technology  
In this study, a Temporal Attention Network (TANet) is proposed to capture both the specific categories and temporal locations of abnormal events in a weakly supervised manner.  ...  An event recognition module is exploited to predict the event scores for each video segment while a temporal attention module is proposed to learn a temporal attention value.  ...  Islam and Radke [42] proposed a balanced binary cross-entropy loss and a metric loss to learn discriminative action representations for weakly supervised temporal action localization.  ... 
doi:10.1049/cit2.12068 fatcat:tv72n5lzyzezdi3emajgwiv7ja

Weak Supervision and Referring Attention for Temporal-Textual Association Learning [article]

Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang
2020 arXiv   pre-print
Therefore we provide a Weak-Supervised alternative with our proposed Referring Attention mechanism to learn temporal-textual association (dubbed WSRA).  ...  We validate our WSRA through extensive experiments for temporally grounding by languages, demonstrating that it outperforms the state-of-the-art weakly-supervised methods notably.  ...  Weakly-Supervised Action Localization can be thought of as a specific example of weakly supervised video learning with the video-level labels [52, 43, 32, 36].  ...
arXiv:2006.11747v2 fatcat:bpqa6chthfgjhatmsgqq5t2dym

Deep Learning-based Action Detection in Untrimmed Videos: A Survey [article]

Elahe Vahdani, Yingli Tian
2021 arXiv   pre-print
This paper provides an extensive overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos with different supervision levels including fully-supervised, weakly-supervised  ...  Moreover, the commonly used action detection benchmark datasets and evaluation metrics are described, and the performance of the state-of-the-art methods are compared.  ...  Weakly-supervised Action Detection Weakly-supervised learning scheme requires coarse-grained or noisy labels during the training phase.  ... 
arXiv:2110.00111v1 fatcat:ven4rijqmnbyxflrf6wyxfpex4

Weakly Supervised Object Localization and Detection: A Survey [article]

Dingwen Zhang, Junwei Han, Gong Cheng, Ming-Hsuan Yang
2021 arXiv   pre-print
and standard evaluation metrics that are widely used in this field.  ...  As an emerging and challenging problem in the computer vision community, weakly supervised object localization and detection plays an important role for developing new generation computer vision systems  ...  On the other hand, [107], [132], [136], [166] develop methods to localize temporal actions in the given untrimmed videos, where the main goal is to predict the temporal boundary of each action  ...
arXiv:2104.07918v1 fatcat:dwl6sjfzibdilnvjnrbifp4uke

Towards Train-Test Consistency for Semi-supervised Temporal Action Localization [article]

Xudong Lin, Zheng Shou, Shih-Fu Chang
2020 arXiv   pre-print
Recently, Weakly-supervised Temporal Action Localization (WTAL) has been densely studied but there is still a large gap between weakly-supervised models and fully-supervised models.  ...  The inconsistent strategy makes it hard to explicitly supervise the action localization model with temporal boundary annotations at training time.  ...  Weakly-supervised Temporal Action Localization Obtaining the temporal annotations for full supervision is still the bottleneck if we go to a larger scale.  ... 
arXiv:1910.11285v3 fatcat:2acircsxhjferlmolj5eu3i7mq

Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment [article]

Li Ding, Chenliang Xu
2018 arXiv   pre-print
In this work, we address the task of weakly-supervised human action segmentation in long, untrimmed videos.  ...  , and a novel training strategy for weakly-supervised sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align action sequences and update the network in an iterative fashion.  ...  This idea is inspired from taking random steps used in deep reinforcement learning.  ... 
arXiv:1803.10699v1 fatcat:pwrl3cfahvgstpbcfzfj7pynxu

AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization [article]

Xiao-Yu Zhang, Changsheng Li, Haichao Shi, Xiaobin Zhu, Peng Li, Jing Dong
2019 arXiv   pre-print
As a challenging problem for high-level video understanding, weakly supervised action recognition and localization in untrimmed videos has attracted intensive research attention.  ...  An encoder-decoder based structure is carefully designed and jointly optimized to facilitate effective action classification and temporal localization.  ...  In addition, RNN is also widely used for temporal action localization. Compared with fully supervised methods, weakly supervised action localization is less studied.  ... 
arXiv:1911.11961v1 fatcat:qxptpsr5djcd5nnpagxnqvnrpu

Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning [article]

Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu
2020 arXiv   pre-print
Weakly-supervised action localization requires training a model to localize the action segments in the video given only video level action label.  ...  It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments).  ...  Related Work Weakly-Supervised Action Localization Weakly supervised action localization learns to localize activities inside videos when only action class labels are available.  ... 
arXiv:2004.00163v2 fatcat:gaendu464zaldljoiyhihrizvq

Audio-Visual Event Localization in Unconstrained Videos [article]

Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu
2018 arXiv   pre-print
We collect an Audio-Visual Event (AVE) dataset to systemically investigate three temporal localization tasks: supervised and weakly-supervised audio-visual event localization, and cross-modality localization  ...  Our experiments support the following findings: joint modeling of auditory and visual modalities outperforms independent modeling, the learned attention can capture semantics of sounding objects, temporal  ...  For supervised and weakly-supervised event localization, we use overall accuracy as an evaluation metric.  ...
arXiv:1803.08842v1 fatcat:diaiqzbuivh5bpox35ccpdb3ii

Action Localization through Continual Predictive Learning [article]

Sathyanarayanan N. Aakur, Sudeep Sarkar
2020 arXiv   pre-print
This self-supervised framework is not as complicated as other approaches but is very effective in learning robust visual representations for both labeling and localization.  ...  In this paper, we present a new approach based on continual learning that uses feature-level predictions for self-supervision.  ...  Unsupervised action localization approaches have not been explored to the same extent as supervised and weakly-supervised approaches.  ...
arXiv:2003.12185v1 fatcat:sb5t6r2tnberdd6crdl3pejoke

Weakly Supervised Temporal Adjacent Network for Language Grounding [article]

Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li
2021 arXiv   pre-print
To this end, we introduce a novel weakly supervised temporal adjacent network (WSTAN) for temporal language grounding.  ...  In this work, we are dedicated to weakly supervised TLG, where multiple description sentences are given to an untrimmed video without temporal boundary labels.  ...  Therefore, the idea is applicable to weakly supervised spatial/temporal grounding problems, e.g. spatial visual grounding, temporal action localization, and semantic segmentation.  ... 
arXiv:2106.16136v1 fatcat:i7nwetztpraf5bjwxfrn2gpa2a

RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization [article]

Alejandro Pardo, Humam Alwassel, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem
2020 arXiv   pre-print
In this paper, we propose RefineLoc, a novel weakly-supervised temporal action localization method.  ...  RefineLoc shows competitive results with the state-of-the-art in weakly-supervised temporal localization.  ...  Weakly-supervised Temporal Action Localization.  ... 
arXiv:1904.00227v3 fatcat:cdiabbsi25ffdgouvqmgqssxqm