24,171 Hits in 6.0 sec

Self-Supervised Learning to Detect Key Frames in Videos

Xiang Yan, Syed Zulqarnain Gilani, Mingtao Feng, Liang Zhang, Hanlin Qin, Ajmal Mian
2020 Sensors  
To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video.  ...  Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet.  ...  In this paper, we propose a self supervised method that learns to automatically detect key frames in videos. The proposed method has two main parts.  ... 
doi:10.3390/s20236941 pmid:33291759 fatcat:pqiovyqo2baa5cyq37os7hxjmy

Learning to Detect and Retrieve Objects from Unlabeled Videos [article]

Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein
2019 arXiv   pre-print
In this work, we propose to exploit the natural correlation in narrations and the visual presence of objects in video, to learn an object detector and retrieval without any manual labeling involved.  ...  We pose the problem as weakly supervised learning with noisy labels, and propose a novel object detection paradigm under these constraints.  ...  Since our task is self-supervised, we allowed frames from the same video to participate in both train and test sets.  ... 
arXiv:1905.11137v2 fatcat:myopymuc2rclnnw7tuh2k2siy4

Self-Supervised Learning for Robust Video Indexing

Ralph Ewerth, Bernd Freisleben
2006 2006 IEEE International Conference on Multimedia and Expo  
In this paper, we propose to use a novel self-supervised learning framework for robust video indexing to address this issue.  ...  Experimental results show that a state-of-the-art video cut detection approach can be significantly improved by the self-supervised learning approach.  ...  SELF-SUPERVISED LEARNING FOR ROBUST VIDEO INDEXING The key idea of the proposed approach is to use a robust baseline classifier for a given video X to automatically generate training data from the video  ... 
doi:10.1109/icme.2006.262889 dblp:conf/icmcs/EwerthF06 fatcat:purpr6utmbeqnlw2t3sgmzkz3a

Temporal Cycle-Consistency Learning

Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
The resulting per-frame embeddings can be used to align videos by simply matching frames using nearest-neighbors in the learned embedding space.  ...  Abstract We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.  ...  Acknowledgements: We would like to thank Anelia Angelova, Relja Arandjelović, Sergio Guadarrama, Shefali Umrania, and Vincent Vanhoucke for their feedback on the manuscript.  ... 
doi:10.1109/cvpr.2019.00190 dblp:conf/cvpr/DwibediATSZ19 fatcat:4vytz5nx25djdmrkgljvjzsila
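The alignment step this snippet describes (matching frames by nearest neighbors in the learned embedding space) can be sketched as follows. The random embeddings and dimensions are illustrative assumptions, not the paper's model:

```python
import numpy as np

def align_videos(emb_a, emb_b):
    """For each frame embedding of video A (shape Ta x D), return the index
    of its nearest-neighbor frame embedding in video B (shape Tb x D),
    using Euclidean distance."""
    # Pairwise squared distances via broadcasting: shape (Ta, Tb).
    d = ((emb_a[:, None, :] - emb_b[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Toy check: a video aligned against itself maps frame i to frame i.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))  # 8 frames, 16-dim embeddings (made up)
assert (align_videos(emb, emb) == np.arange(8)).all()
```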

PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data

Ahmed Mohammed, Ivar Farup, Marius Pedersen, Sule Yildirim, Øistein Hovde
2020 Computer Vision and Image Understanding  
We propose a novel pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data  ...  Moreover, we developed a self-supervision method to maximize the distance between classes of pathologies.  ...  CAPSULE-AI3D Improved Pathology Detection in Wireless Capsule Endoscopy Images through Artificial Intelligence and 3D Reconstruction, project no. 300031.  ... 
doi:10.1016/j.cviu.2020.103062 fatcat:vzje5tht4zfhfejagdgaujsc34

PreViTS: Contrastive Pretraining with Video Tracking Supervision [article]

Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, Juan Carlos Niebles, Nikhil Naik
2021 arXiv   pre-print
Videos are a rich source for self-supervised learning (SSL) of visual representations due to the presence of natural temporal transformations of objects.  ...  PreViTS further uses the tracking signal to spatially constrain the frame regions to learn from and trains the model to locate meaningful objects by providing supervision on Grad-CAM attention maps.  ...  Going beyond learning representations from images, different frames of videos provide natural viewpoint changes and temporal information which can help learn better representations in a self-supervised  ... 
arXiv:2112.00804v1 fatcat:ywc7rx65zvcyjkd3xvaw3jhc34
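The snippet mentions using the tracking signal to spatially constrain the frame regions the model learns from. One hypothetical reading is sampling a random crop that is guaranteed to contain the tracked object box; the function name, box format, and sizes below are illustrative assumptions, not PreViTS's actual implementation:

```python
import numpy as np

def crop_containing_box(frame_hw, box, crop_hw, rng):
    """Sample a random crop top-left corner such that the crop fully
    contains the tracked object box. Assumes the box fits inside the
    crop and the crop fits inside the frame."""
    H, W = frame_hw
    ch, cw = crop_hw
    x0, y0, x1, y1 = box  # tracked box, (x1, y1) exclusive
    # Valid top-left range so the crop both covers the box and stays in-frame.
    top = rng.integers(max(0, y1 - ch), min(y0, H - ch) + 1)
    left = rng.integers(max(0, x1 - cw), min(x0, W - cw) + 1)
    return top, left

rng = np.random.default_rng(0)
top, left = crop_containing_box((224, 224), (50, 60, 90, 100), (128, 128), rng)
assert top <= 60 and top + 128 >= 100      # crop covers the box vertically
assert left <= 50 and left + 128 >= 90     # and horizontally
```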

Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision [article]

Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan
2019 arXiv   pre-print
In this paper, given only video-level annotations, we propose a novel weakly supervised framework to simultaneously locate action frames as well as recognize actions in untrimmed videos.  ...  In order to learn robust models, previous methods usually assume videos are trimmed as short sequences and require ground-truth annotations of each video frame/sequence, which is quite costly and time-consuming  ...  , TSRNet is able to obtain self-attention weights at frame levels, so that frames with higher weights can be selected out for the purpose of temporal action localization/detection in videos. • Extensive  ... 
arXiv:1902.07370v1 fatcat:ajhytmeyuran7kl4rnl35vzejy

Microwave Meta-lens for Generating Polarization-Independent refracted waves

Kuang Zhang, Yueyi Yuan, Yuxiang Wang, Xumin Ding, Qun Wu
2019 2019 IEEE MTT-S International Wireless Symposium (IWS)  
doi:10.1109/ieee-iws.2019.8803917 fatcat:fjlma6kjhrbrboryebrofssulm

Oops! Predicting Unintentional Action in Video [article]

Dave Epstein, Boyuan Chen, Carl Vondrick
2019 arXiv   pre-print
We also investigate self-supervised representations that leverage natural signals in our dataset, and show the effectiveness of an approach that uses the intrinsic speed of video to perform competitively  ...  We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks.  ...  First, we propose a novel self-supervised task to learn to predict the speed of video, which is incidental supervision available in all unlabeled video, for learning an action representation.  ... 
arXiv:1911.11206v1 fatcat:gdiefmoe55eijah5h26tgxjoym
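The self-supervised speed-prediction pretext task described in this snippet needs no manual labels: the sampling stride used to build a clip is itself the label. A minimal sketch of that data-generation step, with made-up clip lengths and frame counts:

```python
import numpy as np

def make_speed_example(num_frames, clip_len, speed, rng):
    """Sample `clip_len` frame indices from a video of `num_frames` frames
    at playback `speed` (frame stride). The chosen speed class is the
    free, self-supervised label. Hypothetical sketch, not the paper's code."""
    span = (clip_len - 1) * speed + 1          # frames covered by the clip
    start = rng.integers(0, num_frames - span + 1)
    idx = start + np.arange(clip_len) * speed  # evenly strided indices
    return idx, speed

rng = np.random.default_rng(0)
idx, label = make_speed_example(num_frames=100, clip_len=8, speed=3, rng=rng)
assert len(idx) == 8 and (np.diff(idx) == 3).all() and idx[-1] < 100
```

A model would then be trained to classify `label` from the frames at `idx`.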

Exploring Relations in Untrimmed Videos for Self-Supervised Learning [article]

Dezhao Luo, Bo Fang, Yu Zhou, Yucan Zhou, Dayan Wu, Weiping Wang
2020 arXiv   pre-print
In this paper, we propose a novel self-supervised method, referred to as Exploring Relations in Untrimmed Videos (ERUV), which can be straightforwardly applied to untrimmed videos (real unlabeled) to learn  ...  In this sense, these methods are not really self-supervised.  ...  This mechanism has been explored by [23] in image object detection, while we extend it with a self-supervised manner to model the relations between video clips.  ... 
arXiv:2008.02711v1 fatcat:zexhmmtazffyjepddxasb2dfoa

MAST: A Memory-Augmented Self-supervised Tracker [article]

Zihang Lai, Erika Lu, Weidi Xie
2020 arXiv   pre-print
...  comparable to supervised methods.  ...  When measuring generalizability, we show self-supervised approaches are actually superior to the majority of supervised methods.  ...  The authors would like to thank Andrew Zisserman for helpful discussions, Olivia Wiles, Shangzhe Wu, Sophia Koepke and Tengda Han for proofreading.  ... 
arXiv:2002.07793v2 fatcat:hn6fof2ganfuldzbxkuvouckoq

Temporal Cycle-Consistency Learning [article]

Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
2019 arXiv   pre-print
...  of self-supervised learning in videos, such as Shuffle and Learn and Time-Contrastive Networks.  ...  The resulting per-frame embeddings can be used to align videos by simply matching frames using the nearest-neighbors in the learned embedding space.  ...  Acknowledgements: We would like to thank Anelia Angelova, Relja Arandjelović, Sergio Guadarrama, Shefali Umrania, and Vincent Vanhoucke for their feedback on the manuscript.  ... 
arXiv:1904.07846v1 fatcat:oytx6gdzgzbmnb527jt2csbnvi

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions [article]

Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell
2021 arXiv   pre-print
We introduce the task of weakly supervised learning for detecting human and object interactions in videos.  ...  To address these challenges, we introduce a contrastive weakly supervised training loss that aims to jointly associate spatiotemporal regions in a video with an action and object vocabulary and encourage  ...  For the features of all frames {x_1, ..., x_T} in the same video, we use two different embedding layers to get the "key" x^key  ... 
arXiv:2110.03562v1 fatcat:bdco4w4jcrdrnn2wpzabxaemgi

Combining Supervised and Un-supervised Learning for Automatic Citrus Segmentation [article]

Heqing Huang, Tongbin Huang, Zhen Li, Zhiwei Wei, Shilei Lv
2021 arXiv   pre-print
Then, we extend a state-of-the-art unsupervised learning approach to pre-learn the citrus's potential movements between frames from unlabelled citrus videos.  ...  In this paper, we first train a simple CNN with a small number of labelled citrus images in a supervised manner, which can roughly predict the citrus location from each frame.  ...  This section gives a brief introduction to the methods used in this paper, such as video detection and self-supervised learning (Sect. 2.3 to Sect. 2.4).  ... 
arXiv:2105.01553v1 fatcat:fjjsc7zpg5fxxc7gsgixqfi3mu

Semi-supervised and Deep learning Frameworks for Video Classification and Key-frame Identification [article]

Sohini Roychowdhury
2022 arXiv   pre-print
Also, clustering the video frames in the encoded feature space further isolates key-frames at cluster boundaries.  ...  In this work, we present two semi-supervised approaches that automate this process of manual frame sifting in video streams by automatically classifying scenes for content and filtering frames for fine-tuning  ...  Additionally, the recent work in [5] used self-supervised deep-learning representations to isolate key frames from targeted human action-specific tasks only.  ... 
arXiv:2203.13459v1 fatcat:n77ni7ojszh5dlipq7ectwd3um
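The idea this snippet describes (key-frames isolated at cluster boundaries in the encoded feature space) can be sketched as follows. Once any clustering has assigned a cluster label to each frame, one simple reading of "cluster boundaries" is the frames where the label sequence changes; this is a hypothetical interpretation, not the paper's exact procedure:

```python
import numpy as np

def key_frames_at_boundaries(labels):
    """Given per-frame cluster labels (from clustering the encoded frame
    features along the video), return the frame indices where the cluster
    assignment changes -- candidate key-frames at cluster boundaries."""
    labels = np.asarray(labels)
    # Positions where label[i] != label[i-1], shifted to the new frame.
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1

# Toy label sequence: scenes 0,0,0,1,1,2 -> boundaries at frames 3 and 5.
assert list(key_frames_at_boundaries([0, 0, 0, 1, 1, 2])) == [3, 5]
```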