
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning [article]

Fuchen Long and Ting Yao and Zhaofan Qiu and Xinmei Tian and Jiebo Luo and Tao Mei
2022 arXiv   pre-print
In this paper, we introduce a new design of mutual calibration between query and text to boost weakly-supervised video representation learning.  ...  Two large-scale web video datasets, each pairing every video with a query and a title, are newly collected for weakly-supervised video representation learning and are named YOVO-3M and YOVO-10M, respectively  ...  BI-CALIBRATION NETWORKS In this section, we introduce the Bi-Calibration Networks (BCN), which perform mutual calibration between query and text to facilitate weakly-supervised video representation learning  ... 
arXiv:2206.10491v1 fatcat:t3ag63loxfenjiwolhevolnqum

2021 Index IEEE Transactions on Image Processing Vol. 30

2021 IEEE Transactions on Image Processing  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, TIP 2021 7038-7049 Complex networks Multi-Hierarchical Category Supervision for Weakly-Supervised Temporal Action Localization.  ...  Bi, X., +, TIP 2021 7228-7240 3D Object Representation Learning: A Set-to-Set Matching Perspective.  ... 
doi:10.1109/tip.2022.3142569 fatcat:z26yhwuecbgrnb2czhwjlf73qu

Self-supervised Learning for Semi-supervised Temporal Language Grounding [article]

Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
2021 arXiv   pre-print
feature learning module with inter-modal and intra-modal contrastive losses to learn video feature representations under the constraints of video content consistency and video-text alignment.  ...  Previous works either tackle this task in a fully-supervised setting that requires a large amount of temporal annotations or in a weakly-supervised setting that usually cannot achieve satisfactory performance  ...  Weakly supervised alignment network for weakly-supervised video moment re- dense event captioning in videos.  ... 
arXiv:2109.11475v2 fatcat:2qmfaum4off4dmxzbvgpgj2hty

IEEE Access Special Section Editorial: Biologically Inspired Image Processing Challenges and Future Directions

Jiachen Yang, Qinggang Meng, Maurizio Murroni, Shiqi Wang, Feng Shao
2020 IEEE Access  
In the article, ''Hide-CAM: Finding multiple discriminative regions in weakly supervised location,'' by Xu et al., the authors propose a weakly supervised localization scheme using a hiding strategy with the  ...  for data-driven spatiotemporal feature representation combining a multi-scale feature representation scheme and a frame skipping strategy.  ... 
doi:10.1109/access.2020.3015372 fatcat:styxiguqlnaprclkiamogmjc24

A Survey on Temporal Sentence Grounding in Videos [article]

Xiaohan Lan, Yitian Yuan, Xin Wang, Zhi Wang, Wenwu Zhu
2021 arXiv   pre-print
More specifically, we first discuss existing TSGV approaches by grouping them into four categories, i.e., two-stage methods, end-to-end methods, reinforcement learning-based methods, and weakly supervised  ...  Meanwhile, TSGV is more challenging since it requires both textual and visual understanding for semantic alignment between two modalities (i.e., text and video).  ...  The performance of DiDeMo for weakly supervised methods will be presented later.  ... 
arXiv:2109.08039v2 fatcat:6ja4csssjzflhj426eggaf77tu

2020 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 30

2020 IEEE transactions on circuits and systems for video technology (Print)  
Beyond Weakly Supervised: Pseudo Ground Truths Mining for Missing Bounding-Boxes Object Detection.  ...  ., +, TCSVT April 2020 1037-1050 Beyond Weakly Supervised: Pseudo Ground Truths Mining for Missing Bounding-Boxes Object Detection.  ...  A Memory-Efficient Hardware Architecture for Connected Component Labeling in Embedded System.  ... 
doi:10.1109/tcsvt.2020.3043861 fatcat:s6z4wzp45vfflphgfcxh6x7npu

Real-time Deep Dynamic Characters [article]

Marc Habermann, Lingjie Liu, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt
2021 arXiv   pre-print
During training, we do not need to resort to difficult dynamic 3D capture of the human; instead we can train our model entirely from multi-view video in a weakly supervised manner.  ...  We propose a deep videorealistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance learned in a new weakly supervised way from multi-view imagery.  ...  Weakly Supervised Losses.  ... 
arXiv:2105.01794v1 fatcat:q34njck5pffd3lryvzk2psxmoq

Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts [article]

Buyu Liu, Bingbing Zhuang, Manmohan Chandraker
2022 arXiv   pre-print
We demonstrate how our design choices and proposed deep supervision help achieve meaningful representations and accurate predictions.  ...  In contrast to prior works that require dense supervision such as semantic labels in perspective view, our method only requires human annotations for parametric attributes that are cheaper and less ambiguous  ...  Supervision Required KITTI [9] Method Parametric Depth Semantics Simulated Video+Object Accu.-Bi. ↑ Accu.  ... 
arXiv:2104.06730v2 fatcat:26byrsalnnbippzxvjs6cye2ma

Table of Contents

2021 2021 IEEE/CVF International Conference on Computer Vision (ICCV)  
Directive Network for Weakly Supervised Salient Object Detection Faming Fang (East China Normal University), Guixu Zhang (East China Ling Shao (Inception Institute of Artificial Intelligence) Chuanjun  ...  Hard and Soft Shadow Removal Using Unsupervised Gaussian Fusion: Accurate 3D Reconstruction via Geometry-Guided Displacement Interpolation Spatio-Temporal Self-Supervised Representation Learning for 3D  ... 
doi:10.1109/iccv48922.2021.00004 fatcat:fkkjyeu27nex5idocgv7ljgiqi

Automatic Gaze Analysis: A Survey of Deep Learning based Approaches [article]

Shreya Ghosh, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe, Qiang Ji
2022 arXiv   pre-print
We analyze recent gaze estimation and segmentation methods, especially in the unsupervised and weakly supervised domain, based on their advantages and reported evaluation metrics.  ...  Our analysis shows that the development of a robust and generic gaze analysis method still needs to address real-world challenges such as unconstrained setup and learning with less supervision.  ...  For learning paradigms with less supervision, the important methods are described in detail below: Weakly-supervised and Learning from Pseudo Labels.  ... 
arXiv:2108.05479v3 fatcat:6qhwjojyqbdctjcwnjerflvyzi

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression

Yitian Yuan, Tao Mei, Wenwu Zhu
2019 Proceedings of the AAAI Conference on Artificial Intelligence  
Specifically, to preserve the context information, ABLR first encodes both video and sentence via bidirectional LSTM networks.  ...  The former reflects the global video structure, while the latter highlights the sentence details for temporal localization.  ...  Jun Xu, Linjun Zhou and Xumin Chen for their great support and valuable suggestions on this work.  ... 
doi:10.1609/aaai.v33i01.33019159 fatcat:t5ckqvm4kre5njg4m6m2uh5z7m

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression [article]

Yitian Yuan, Tao Mei, Wenwu Zhu
2018 arXiv   pre-print
for temporal localization.  ...  Specifically, to preserve the context information, ABLR first encodes both video and sentence via bidirectional LSTM networks.  ...  Jun Xu, Linjun Zhou and Xumin Chen for their great support and valuable suggestions on this work.  ... 
arXiv:1804.07014v4 fatcat:7ngjxiv3kfgzhfemkf2pxe3pzq

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Padi, B., +, TASLP 2020 1223-1232 Video signal processing Weakly Supervised Representation Learning for Audio-Visual Scene Analysis.  ...  Emura, S., TASLP 2020 144-156 Weakly Supervised Representation Learning for Audio-Visual Scene Analysis.  ...  T Target tracking Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

A Survey of Human Action Recognition and Posture Prediction

Nan Ma, Zhixuan Wu, Yiu-ming Cheung, Yuchen Guo, Yue Gao, Jiahong Li, Beijyan Jiang
2022 Tsinghua Science and Technology  
Human action recognition and posture prediction aim to recognize and predict, respectively, the actions and postures of persons in videos.  ...  In the past decade, tremendous progress has been made in the field, especially after the emergence of deep learning technologies.  ...  Therefore, in recent years, weakly supervised learning has been successfully exploited for recognition in untrimmed videos [110, 180] . (3) Interaction for action recognition.  ... 
doi:10.26599/tst.2021.9010068 fatcat:lygnvsm3unddnngyd7s3wkchjy

Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey [article]

Gaoang Wang, Mingli Song, Jenq-Neng Hwang
2022 arXiv   pre-print
With the advancement of deep neural networks and the increasing demand for intelligent video analysis, MOT has gained significantly increased interest in the computer vision community.  ...  Multi-object tracking (MOT) aims to associate target objects across video frames in order to obtain entire moving trajectories.  ...  Typically, weakly supervised learning uses weak labels as the annotation for each sample, while semi-supervised learning combines labeled and unlabeled samples in training.  ... 
arXiv:2205.10766v1 fatcat:p7s7lnnlsnadrhsdcmwlg7msfy