Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion [article]

Jinpeng Wang, Yuting Gao, Ke Li, Jianguo Hu, Xinyang Jiang, Xiaowei Guo, Rongrong Ji, Xing Sun
2020 arXiv   pre-print
One significant factor we expect the video representation learning to capture, especially in contrast with the image representation learning, is the object motion.  ...  Compared to the original video, the positive/negative is motion-untouched/broken but scene-broken/untouched by Spatial Local Disturbance and Temporal Local Disturbance.  ...  Our contributions are summarized as follows: • We formulate the self-supervised video representation learning into a data-driven metric learning, and decouple the scene and the motion to alleviate the  ... 
arXiv:2009.05757v3 fatcat:6yflipw64nctdlr67nf46h7g4y

Learning Structured Representations of Spatial and Interactive Dynamics for Trajectory Prediction in Crowded Scenes [article]

Todor Davchev, Michael Burke, Subramanian Ramamoorthy
2020 arXiv   pre-print
and new tasks by decoupling per-agent dynamics and environment modelling.  ...  This work proposes a modular method that utilises a learned model of the environment for motion prediction and explicitly allows for unsupervised adaptation of trajectory prediction models to unseen environments  ...  ACKNOWLEDGMENT The authors would like to thank A. Srivastava, J. Viereck and M. Asenov for the valuable comments on earlier drafts.  ... 
arXiv:1911.13044v5 fatcat:if7rcac2ircxri7tu2fl4ffcfq

Unsupervised Motion Representation Enhanced Network for Action Recognition [article]

Xiaohang Yang, Lingtong Kong, Jie Yang
2021 arXiv   pre-print
Learning reliable motion representation between consecutive frames, such as optical flow, has proven to have great promotion to video understanding.  ...  Compared with state-of-the-art unsupervised motion representation learning methods, our model achieves better accuracy while maintaining efficiency, which is competitive with some supervised or more complicated  ...  action that is not restricted by the scene.  ... 
arXiv:2103.03465v1 fatcat:udht4fdnpfhrlpuwd6qkqlo76u

Static and Dynamic Concepts for Self-supervised Video Representation Learning [article]

Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
2022 arXiv   pre-print
In this paper, we propose a novel learning scheme for self-supervised video representation learning.  ...  Motivated by how humans understand videos, we propose to first learn general visual concepts then attend to discriminative local areas for video understanding.  ...  But in unsupervised video representation learning, how to formulate meaningful visual concepts and efficiently leverage local cues remains unsolved.  ... 
arXiv:2207.12795v1 fatcat:yuzh55dslzhqldveyyxwdk7dmu

Future Segmentation Using 3D Structure [article]

Suhani Vora, Reza Mahjourian, Soeren Pirk, Anelia Angelova
2018 arXiv   pre-print
Working towards this capability, we address the task of predicting future frame segmentation from a stream of monocular video by leveraging the 3D structure of the scene.  ...  Our framework is based on learnable sub-modules capable of predicting pixel-wise scene semantic labels, depth, and camera ego-motion of adjacent frames.  ...  We further plan to extend this work and demonstrate its effectiveness, by predicting future events for better motion planning, e.g. in the context of human-robot interaction.  ... 
arXiv:1811.11358v1 fatcat:tbjpzays6fchznx6kekjcd3gyu

Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation [article]

Sun Yu, Ye Yun, Liu Wu, Gao Wenpeng, Fu YiLi, Mei Tao
2019 arXiv   pre-print
Furthermore, an unsupervised adversarial training strategy, temporal shuffles and order recovery, is designed to promote the learning of motion dynamics.  ...  The proposed method outperforms the state-of-the-art 3D human mesh recovery methods by 15.4% MPJPE and 23.8% PA-MPJPE on Human3.6M.  ...  An unsupervised adversarial training strategy is proposed to guide the representation learning of motion dynamics in the video. 4. Our entire framework is trained in an end-to-end manner.  ... 
arXiv:1908.07172v2 fatcat:k4ncnboleza4vc6jh3y355kgum

Unsupervised Learning of Accurate Camera Pose and Depth from Video Sequences with Kalman Filter

Yan Wang, Yu-Fan Xu
2019 IEEE Access  
INDEX TERMS Unsupervised learning, dense depth recovery, ego-motion, Kalman filter, decoupled, upsampling module.  ...  Most unsupervised methods for scene perception tasks (e.g., dense depth recovery and ego-motion estimation) train the convolutional network via minimizing the photometric error of images, achieving very  ...  LEARNING FRAMEWORK In order to learn a representation that encodes the long-term motion and depth map dependencies in videos, we cast the learning framework as a sequence-to-sequence problem.  ... 
doi:10.1109/access.2019.2903871 fatcat:xtvub46odrbufkhbaszkk7vt6m

Deep Learning Algorithms with Applications to Video Analytics for A Smart City: A Survey [article]

Li Wang, Dennis Sng
2015 arXiv   pre-print
It aims to learn hierarchical representations of data by using deep architecture models.  ...  and scene labeling.  ...  The ROSE Lab is supported by the National Research Foundation, Prime Minister's Office, Singapore, under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Programme  ... 
arXiv:1512.03131v1 fatcat:kavsqti6nvh6lnkz62tk7adtu4

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging [article]

Shuangrui Ding, Maomao Li, Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang, Hongkai Xiong
2022 arXiv   pre-print
In light of the success of contrastive learning in the image domain, current self-supervised video representation learning methods usually employ contrastive loss to facilitate video representation learning  ...  By leveraging the semantic consistency between the original clips and the fused ones, the model focuses more on the motion patterns and is debiased from the background shortcut.  ...  , and in part by the Program of Shanghai Science and Technology Innovation Project under Grant 20511100100.  ... 
arXiv:2109.15130v3 fatcat:qy2voj5fxfbl3dla7svjpp54l4

Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling [article]

Yiheng Liu, Wengang Zhou, Qiaokang Xie, Houqiang Li
2021 arXiv   pre-print
To this end, we propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling, in which we only need to know the locations  ...  MMDA explores potential data associations in unlabeled multimodal data, while MMGN propagates multimodal messages in the video graph based on the adjacency matrix learned from histogram statistics of wireless  ...  To enhance the representation capability of features, similar to [46], [47], we use the multi-head mechanism to concatenate features generated by 6 MGMs to achieve the final representation.  ... 
arXiv:2110.15610v1 fatcat:x2rog3tar5cb3njmwwqgugdz3e

Unsupervised Monocular Depth Estimation of Driving Scenes Using Siamese Convolutional LSTM Networks

John Paul Tan Yusiong, Prospero Clara Naval, Jr.
2020 International Journal of Innovative Computing, Information and Control  
In this paper, we present a deep learning model to simultaneously learn and refine depth maps from a single RGB image and in an end-to-end manner by casting the monocular depth estimation as an image reconstruction  ...  Estimating depth from a single RGB image is an active research topic in computer vision because of its broad applications in scene understanding, autonomous driving, and traffic surveillance systems.  ...  Unsupervised learning of depth from monocular video sequences.  ... 
doi:10.24507/ijicic.16.01.91 fatcat:cuqzec7jffemrl6praqq4gwo7u

2021 Index IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 43

2022 IEEE Transactions on Pattern Analysis and Machine Intelligence  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages.  ...  ., +, TPAMI May 2021 1605-1619 MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement.  ... 
doi:10.1109/tpami.2021.3126216 fatcat:h6bdbf2tdngefjgj76cudpoyia

Higher-Order Conditional Random Field established with CNNs for Video Object Segmentation

2021 KSII Transactions on Internet and Information Systems  
We perform the task of video object segmentation by incorporating a conditional random field (CRF) and convolutional neural networks (CNNs).  ...  Others treat the inference process of the CRF as a recurrent neural network and then combine CNNs and the CRF into an end-to-end model for video object segmentation.  ...  Related Work Unsupervised VOS Unsupervised approaches do not require labeled data to be entered and automatically remove the object of interest from the video.  ... 
doi:10.3837/tiis.2021.09.007 fatcat:3t7cz43mibcprpy6k7dmy6bsku

2021 Index IEEE Transactions on Image Processing Vol. 30

2021 IEEE Transactions on Image Processing  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages.  ...  Hsu, H., +, TIP 2021 5198-5210 Multi-Task Learning Framework for Motion Estimation and Dynamic Scene Deblurring.  ... 
doi:10.1109/tip.2022.3142569 fatcat:z26yhwuecbgrnb2czhwjlf73qu

Front Matter: Volume 12084

Wolfgang Osten, Dmitry Nikolaev, Jianhong Zhou
2022 Fourteenth International Conference on Machine Vision (ICMV 2021)  
The papers reflect the work and thoughts of the authors and are published herein as submitted.  ...  The publisher is not responsible for the validity of the information or for any outcomes resulting from reliance thereon.  ...  deep learning [12084-50] 1E Joint alignment and compactness learning for multi-source unsupervised domain adaptation [12084-52] 1F Deep neural networks for moving object classification in video surveillance  ... 
doi:10.1117/12.2625908 fatcat:zrgauhqj7ng65flfcmbhqx2ony