Unsupervised Learning of Object Structure and Dynamics from Videos [article]

Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee
2020 arXiv   pre-print
Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning.  ...  Our method improves upon unstructured representations both for pixel-level video prediction and for downstream tasks requiring object-level understanding of motion dynamics.  ...  In this work, we focus on unsupervised learning of object structure and dynamics from videos.  ... 
arXiv:1906.07889v3 fatcat:elss3ab5vnh2be77fud2nqdgmq

AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points [article]

Yuexin Ma, Xinge Zhu, Xinjing Cheng, Ruigang Yang, Jiming Liu, Dinesh Manocha
2020 arXiv   pre-print
To the best of our knowledge, our method is the first to achieve unsupervised learning of trajectory extraction and prediction.  ...  To better capture the moving objects in videos, we introduce dynamic points.  ...  Unsupervised Learning for Dynamic Modeling To extract trajectories from sequential frames, a crucial step is learning the motion dynamics of the video.  ... 
arXiv:2007.05719v1 fatcat:kdpkhkezqrgazjeg5mwicyowwa

Unsupervised Learning from Video with Deep Neural Embeddings [article]

Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins
2020 arXiv   pre-print
Because of the rich dynamical structure of videos and their ubiquity in everyday life, it is a natural idea that video data could serve as a powerful unsupervised learning signal for training visual representations  ...  We show that VIE-trained networks substantially advance the state of the art in unsupervised learning from video datastreams, both for action recognition in the Kinetics dataset, and object recognition  ...  The general problem of unsupervised learning from videos can be formulated as learning a parameterized function φ_θ(·) from input videos V = {v_i | i = 1, 2, ..., N}, where each v_i consists of a sequence  ... 
arXiv:1905.11954v2 fatcat:vgkv2ivpjfba5erhi2j6yuguam

Learning similarity metrics for dynamic scene segmentation

Damien Teney, Matthew Brown, Dimitry Kit, Peter Hall
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
This paper addresses the segmentation of videos with arbitrary motion, including dynamic textures, using novel motion features and a supervised learning approach.  ...  We also demonstrate the applicability of our approach to general object and motion segmentation, showing significant improvements over unsupervised segmentation and results comparable to the best task  ...  Motion cues from spatiotemporal filters We characterize texture and motion in the video using a bank of 3D, spatiotemporal filters [15, 10], that help reveal structure in the video volume.  ... 
doi:10.1109/cvpr.2015.7298820 dblp:conf/cvpr/TeneyBKH15 fatcat:5lgmacmdvjfs5l7vzuapcae5lm

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning [article]

Ting Yao and Yiheng Zhang and Zhaofan Qiu and Yingwei Pan and Tao Mei
2021 arXiv   pre-print
A steady momentum of innovations and breakthroughs has convincingly pushed the limits of unsupervised image representation learning. Compared to static 2D images, video has one more dimension (time).  ...  We materialize the supervisory signals through determining whether a pair of samples is from one frame or from one video, and whether a triplet of samples is in the correct temporal order.  ...  The way elegantly takes the advantage of spatiotemporal structure within videos and thus strengthens the unsupervised visual feature learning for video understanding.  ... 
arXiv:2008.00975v2 fatcat:eb3hnhbuqfhjvhquax6oro4kei

Unsupervised learning of depth estimation, camera motion prediction and dynamic object localization from video

Delong Yang, Xunyu Zhong, Dongbing Gu, Xiafu Peng, Gongliu Yang, Chaosheng Zou
2020 International Journal of Advanced Robotic Systems  
This article presents a novel unsupervised deep learning framework for scene depth estimation, camera motion prediction and dynamic object localization from videos.  ...  Estimating scene depth, predicting camera motion and localizing dynamic objects from monocular videos are fundamental but challenging research topics in computer vision.  ...  Funding The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: ORCID iD Delong Yang https://orcid.org/0000-0001-8913-3886  ... 
doi:10.1177/1729881420909653 fatcat:psx7vi472bew5nrlxkzy5zfygi

Unsupervised Monocular Depth Learning in Dynamic Scenes [article]

Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova
2020 arXiv   pre-print
, and they tend to be constant for rigid moving objects.  ...  We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source  ...  Figure 2: Qualitative results of our unsupervised monocular depth and 3D object motion map learning in dynamic scenes across all datasets: Cityscapes, KITTI, Waymo Open Dataset and YouTube.  ... 
arXiv:2010.16404v2 fatcat:dwf7hypltnbijn3somdt6hv2zu

Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos [article]

C. Spampinato, S. Palazzo, P. D'Oro, D. Giordano, M. Shah
2019 arXiv   pre-print
Performance evaluation, carried out on standard benchmarks, shows that our approach is able to learn, in an unsupervised way, both local and global video dynamics.  ...  Unsupervised learning can instead leverage the vast amount of videos available on the web and it is a promising solution for overcoming the existing limitations.  ...  In this paper, we tackle both the problem of unsupervised learning for video object segmentation and that of video generation with disentangled background and foreground dynamics, combining both of them  ... 
arXiv:1803.09092v2 fatcat:tconl7knq5af3nqlbxthvx7br4

Digital Video Summarization Techniques: A Survey

Ashenafi Workie, Rajesh Sharma, Yun Koo Chung, Adama Science and technology university
2020 International Journal of Engineering Research and  
The main objective of Video summarization is to provide a clear analysis of the video by removing redundant and extracting key frames contents from the video.  ...  These techniques may fall into summarized, unsupervised and deep reinforcement learning approaches. Video representation categorized in static and dynamic summarization ways.  ...  Supervised Methods In a supervised learning approach video, summarization learns from labelled data by consisting of videos and along with ground-truth summary videos.  ... 
doi:10.17577/ijertv9is010026 fatcat:nng5pwivfbdgnmluzftqbsd5r4

Acquiring linguistic argument structure from multimodal input using attentive focus

G. Satish, Amitabha Mukerjee
2008 2008 7th IEEE International Conference on Development and Learning  
Using a computational model of dynamic attention, we present an algorithm that clusters visual events into action classes in an unsupervised manner using the Merge Neural Gas algorithm.  ...  We learn action schemas for linguistic units like "moving towards" or "chase", and validate our results by producing output commentaries for 3D video.  ...  Acknowledgements We are grateful to Barbara Tversky and her group for comments on an earlier draft (as well as the video and commentaries).  ... 
doi:10.1109/devlrn.2008.4640803 fatcat:lcbsjbtzejbd3kyrd6nlut22ue

Learning to Track Objects from Unlabeled Videos [article]

Jilai Zheng, Chao Ma, Houwen Peng, Xiaokang Yang
2021 arXiv   pre-print
First, we sample sequentially moving objects with unsupervised optical flow and dynamic programming, instead of random cropping.  ...  In this paper, we propose to learn an Unsupervised Single Object Tracker (USOT) from scratch.  ...  In view of the great success of unsupervised learning on a number of other vision tasks, such as video object segmentation [23], optical flow [28] and depth estimation [14], it is of great interest  ... 
arXiv:2108.12711v1 fatcat:47sebbdsevgjnbbm3cizuy7n6e

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose [article]

Zhichao Yin, Jianping Shi
2018 arXiv   pre-print
We propose GeoNet, a jointly unsupervised learning framework for monocular depth, optical flow and ego-motion estimation from videos.  ...  Specifically, geometric relationships are extracted over the predictions of individual modules and then combined as an image reconstruction loss, reasoning about static and dynamic scene parts separately  ...  Acknowledgements We would like to thank Guorun Yang and Tinghui Zhou for helpful discussions and sharing the code. We also thank the anonymous reviewers for their instructive comments.  ... 
arXiv:1803.02276v2 fatcat:rcqfxt53qzfehb2uiqoav7w42e

Self-Supervised Video Representation Learning With Odd-One-Out Networks [article]

Basura Fernando, Hakan Bilen, Efstratios Gavves, Stephen Gould
2017 arXiv   pre-print
We apply this technique to self-supervised video representation learning where we sample subsequences from videos and ask the network to learn to predict the odd video subsequence.  ...  In this task, the machine is asked to identify the unrelated or odd element from a set of otherwise related elements.  ...  Acknowledgement: This research was supported by the Australian Research Council (ARC) through the Centre of Excellence for Robotic Vision (CE140100016) and was undertaken on the NCI National Facility in  ... 
arXiv:1611.06646v4 fatcat:sgcm4lty5nhzdniqiensloxhim

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

Zhichao Yin, Jianping Shi
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
We propose GeoNet, a jointly unsupervised learning framework for monocular depth, optical flow and egomotion estimation from videos.  ...  Specifically, geometric relationships are extracted over the predictions of individual modules and then combined as an image reconstruction loss, reasoning about static and dynamic scene parts separately  ...  In this paper, we propose an unsupervised learning framework GeoNet for jointly estimating monocular depth, optical flow and camera motion from video.  ... 
doi:10.1109/cvpr.2018.00212 dblp:conf/cvpr/YinS18 fatcat:osotve7hgvetdgp6d5hvv2u32q

An unsupervised long short-term memory neural network for event detection in cell videos [article]

Ha Tran Hong Phan, Ashnil Kumar, David Feng, Michael Fulham, Jinman Kim
2017 arXiv   pre-print
So that our LSTM network could be trained in an unsupervised manner, we designed it with a branched structure where one branch learns the frequent, regular appearance and movements of objects and the second  ...  learns the stochastic events, which occur rarely and without warning in a cell video sequence.  ...  Our unsupervised model learned the dynamics of video cellular events and had results comparable to those from supervised methods.  ... 
arXiv:1709.02081v1 fatcat:7vscp3z4jjge3ka7yjeteseffm