19,879 Hits in 3.1 sec

Efficient temporal consistency for streaming video scene analysis

Ondrej Miksik, Daniel Munoz, J. Andrew Bagnell, Martial Hebert
2013 2013 IEEE International Conference on Robotics and Automation  
We address the problem of image-based scene analysis from streaming video, as would be seen from a moving platform, in order to efficiently generate spatially and temporally consistent predictions of semantic  ...  Our technique is a meta-algorithm that can be efficiently wrapped around any scene analysis technique that produces a per-pixel semantic label distribution.  ...  Wendel for helping with the optical flow computation, C. Fabaret for providing the NYUScenes dataset and his classifications, C.  ... 
doi:10.1109/icra.2013.6630567 dblp:conf/icra/MiksikMBH13 fatcat:vj35z26fn5dbhkuzjc6u5lswni
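
The entry above describes the approach only in prose; as a rough illustration of what a meta-algorithm that "can be wrapped around any scene analysis technique producing a per-pixel semantic label distribution" might look like, the following minimal numpy sketch causally smooths each frame's label distribution with its flow-propagated predecessor. It is an assumption-laden stand-in, not the authors' filter: the function names, the nearest-neighbour warping, and the single blending weight `alpha` are invented here for illustration.

```python
import numpy as np

def propagate(prev_smoothed, flow):
    """Warp the previous frame's smoothed label distribution to the current
    frame with a dense backward optical-flow field (H x W x 2, in pixels),
    using nearest-neighbour lookup for simplicity."""
    h, w, _ = prev_smoothed.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return prev_smoothed[src_y, src_x]

def temporally_consistent_stream(per_frame_probs, flows, alpha=0.7):
    """Wrap any per-pixel classifier's outputs (a list of H x W x C probability
    maps) with a causal exponential filter along the flow; `flows` holds one
    backward flow field per frame transition (len(per_frame_probs) - 1)."""
    smoothed = per_frame_probs[0]
    yield smoothed
    for probs, flow in zip(per_frame_probs[1:], flows):
        propagated = propagate(smoothed, flow)
        smoothed = alpha * probs + (1.0 - alpha) * propagated
        smoothed /= smoothed.sum(axis=-1, keepdims=True)  # keep it a distribution
        yield smoothed
```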

Two-Stream Transformer Architecture for Long Video Understanding [article]

Edward Fish, Jon Weinbren, Andrew Gilbert
2022 arXiv   pre-print
This paper introduces an efficient Spatio-Temporal Attention Network (STAN) which uses a two-stream transformer architecture to model dependencies between static image features and temporal contextual  ...  Pure vision transformer architectures are highly effective for short video classification and action recognition tasks.  ...  This ensures that positional information is consistent between the spatial and temporal streams during fusion.  ... 
arXiv:2208.01753v1 fatcat:3pteyddiazanxoi4ffh37teliu
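
As a hedged reading of the snippet (the released STAN architecture is not reproduced here), the toy PyTorch module below shows one way a two-stream block can keep positional information consistent between a spatial (within-frame) and a temporal (across-frame) attention stream: a single shared frame-level positional embedding is added before both streams and the two outputs are fused. Class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class TwoStreamBlock(nn.Module):
    """Hypothetical sketch: one encoder layer attends over patch tokens inside
    each frame (static/appearance stream), another attends across frames
    (temporal/context stream), and both share one frame-level positional
    embedding so positions stay consistent when the streams are fused."""
    def __init__(self, dim=256, heads=4, max_frames=64):
        super().__init__()
        # shared positional code, broadcast over patch tokens within a frame
        self.frame_pos = nn.Parameter(torch.zeros(1, max_frames, 1, dim))
        self.spatial = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.temporal = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, tokens):                       # (B, T, N, D) patch tokens
        b, t, n, d = tokens.shape                    # clips up to max_frames frames
        x = tokens + self.frame_pos[:, :t]           # same positions for both streams
        s = self.spatial(x.reshape(b * t, n, d))     # attention within each frame
        s = s.reshape(b, t, n, d).mean(dim=2)        # pool patches -> (B, T, D)
        m = self.temporal(x.mean(dim=2))             # attention across frames
        return self.fuse(torch.cat([s, m], dim=-1))  # (B, T, D) fused clip features
```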

3D-TV – THE FUTURE OF VISUAL ENTERTAINMENT

M. MAGNOR
2005 Multimedia Databases and Image Communication  
But the time of passive TV consumption may be over soon: Advances in video acquisition technology, novel image analysis algorithms, and the pace of progress in computer graphics hardware together drive  ...  The scientific and technological obstacles towards realizing 3D-TV, the experience of interactively watching real-world dynamic scenes from arbitrary perspective, are currently being put out of the way  ...  For 3D-TV, however, multiple synchronized video streams depicting the same scene from different viewpoints must be encoded, calling for new coding algorithms to compress multi-video content.  ... 
doi:10.1142/9789812702135_0011 fatcat:pkbapsxzd5fl3jwxqq3nnsnbuy

Surveillance System with Object-Aware Video Transcoder

Toshihiko Hata, Naoki Kuwahara, Toshiharu Nozawa, Derek Schwenke, Anthony Vetro
2005 2005 IEEE 7th Workshop on Multimedia Signal Processing  
This paper presents an object-aware video surveillance system that is not only smart and friendly for users, but allows for transmission of the scene over limited bandwidth networks.  ...  Human behavior can be understood immediately and intuitively in a mosaic image, and it is very effective for behavior analysis and scene browsing as well as efficient to transmit the surveillance video  ... 
doi:10.1109/mmsp.2005.248636 dblp:conf/mmsp/HataKNSV05 fatcat:pvltxi5fwvf37gu3jvmy35pb64

Guest Editorial: Video Recognition

Ivan Laptev, Deva Ramanan, Josef Sivic
2016 International Journal of Computer Vision  
-"A robust and efficient video representation for action recognition" (doi:10.1007/s11263-015-0846-5) by Wang et al. improves dense trajectory video features by explicit B Ivan Laptev  ...  Video Recognition" is a fundamental research area in visual recognition, required for true perceptual understanding in any practical scenario where image streams are processed.  ...  -"EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis" (doi:10.1007/s11263- 016-0905-6) by Tran and Torresani proposes a scal- able mid-level representation for video analysis  ... 
doi:10.1007/s11263-016-0922-5 fatcat:sksuszyyjrdd5arhy23sfdbkty

Key-point Sequence Lossless Compression for Intelligent Video Analysis

Weiyao Lin, Xiaoyi He, Wenrui Dai, John See, Tushar Shinde, Hongkai Xiong, Ling-Yu Duan
2020 IEEE Multimedia  
Feature coding has been recently considered to facilitate intelligent video analysis for urban computing.  ...  In this article, we present a lossless key-point sequence compression approach for efficient feature coding.  ...  These feature streams, when passed to the back-end, enable various video analysis tasks to be achieved efficiently.  ... 
doi:10.1109/mmul.2020.2990863 fatcat:aeiqha7clfd7pepywiejylaxyu
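
To make "lossless key-point sequence compression" concrete, here is a small self-contained sketch under my own assumptions (it is not the codec from the article): consecutive frames' key-point coordinates change slowly, so coding the first frame plus frame-to-frame differences and entropy-coding the result already exploits the temporal redundancy while remaining exactly invertible.

```python
import zlib
import numpy as np

def compress_keypoints(seq):
    """Lossless round trip: keep the first frame verbatim (as its delta from
    zero) plus frame-to-frame differences, then entropy-code with zlib.
    `seq` is an int32 array of shape (T, K, 2): T frames, K key points, (x, y)."""
    deltas = np.diff(seq, axis=0, prepend=0).astype(np.int32)
    return seq.shape, zlib.compress(deltas.tobytes())

def decompress_keypoints(shape, blob):
    deltas = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(shape)
    return np.cumsum(deltas, axis=0, dtype=np.int64).astype(np.int32)  # exact inverse

# round-trip check on random smooth trajectories
rng = np.random.default_rng(0)
seq = np.cumsum(rng.integers(-2, 3, size=(100, 17, 2)), axis=0).astype(np.int32)
shape, blob = compress_keypoints(seq)
assert np.array_equal(decompress_keypoints(shape, blob), seq)
```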

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition [article]

Yi Zhu, Shawn Newsam
2016 arXiv   pre-print
We introduce spatio-temporal depth normalization (STDN) to enforce temporal consistency in our estimated depth sequences.  ...  This paper performs the first investigation into depth for large-scale human action recognition in video where the depth cues are estimated from the videos themselves.  ...  This work was funded in part by a National Science Foundation CAREER grant, #IIS-1150115, and a seed grant from the Center for Information Technology in the Interest of Society (CITRIS).  ... 
arXiv:1608.04339v1 fatcat:eaby7yy5uzfw7huk3ftrk2k7fm
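
The snippet names spatio-temporal depth normalization (STDN) but gives no formula, so the sketch below is only a hedged illustration of the generic idea of combining per-frame range normalization with causal temporal averaging to reduce flicker in estimated depth sequences; the window size and the normalization scheme are assumptions, not the paper's definition.

```python
import numpy as np

def stabilize_depth(depths, window=5):
    """Hedged illustration only: normalize each estimated depth map to [0, 1]
    and apply a causal moving average to suppress frame-to-frame flicker.
    `depths` is a (T, H, W) float array of per-frame depth estimates."""
    d = np.asarray(depths, dtype=np.float64)
    lo = d.min(axis=(1, 2), keepdims=True)
    hi = d.max(axis=(1, 2), keepdims=True)
    d = (d - lo) / np.maximum(hi - lo, 1e-8)          # per-frame range normalization
    out = np.empty_like(d)
    for t in range(d.shape[0]):                       # causal temporal averaging
        out[t] = d[max(0, t - window + 1): t + 1].mean(axis=0)
    return out
```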

Drive Video Analysis for the Detection of Traffic Near-Miss Incidents [article]

Hirokatsu Kataoka, Teppei Suzuki, Shoko Oikawa, Yasuhiro Matsui, Yutaka Satoh
2018 arXiv   pre-print
...  of video clips of dangerous events captured by monocular driving recorders.  ...  two main contributions: (i) In order to assist automated systems in detecting near-miss incidents based on database instances, we created a large-scale traffic near-miss incident database (NIDB) that consists  ...  Temporal near-miss incident detection consists of determining to which of the seven abovementioned classes (including background) a scene belongs.  ... 
arXiv:1804.02555v1 fatcat:shz4ekin7bb5hedoq5h73ahjh4

Browsing Sport Content through an Interactive H.264 Streaming Session

Iván Alén Fernández, Fan Chen, Fabien Lavigne, Xavier Desurmont, Christophe De Vleeschouwer
2010 2010 Second International Conferences on Advances in Multimedia  
This paper builds on an interactive streaming architecture that supports both user feedback interpretation, and temporal juxtaposition of multiple video bitstreams in a single streaming session.  ...  Versioning depends on the view type of the initial shot, and typically corresponds to the generation of zoomed in and spatially or temporally subsampled video streams.  ...  ACKNOWLEDGMENT The authors would like to thank Walloon Region project Walcomo and Belgian NSF for funding part of this work  ... 
doi:10.1109/mmedia.2010.28 fatcat:cb2o67hkbze7dfanaperedovja

A Low Complexity Motion Segmentation Based on Semantic Representation of Encoded Video Streams [chapter]

Maurizio Abbate, Ciro D'Elia, Paola Mariano
2011 Lecture Notes in Computer Science  
This can be done by a video stream representation based on a semantic abstraction of the video syntax.  ...  Video streaming is characterized by a deep heterogeneity due to the availability of many different video standards such as H.262, H.263, MPEG-4/H.264, H.261 and others.  ...  The extraction of moving objects or regions is a necessary preprocessing step for many applications such as scene interpretation, analysis and others.  ... 
doi:10.1007/978-3-642-24088-1_22 fatcat:smnmxampqbey3ef4nt7l2olryq
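
Since the chapter's "semantic abstraction of the video syntax" is only hinted at in the snippet, the following sketch illustrates just the low-complexity core idea such approaches build on: segmenting moving regions directly from the motion vectors carried in the encoded stream, without decoding pixels. The threshold and the neighbour clean-up rule are invented for illustration.

```python
import numpy as np

def moving_macroblocks(motion_vectors, thresh=1.5):
    """Hedged sketch: `motion_vectors` is an (H_mb, W_mb, 2) array of
    per-macroblock vectors already parsed from the encoded stream.  Blocks
    whose motion magnitude exceeds `thresh` are marked as moving, then
    isolated blocks are pruned with a simple neighbourhood check."""
    magnitude = np.linalg.norm(motion_vectors, axis=-1)
    mask = magnitude > thresh                         # candidate moving blocks
    padded = np.pad(mask, 1)                          # pad with non-moving border
    neighbours = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
                  padded[1:-1, :-2] | padded[1:-1, 2:])
    return mask & neighbours                          # keep blocks with moving neighbours
```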

Smart Broadcast Technique for Improved Video Applications over Constrained Networks

U. Ukommi
2013 International Journal of Advanced Computer Science and Applications  
Improved wireless video communication is challenging since the video stream is vulnerable to channel distortions; hence the need to investigate efficient schemes for improved video communications.  ...  The scheme exploits the concept of video analysis and adaptation principles in the optimization process.  ...  A typical video scene consists of objects characterized by spatial characteristics (number and shape of objects) and temporal characteristics.  ... 
doi:10.14569/ijacsa.2013.041002 fatcat:shi74z2fvjbhbdzm6s7gu5vdfy

HMM based structuring of tennis videos using visual and audio cues

E. Kijak, G. Gravier, P. Gros, L. Oisel, F. Bimbot
2003 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)  
This paper focuses on the use of Hidden Markov Models (HMMs) for structure analysis of videos, and demonstrates how they can be efficiently applied to merge audio and visual cues.  ...  The video structure parsing relies on the analysis of the temporal interleaving of video shots, with respect to prior information about tennis content and editing rules.  ...  Video structure parsing consists of extracting logical story units from the considered video. It is a mandatory step to efficiently organize and retrieve video content.  ... 
doi:10.1109/icme.2003.1221310 dblp:conf/icmcs/KijakGGOB03 fatcat:qojicy7c5rfulf4jlnmls2dske
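
The HMM-based fusion of audio and visual cues can be made concrete with a short Viterbi sketch: assuming the two cues are conditionally independent given the hidden structural state, their per-state log-likelihoods simply add, and dynamic programming recovers the most likely sequence of structural units. This is a generic textbook decoder, not the paper's exact model or state set.

```python
import numpy as np

def viterbi_fused(log_trans, log_init, visual_ll, audio_ll):
    """Decode the most likely state sequence for T shots and S hidden states.
    log_trans: (S, S) log transition matrix, log_init: (S,) log start probs,
    visual_ll / audio_ll: (T, S) per-shot log-likelihoods of each cue."""
    emit = visual_ll + audio_ll                       # cue fusion under independence
    T, S = emit.shape
    score = log_init + emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans             # (prev_state, state) scores
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emit[t]
    path = [int(score.argmax())]                      # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```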

The plenoptic video

Shing-Chow Chan, King-To Ng, Zhi-Feng Gan, Kin-Lok Chan, Heung-Yeung Shum
2005 IEEE transactions on circuits and systems for video technology (Print)  
A new compression algorithm using both temporal and spatial predictions is also proposed for the efficient compression of the plenoptic videos.  ...  Using selective transmission, we are able to continuously stream plenoptic video with 256 × 256 resolution at a rate of 15 frames/s over the network.  ...  Koo for building the camera system. The assistance of Dr. X. Tong and Dr. X. Liu from Microsoft Research Asia in rendering the plenoptic video is highly appreciated. The help of Ms. P. K.  ... 
doi:10.1109/tcsvt.2005.858616 fatcat:sxuttvoxq5djronzv5o2wr2c7q
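
As a minimal stand-in for "both temporal and spatial predictions" (the actual plenoptic-video codec is far more involved), the sketch below just picks, per block, whichever of a temporal reference (same camera, previous frame) or a spatial reference (neighbouring camera, same frame) leaves the smaller residual energy; the function and its inputs are illustrative assumptions.

```python
import numpy as np

def choose_predictor(block, temporal_ref, spatial_ref):
    """Compare a temporal and a spatial reference block (same shape as `block`)
    and return the mode label plus the residual with the smaller energy,
    which an encoder would then transform-code and signal per block."""
    res_t = block - temporal_ref
    res_s = block - spatial_ref
    if np.sum(res_t ** 2) <= np.sum(res_s ** 2):
        return "temporal", res_t
    return "spatial", res_s
```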

A Comprehensive Study of Deep Video Action Recognition [article]

Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
2020 arXiv   pre-print
Video action recognition is one of the representative tasks for video understanding.  ...  kernels, and finally to the recent compute-efficient models.  ...  Acknowledgement We would like to thank Peter Gehler, Linchao Zhu and Thomas Brady for constructive feedback and fruitful discussions.  ... 
arXiv:2012.06567v1 fatcat:plqytbfck5bcndiceshix5unpa
Showing results 1 — 15 out of 19,879 results