Efficient temporal consistency for streaming video scene analysis
2013
2013 IEEE International Conference on Robotics and Automation
We address the problem of image-based scene analysis from streaming video, as would be seen from a moving platform, in order to efficiently generate spatially and temporally consistent predictions of semantic ...
Our technique is a meta-algorithm that can be efficiently wrapped around any scene analysis technique that produces a per-pixel semantic label distribution. ...
Wendel for helping with the optical flow computation, C. Fabaret for providing the NYUScenes dataset and his classifications, C. ...
doi:10.1109/icra.2013.6630567
dblp:conf/icra/MiksikMBH13
fatcat:vj35z26fn5dbhkuzjc6u5lswni
Efficient Temporal Consistency for Streaming Video Scene Analysis
2018
We address the problem of image-based scene analysis from streaming video, as would be seen from a moving platform, in order to efficiently generate spatially and temporally consistent predictions of semantic ...
Our technique is a meta-algorithm that can be efficiently wrapped around any scene analysis technique that produces a per-pixel semantic label distribution. ...
Wendel for helping with the optical flow computation, C. Fabaret for providing the NYUScenes dataset and his classifications, C. ...
doi:10.1184/r1/6554702
fatcat:kum37btpuzb3lldoljw6tp4hli
Two-Stream Transformer Architecture for Long Video Understanding
[article]
2022
arXiv
pre-print
This paper introduces an efficient Spatio-Temporal Attention Network (STAN) which uses a two-stream transformer architecture to model dependencies between static image features and temporal contextual ...
Pure vision transformer architectures are highly effective for short video classification and action recognition tasks. ...
This ensures that positional information is consistent between the spatial and temporal streams during fusion. ...
arXiv:2208.01753v1
fatcat:3pteyddiazanxoi4ffh37teliu
3D-TV – THE FUTURE OF VISUAL ENTERTAINMENT
2005
Multimedia Databases and Image Communication
But the time of passive TV consumption may be over soon: Advances in video acquisition technology, novel image analysis algorithms, and the pace of progress in computer graphics hardware together drive ...
The scientific and technological obstacles towards realizing 3D-TV, the experience of interactively watching real-world dynamic scenes from arbitrary perspective, are currently being put out of the way ...
For 3D-TV, however, multiple synchronized video streams depicting the same scene from different viewpoints must be encoded, calling for new coding algorithms to compress multi-video content. ...
doi:10.1142/9789812702135_0011
fatcat:pkbapsxzd5fl3jwxqq3nnsnbuy
Surveillance System with Object-Aware Video Transcoder
2005
2005 IEEE 7th Workshop on Multimedia Signal Processing
This paper presents an object-aware video surveillance system that is not only smart and friendly for users, but allows for transmission of the scene over limited bandwidth networks. ...
Human behavior can be understood immediately and intuitively in a mosaic image, and it is very effective for behavior analysis and scene browsing as well as efficient to transmit the surveillance video ...
doi:10.1109/mmsp.2005.248636
dblp:conf/mmsp/HataKNSV05
fatcat:pvltxi5fwvf37gu3jvmy35pb64
Guest Editorial: Video Recognition
2016
International Journal of Computer Vision
-"A robust and efficient video representation for action recognition" (doi:10.1007/s11263-015-0846-5) by Wang et al. improves dense trajectory video features by explicit ...
"Video Recognition" is a fundamental research area in visual recognition, required for true perceptual understanding in any practical scenario where image streams are processed. ...
-"EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis" (doi:10.1007/s11263-016-0905-6) by Tran and Torresani proposes a scalable mid-level representation for video analysis ...
doi:10.1007/s11263-016-0922-5
fatcat:sksuszyyjrdd5arhy23sfdbkty
Key-point Sequence Lossless Compression for Intelligent Video Analysis
2020
IEEE Multimedia
Feature coding has been recently considered to facilitate intelligent video analysis for urban computing. ...
In this article, we present a lossless key-point sequence compression approach for efficient feature coding. ...
These feature streams, when passed to the back-end, enable various video analysis tasks to be achieved efficiently. ...
doi:10.1109/mmul.2020.2990863
fatcat:aeiqha7clfd7pepywiejylaxyu
Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition
[article]
2016
arXiv
pre-print
We introduce spatio-temporal depth normalization (STDN) to enforce temporal consistency in our estimated depth sequences. ...
This paper performs the first investigation into depth for large-scale human action recognition in video where the depth cues are estimated from the videos themselves. ...
This work was funded in part by a National Science Foundation CAREER grant, #IIS-1150115, and a seed grant from the Center for Information Technology in the Interest of Society (CITRIS). ...
arXiv:1608.04339v1
fatcat:eaby7yy5uzfw7huk3ftrk2k7fm
Drive Video Analysis for the Detection of Traffic Near-Miss Incidents
[article]
2018
arXiv
pre-print
of video clip of dangerous events captured by monocular driving recorders. ...
two main contributions: (i) In order to assist automated systems in detecting near-miss incidents based on database instances, we created a large-scale traffic near-miss incident database (NIDB) that consists ...
Temporal near-miss incident detection consists of determining to which of the seven abovementioned classes (including background) a scene belongs. ...
arXiv:1804.02555v1
fatcat:shz4ekin7bb5hedoq5h73ahjh4
Browsing Sport Content through an Interactive H.264 Streaming Session
2010
2010 Second International Conferences on Advances in Multimedia
This paper builds on an interactive streaming architecture that supports both user feedback interpretation, and temporal juxtaposition of multiple video bitstreams in a single streaming session. ...
Versioning depends on the view type of the initial shot, and typically corresponds to the generation of zoomed in and spatially or temporally subsampled video streams. ...
ACKNOWLEDGMENT The authors would like to thank Walloon Region project Walcomo and Belgian NSF for funding part of this work ...
doi:10.1109/mmedia.2010.28
fatcat:cb2o67hkbze7dfanaperedovja
A Low Complexity Motion Segmentation Based on Semantic Representation of Encoded Video Streams
[chapter]
2011
Lecture Notes in Computer Science
This can be done by a video stream representation based on a semantic abstraction of the video syntax. ...
Video streaming is characterized by a deep heterogeneity due to the availability of many different video standards such as H.262, H.263, MPEG-4/H.264, H.261 and others. ...
Extraction of moving objects or regions is a necessary preprocessing step for many applications, such as scene interpretation and analysis. ...
doi:10.1007/978-3-642-24088-1_22
fatcat:smnmxampqbey3ef4nt7l2olryq
Smart Broadcast Technique for Improved Video Applications over Constrained Networks
2013
International Journal of Advanced Computer Science and Applications
improved wireless video communication is challenging since video stream is vulnerable to channel distortions. Hence, the need to investigate efficient scheme for improved video communications. ...
The scheme exploits the concept of video analysis and adaptation principles in the optimization process. ...
A typical video scene consists of objects characterized by spatial characteristics (number and shape of objects) and temporal characteristics. ...
doi:10.14569/ijacsa.2013.041002
fatcat:shi74z2fvjbhbdzm6s7gu5vdfy
HMM based structuring of tennis videos using visual and audio cues
2003
2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)
This paper focuses on the use of Hidden Markov Models (HMMs) for structure analysis of videos, and demonstrates how they can be efficiently applied to merge audio and visual cues. ...
The video structure parsing relies on the analysis of the temporal interleaving of video shots, with respect to prior information about tennis content and editing rules. ...
INTRODUCTION Video structure parsing consists in extracting logical story units from the considered video. It's a mandatory step to efficiently organize and retrieve video contents. ...
doi:10.1109/icme.2003.1221310
dblp:conf/icmcs/KijakGGOB03
fatcat:qojicy7c5rfulf4jlnmls2dske
The plenoptic video
2005
IEEE transactions on circuits and systems for video technology (Print)
A new compression algorithm using both temporal and spatial predictions is also proposed for the efficient compression of the plenoptic videos. ...
Using selective transmission, we are able to stream continuously plenoptic video with (256 × 256) resolution at a rate of 15 f/s over the network. ...
Koo for building the camera system. The assistance of Dr. X. Tong and Dr. X. Liu from Microsoft Research Asia in rendering the plenoptic video is highly appreciated. The help of Ms. P. K. ...
doi:10.1109/tcsvt.2005.858616
fatcat:sxuttvoxq5djronzv5o2wr2c7q
A Comprehensive Study of Deep Video Action Recognition
[article]
2020
arXiv
pre-print
Video action recognition is one of the representative tasks for video understanding. ...
kernels, and finally to the recent compute-efficient models. ...
Acknowledgement We would like to thank Peter Gehler, Linchao Zhu and Thomas Brady for constructive feedback and fruitful discussions. ...
arXiv:2012.06567v1
fatcat:plqytbfck5bcndiceshix5unpa