50,120 Hits in 2.8 sec

VCIP 2020 Index

2020 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)  
Weak supervised Learning Guo, Yu Deep Inter Coding with Interpolated Reference Frame for Hierarchical Coding Structure Guo, Yujia Drone-Based Car Counting via Density Map Learning Guo, Zongming  ...  Prediction Across Spatial Scale using Deep Learning Li, Lin Quality of Experience Evaluation for Streaming Video Using CGNN Li, Lin Deep Blind Video Quality Assessment for User Generated Videos  ... 
doi:10.1109/vcip49819.2020.9301896 fatcat:bdh7cuvstzgrbaztnahjdp5s5y

Video Salient Object Detection via Fully Convolutional Networks

Wenguan Wang, Jianbing Shen, Ling Shao
2018 IEEE Transactions on Image Processing  
This paper proposes a deep learning model to efficiently detect salient regions in videos.  ...  Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal  ...  Thus we directly capture temporal saliency via learning deep networks from frame pairs, instead of using long-term video information, such as optical flows from multiple adjacent video frames.  ... 
doi:10.1109/tip.2017.2754941 pmid:28945593 fatcat:v644yvm4qjag5l5ri5lq7ztwee
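
The snippet above describes learning temporal saliency directly from pairs of consecutive frames rather than from optical flow. As a rough illustration of that idea only (not the authors' architecture), the PyTorch sketch below concatenates two frames channel-wise and runs them through a small fully convolutional network that outputs a per-pixel saliency map; all layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class FramePairSaliencyNet(nn.Module):
    """Toy fully convolutional net: two RGB frames in, one saliency map out.
    Illustrative stand-in, not the network from the paper."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),  # per-pixel saliency logit
        )

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)  # (N, 6, H, W)
        return torch.sigmoid(self.body(x))         # (N, 1, H, W), values in [0, 1]

# Usage: two consecutive 224x224 RGB frames.
net = FramePairSaliencyNet()
f_t, f_t1 = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
saliency = net(f_t, f_t1)
```

Stacking the frame pair along the channel axis is the simplest way to expose short-term motion cues to a purely convolutional model, which is the gist of "learning from frame pairs instead of optical flow".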

Deep Keyframe Detection in Human Action Videos [article]

Xiang Yan, Syed Zulqarnain Gilani, Hanlin Qin, Mingtao Feng, Liang Zhang, Ajmal Mian
2018 arXiv   pre-print
To this end, we introduce a deep two-stream ConvNet for key frame detection in videos that learns to directly predict the location of key frames.  ...  While the training data is generated taking many different human action videos into account, the trained CNN can predict the importance of frames from a single video.  ...  This work is the first to report results for key frame detection in human action videos via deep learning. Figure 1: Key frames detected by our deep keyframe detection network.  ... 
arXiv:1804.10021v1 fatcat:jciq5xtdfbdbhd3q44ixqpzt2y
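
As a hedged sketch of the two-stream idea mentioned above (an appearance stream plus a motion stream scoring each frame's importance), the toy network below fuses features from an RGB frame and a flow field into one per-frame score. It is illustrative only; layer sizes and the fusion head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoStreamFrameScorer(nn.Module):
    """Toy two-stream scorer: appearance (RGB) and motion (flow) features are
    fused into a scalar importance score per frame."""
    def __init__(self, feat_dim=128):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim), nn.ReLU(),
            )
        self.rgb_stream = stream(3)    # appearance stream
        self.flow_stream = stream(2)   # horizontal + vertical flow
        self.head = nn.Linear(2 * feat_dim, 1)

    def forward(self, rgb, flow):
        z = torch.cat([self.rgb_stream(rgb), self.flow_stream(flow)], dim=1)
        return self.head(z).squeeze(1)  # one importance score per frame

scores = TwoStreamFrameScorer()(torch.rand(8, 3, 112, 112), torch.rand(8, 2, 112, 112))
keyframe_idx = scores.argmax().item()  # highest-scoring frame as key frame
```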

D6.1 – Real-time surgeon action detection and recognition

Fabio Cuzzolin
2019 Zenodo  
Its output is, for each video frame, a number of bounding boxes showing where the various actions of interest are taking place, with attached scores (produced by a neural network) for each action class  ...  The input to this component of SARAS is the video streaming in from the available laparoscopic camera.  ...  At the bottom, the predicted action labels for the same video at different time points are overlaid on the corresponding video frames.  ... 
doi:10.5281/zenodo.5749947 fatcat:swltflsrbff4zbxkrloseu4xnu
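
The deliverable above specifies the per-frame output format: bounding boxes with attached per-class action scores. A minimal, hypothetical data structure for that output might look as follows; the class names and coordinates are invented for illustration and are not taken from the SARAS system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ActionDetection:
    """One detected action region in a single frame (illustrative schema only)."""
    box_xyxy: Tuple[float, float, float, float]  # pixel coordinates (x1, y1, x2, y2)
    class_scores: Dict[str, float]               # score per action class

# Hypothetical output for one laparoscopic video frame.
frame_detections: List[ActionDetection] = [
    ActionDetection(box_xyxy=(120.0, 80.0, 310.0, 260.0),
                    class_scores={"cutting": 0.91, "suturing": 0.04}),
    ActionDetection(box_xyxy=(400.0, 150.0, 560.0, 300.0),
                    class_scores={"pulling_tissue": 0.77, "cutting": 0.10}),
]
# Label to overlay on the frame: the highest-scoring class of each box.
best = max(frame_detections[0].class_scores, key=frame_detections[0].class_scores.get)
```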

Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [article]

Ling-Yu Duan, Jiaying Liu, Wenhan Yang, Tiejun Huang, Wen Gao
2020 arXiv   pre-print
The recent endeavors in imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, i.e.  ...  Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale.  ...  Deep Learning Based Video Coding: Deep learning techniques have significantly promoted the development of video coding.  ... 
arXiv:2001.03569v2 fatcat:22dsiwby6nfrbicoxyk2ufwhjy

Hierarchy-Dependent Cross-Platform Multi-View Feature Learning for Venue Category Prediction [article]

Shuqiang Jiang and Weiqing Min and Shuhuan Mei
2018 arXiv   pre-print
CPTDL is capable of reinforcing the learned deep network from videos using images from other platforms. Specifically, CPTDL first trained a deep network using videos.  ...  Taking these aspects into account, we propose a Hierarchy-dependent Cross-platform Multi-view Feature Learning (HCM-FL) framework for venue category prediction from videos by leveraging images from other  ...  We then obtain each kind of video features via mean-pooling on features from key frames.  ... 
arXiv:1810.09833v1 fatcat:7jd5ne3nsfdrrgao3hyptkxkcy
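
The last fragment above mentions obtaining each kind of video feature by mean-pooling features extracted from key frames. A minimal NumPy sketch of that pooling step follows; the number of key frames and the feature dimension are assumptions for illustration.

```python
import numpy as np

def video_descriptor(keyframe_features: np.ndarray) -> np.ndarray:
    """Mean-pool per-keyframe feature vectors (K, D) into one video-level
    descriptor (D,). Illustrates only the pooling step, not the full framework."""
    return keyframe_features.mean(axis=0)

# Hypothetical: 12 key frames, each with a 2048-d CNN feature.
feats = np.random.rand(12, 2048).astype(np.float32)
desc = video_descriptor(feats)   # shape (2048,)
```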

Deep Predictive Video Compression using Mode-Selective Uni- and Bi-directional Predictions based on Multi-frame Hypothesis

Woonsung Park, Munchurl Kim
2020 IEEE Access  
Recent deep learning-based video compression methods were proposed in a limited compression environment using only P-frame or B-frame.  ...  In this paper, we propose an end-to-end deep predictive video compression network, called DeepPVCnet, using mode-selective uni- and bi-directional predictions based on multi-frame hypothesis with a multi-scale  ...  Deep Learning-Based Optical Flow Estimation and Frame Interpolation: Optical flow estimation and frame interpolation can be used for predictive video coding.  ... 
doi:10.1109/access.2020.3046040 fatcat:bvfzpl26arfr5ffhh26qlbc5hm
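
To make the mode-selective uni-/bi-directional prediction idea concrete, the sketch below chooses, per target frame, between a prediction built from past references only and one built from past and future references, keeping whichever has the smaller residual energy. DeepPVCnet learns its predictors and mode decisions end-to-end; the averaging predictors here are mere placeholders.

```python
import numpy as np

def predict_uni(prev_frames):
    """Uni-directional hypothesis: average the past references
    (placeholder for a learned P-frame predictor)."""
    return np.mean(prev_frames, axis=0)

def predict_bi(prev_frames, next_frames):
    """Bi-directional hypothesis: average past and future references
    (placeholder for a learned B-frame predictor)."""
    return np.mean(np.concatenate([prev_frames, next_frames], axis=0), axis=0)

def mode_select(target, prev_frames, next_frames):
    """Pick the prediction mode with the smaller residual energy, a very
    simplified stand-in for rate-distortion-style mode selection."""
    p_uni, p_bi = predict_uni(prev_frames), predict_bi(prev_frames, next_frames)
    err_uni = np.mean((target - p_uni) ** 2)
    err_bi = np.mean((target - p_bi) ** 2)
    return ("uni", p_uni) if err_uni <= err_bi else ("bi", p_bi)

# Hypothetical 64x64 grayscale frames, two references on each side.
rng = np.random.default_rng(0)
prev_refs, next_refs = rng.random((2, 64, 64)), rng.random((2, 64, 64))
mode, pred = mode_select(rng.random((64, 64)), prev_refs, next_refs)
```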

MaskRNN: Instance Level Video Object Segmentation [article]

Yuan-Ting Hu and Jia-Bin Huang and Alexander G. Schwing
2018 arXiv   pre-print
To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance -- a binary segmentation  ...  Instance level video object segmentation is an important technique for video editing and compression.  ...  Video object segmentation via deep learning: With the success of deep nets on semantic segmentation [32, 42], deep learning based approaches for video object segmentation [7, 26, 23, 6, 25] have been  ... 
arXiv:1803.11187v1 fatcat:u4epyabpznas5pysgeypg7n4oe
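
As an illustration of fusing two per-instance outputs in each frame with some temporal carry-over, the toy function below combines a pixel-wise mask probability with a box-shaped localization prior and softly blends in the previous frame's mask. This is a stand-in only; MaskRNN's second network and its fusion and recurrence are learned, and the snippet above is truncated before naming them.

```python
import numpy as np

def fuse_instance_outputs(mask_prob: np.ndarray, box_prior: np.ndarray,
                          prev_mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Toy per-frame fusion for one object instance: combine a pixel-wise mask
    probability with a box-shaped localization prior, softly biased toward the
    previous frame's mask for temporal coherence. Purely illustrative."""
    fused = mask_prob * box_prior                      # agreement of both outputs
    fused = alpha * fused + (1 - alpha) * prev_mask    # simple temporal smoothing
    return (fused > 0.5).astype(np.uint8)              # binary instance mask

H = W = 128
mask_prob = np.random.rand(H, W)
box_prior = np.zeros((H, W)); box_prior[30:90, 40:110] = 1.0   # hypothetical box
prev_mask = np.zeros((H, W), dtype=np.uint8)
mask_t = fuse_instance_outputs(mask_prob, box_prior, prev_mask)
```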

2019 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 29

2019 IEEE transactions on circuits and systems for video technology (Print)  
Fast Single-Image Super-Resolution via Deep Network With Component Learning. Xie, C., +, TCSVT Dec. 2019 3473-3486  ...  Hadamard Transform-Based Optimized HEVC Video Coding.  ... 
doi:10.1109/tcsvt.2019.2959179 fatcat:2bdmsygnonfjnmnvmb72c63tja

Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient Descent [article]

Andrew J.R. Simpson
2015 arXiv   pre-print
When training deep neural networks, it is typically assumed that the training examples are uniformly difficult to learn.  ...  In this article, using a deep neural network to encode a video, we show that oddball SGD can be used to enforce uniform error across the training set.  ...  Fig. 1: Example video frame. Fig. 2: Uniform learning via oddball SGD; summary statistics for synthesis error across the training set (1000 video frames).  ... 
arXiv:1510.02442v1 fatcat:6szvh2jpirbldpnsxew4ybcuda
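
A minimal sketch of the oddball-SGD idea described above: training examples are sampled with probability proportional to their current error, so poorly fit ("oddball") examples are revisited more often and error is pushed toward uniformity across the set. The toy below uses linear regression rather than the paper's video-encoding network; all sizes and rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
w_true = rng.normal(size=16)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(16)
lr = 0.01
for step in range(5000):
    errs = (X @ w - y) ** 2                 # per-example squared error
    p = errs / errs.sum()                   # "oddball" sampling distribution
    i = rng.choice(len(X), p=p)             # hard examples drawn more often
    grad = 2 * (X[i] @ w - y[i]) * X[i]     # plain SGD step on the chosen example
    w -= lr * grad
```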

An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [article]

Sifeng Xia, Kunchangtai Liang, Wenhan Yang, Ling-Yu Duan, Jiaying Liu
2020 arXiv   pre-print
Specifically, we employ a conditional deep generation network to reconstruct video frames with the guidance of the learned motion pattern.  ...  By learning to extract a sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames via a generative model,  ...  The sparse motion pattern is first extracted automatically via a deep predictive model.  ... 
arXiv:2001.03004v1 fatcat:5wkilmqmsvhhvet2xvoto7fwui
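
The abstract fragments above describe reconstructing frames with a conditional generation network guided by a sparse motion pattern. The sketch below is only a schematic of that interface (a key frame plus a sparse motion map in, a reconstructed frame out); the layer sizes and channel counts are assumptions, not the paper's generator.

```python
import torch
import torch.nn as nn

class ConditionalFrameGenerator(nn.Module):
    """Toy conditional generator: a key frame (3 channels) and a sparse motion
    map (2 channels) are concatenated and decoded into the target frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),   # RGB output in [0, 1]
        )

    def forward(self, key_frame, sparse_motion):
        return self.net(torch.cat([key_frame, sparse_motion], dim=1))

gen = ConditionalFrameGenerator()
recon = gen(torch.rand(1, 3, 128, 128), torch.zeros(1, 2, 128, 128))
```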

Attention-Aware Sampling via Deep Reinforcement Learning for Action Recognition

Wenkai Dong, Zhaoxiang Zhang, Tieniu Tan
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)  
We formulate the process of mining key frames from videos as a Markov decision process and train the attention agent through deep reinforcement learning without extra labels.  ...  Existing works mainly focus on designing novel deep architectures to achieve video representation learning for action recognition.  ...  learning deep representations from a large amount of labeled video data.  ... 
doi:10.1609/aaai.v33i01.33018247 fatcat:c6e6jo25vzhgnko4hrz4myfkra
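
To illustrate the frame-selection-as-MDP formulation in general terms, the sketch below scores frames with a small policy network, samples a keep/skip action per frame, and applies a REINFORCE-style update against a placeholder reward. The paper's actual state, action, and reward design differ; this only shows the training pattern, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical per-frame features (T, D) -> per-frame keep probabilities.
T, D = 32, 512
policy = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, 1))
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

features = torch.rand(T, D)                          # stand-in for CNN frame features
probs = torch.sigmoid(policy(features)).squeeze(1)   # keep-probability per frame
dist = torch.distributions.Bernoulli(probs)
actions = dist.sample()                              # 1 = keep frame, 0 = skip

# The reward would normally come from the downstream recognizer evaluated on the
# kept frames; a placeholder scalar keeps the REINFORCE update well defined here.
reward = torch.tensor(0.8)
loss = -(dist.log_prob(actions).sum() * reward)
optim.zero_grad(); loss.backward(); optim.step()
```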

Deep motion estimation for parallel inter-frame prediction in video compression [article]

André Nortje, Herman A. Engelbrecht, Herman Kamper
2019 arXiv   pre-print
Standard video codecs rely on optical flow to guide inter-frame prediction: pixels from reference frames are moved via motion vectors to predict target video frames.  ...  By replacing the optical flow-based block-motion algorithms found in an existing video codec with our learned inter-frame prediction model, our approach outperforms the standard H.264 and H.265 video codecs  ...  We present a deep learning approach to video frame prediction that can be optimised end-to-end as part of a larger video compression system.  ... 
arXiv:1912.05193v1 fatcat:magzqy5w4fgtrmux2vr7lvtzwe
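
The first fragment above summarizes standard inter-frame prediction: blocks of pixels are copied from a reference frame at offsets given by motion vectors. A bare-bones NumPy version of that block-motion compensation step follows; the block size and zero-motion field are arbitrary choices for illustration.

```python
import numpy as np

def motion_compensate(reference: np.ndarray, motion_vectors: np.ndarray,
                      block: int = 16) -> np.ndarray:
    """Predict the target frame by copying each block from the reference frame
    at an offset given by its motion vector (dy, dx)."""
    H, W = reference.shape
    pred = np.zeros_like(reference)
    for by in range(0, H, block):
        for bx in range(0, W, block):
            dy, dx = motion_vectors[by // block, bx // block]
            sy = np.clip(by + dy, 0, H - block)
            sx = np.clip(bx + dx, 0, W - block)
            pred[by:by + block, bx:bx + block] = reference[sy:sy + block, sx:sx + block]
    return pred

reference = np.random.rand(64, 64)
target = np.random.rand(64, 64)
mvs = np.zeros((4, 4, 2), dtype=int)          # hypothetical zero-motion field
prediction = motion_compensate(reference, mvs)
residual = target - prediction                # what a codec would actually encode
```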

No-reference video quality assessment via pretrained CNN and LSTM networks

Domonkos Varga, Tamás Szirányi
2019 Signal, Image and Video Processing  
Considering video sequences as a time series of deep features extracted with the help of a CNN, an LSTM network is trained to predict subjective quality scores.  ...  Furthermore, these results are also confirmed using tests on the LIVE Video Quality Assessment Database, which consists of artificially distorted videos.  ...  In our proposed NR-VQA framework, we model a digital video sequence as a sequence of data of frame-level deep features extracted via pretrained CNNs.  ... 
doi:10.1007/s11760-019-01510-8 fatcat:eskintimpvedxj6zjdldierfui
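
A minimal sketch of the CNN-feature-plus-LSTM pipeline described above: per-frame deep features (assumed to come from a pretrained CNN upstream; random tensors stand in for them here) are read by an LSTM whose final state is regressed to a quality score. Dimensions are assumptions, not the published model.

```python
import torch
import torch.nn as nn

class DeepFeatureLSTMVQA(nn.Module):
    """Toy NR-VQA regressor: a sequence of frame-level deep features is read by
    an LSTM and the final hidden state is mapped to a quality score."""
    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.regressor = nn.Linear(hidden, 1)

    def forward(self, frame_features):               # (N, T, feat_dim)
        _, (h_n, _) = self.lstm(frame_features)
        return self.regressor(h_n[-1]).squeeze(1)    # predicted score per video

model = DeepFeatureLSTMVQA()
fake_cnn_feats = torch.rand(2, 30, 2048)             # 2 videos, 30 frames each
scores = model(fake_cnn_feats)                       # shape (2,)
```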

A Plug-and-play Scheme to Adapt Image Saliency Deep Model for Video Data [article]

Yunxiao Li, Shuai Li, Chenglizhao Chen, Aimin Hao, Hong Qin
2020 arXiv   pre-print
With the rapid development of deep learning techniques, image saliency deep models trained solely on spatial information have occasionally achieved detection performance for video data comparable to that  ...  Thus, the most recent video saliency detection approaches have adopted the network architecture starting with a spatial deep model that is followed by an elaborately designed temporal deep model.  ...  frames with high-quality low-level saliency predictions.  ... 
arXiv:2008.09103v1 fatcat:mz7wcpw26ja6tispl3c6mmgnqm
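
As a schematic of the plug-and-play idea (a temporal module bolted onto a frozen image-saliency model's per-frame outputs), the sketch below refines a short window of saliency maps with a 3D convolution. It is not the scheme proposed in the paper, just an illustration of the interface; the window size and kernel are assumptions.

```python
import torch
import torch.nn as nn

class TemporalRefiner(nn.Module):
    """Toy temporal module that can be plugged onto any frozen image-saliency
    model: it refines a short window of per-frame saliency maps with a 3D conv."""
    def __init__(self, window=3):
        super().__init__()
        self.refine = nn.Conv3d(1, 1, kernel_size=(window, 3, 3),
                                padding=(window // 2, 1, 1))

    def forward(self, saliency_maps):                 # (N, 1, T, H, W) from the image model
        return torch.sigmoid(self.refine(saliency_maps))

# Hypothetical frozen image-model output for 5 consecutive frames.
spatial_saliency = torch.rand(1, 1, 5, 64, 64)
video_saliency = TemporalRefiner()(spatial_saliency)
```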
Showing results 1 — 15 out of 50,120 results