Quality-Gated Convolutional LSTM for Enhancing Compressed Video [article]

Ren Yang, Xiaoyan Sun, Mai Xu, Wenjun Zeng
2019 arXiv   pre-print
The past decade has witnessed great success in applying deep learning to enhance the quality of compressed video.  ...  More importantly, due to the obvious quality fluctuation among compressed frames, higher quality frames can provide more useful information for other frames to enhance quality.  ...  In this paper, the Quality-Gated Convolutional Long Short-Term Memory (QG-ConvLSTM) network is proposed for enhancing the quality of compressed video.  ... 
arXiv:1903.04596v3 fatcat:w5wilqhlsraexay4qsph3r6ezi

Fast Object Detection in Compressed Video [article]

Shiyao Wang, Hongchao Lu, Zhidong Deng
2019 arXiv   pre-print
To our best knowledge, the MMNet is the first work that investigates a deep convolutional detector on compressed videos.  ...  The MMNet has two major advantages: 1) It significantly accelerates the procedure of feature extraction for compressed videos.  ...  In this paper, we propose a fast and accurate object detection method for compressed videos.  ... 
arXiv:1811.11057v3 fatcat:nap2cuapizbhhn72noszqkg3ta

Impact of Video Compression and Multimodal Embedding on Scene Description

Jin Young Lee
2019 Electronics  
Hence, this paper analyzes the impact of video compression on scene description, and then proposes a simple network that is robust to compression artifacts.  ...  In addition, a network cascading more encoding layers for efficient multimodal embedding is also proposed.  ...  framework to enhance accurate sentence generation.  ... 
doi:10.3390/electronics8090963 fatcat:4wvqccujqnerjd6r3bzlgk4wk4

Machine Learning based Post Processing Artifact Reduction in HEVC Intra Coding [article]

K.R. Rao, Ninad Gorey
2019 arXiv   pre-print
To reduce those compression artifacts various Convolutional Neural Network (CNN) based post processing techniques have been experimented over recent years.  ...  We designed a variable filter size Sub-layered Deeper CNN (SDCNN) architecture to improve filtering operation and incorporated large stride convolutional, deconvolution layers for further speed up.  ...  We would also like to explore the Gated Recurrent Unit (GRU) LSTM network [20] and bidirectional LSTM [23], for the inter frame (P and B) predictions and investigate further.  ... 
arXiv:1912.13100v1 fatcat:rljhvmpfxze5bcreb54maokjsi

FPGA Implementation of Deep Leaning Model for Video Analytics

Khuram Nawaz Khayam, Zahid Mehmood, Hassan Nazeer Chaudhry, Muhammad Usman Ashraf, Usman Tariq, Mohammed Nawaf Altouri, Khalid Alsubhi
2022 Computers Materials & Continua  
Convolutional neural networks are one of the most popular deep learning architectures, especially for image classification and video analytics.  ...  This paper introduces a modern high-performance, energy-efficient Bat Pruned Ensembled Convolutional networks (BPEC-CNN) for processing the video in the hardware.  ...  The enhancement model deployed can exhibit poor visual quality if the bin number decreases. Chen et al. [9] developed a pipeline model for depth estimation in stereo applications.  ... 
doi:10.32604/cmc.2022.019921 fatcat:awgdl2645nbmnmlc7ysxqy7vrm

Spatio-temporal video autoencoder with differentiable memory [article]

Viorica Patraucean, Ankur Handa, Roberto Cipolla
2016 arXiv   pre-print
The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time.  ...  At each time step, the system receives as input a video frame, predicts the optical flow based on the current observation and the LSTM memory state as a dense transformation map, and applies it to the  ...  ACKNOWLEDGMENTS We are greatly indebted to the Torch community for their efforts to maintain this great library, and especially to Nicholas Léonard for his helpful advice.  ... 
arXiv:1511.06309v5 fatcat:kkeh6puykja7fiptyf5pbtwfmi

Speaker-conditioned Target Speaker Extraction based on Customized LSTM Cells [article]

Ragini Sinha
2021 arXiv   pre-print
processing in the forget gate.  ...  Experimental results for two-speaker mixtures using the Librispeech dataset show that this customization significantly improves the target speaker extraction performance compared to using standard LSTM  ...  SDR is a common measure to evaluate the performance of source separation systems, while PESQ is a speech quality metric commonly used for speech enhancement systems.  ... 
arXiv:2104.04234v1 fatcat:7h7tllzp65cbpeio3f2b5ls56q

A Review of Video Object Detection: Datasets, Metrics and Methods

Haidi Zhu, Haoran Wei, Baoqing Li, Xiaobing Yuan, Nasser Kehtarnavaz
2020 Applied Sciences  
An overview of the existing datasets for video object detection together with commonly used evaluation metrics is first presented.  ...  The aim of this paper is to provide a review of these papers on video object detection.  ...  Forget Gate, Input Gate and Output Gate operate the 3 × 3 convolutions followed by the activation function. Figure 6. Temporal Conv LSTM architecture.  ... 
doi:10.3390/app10217834 fatcat:qyp2b5guovftplzmmmnec33bdm

2020 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 30

2020 IEEE transactions on circuits and systems for video technology (Print)  
., +, TCSVT Nov. 2020 4020-4033 Enhancement of Noisy and Compressed Videos by Optical Flow and Non-Local Denoising.  ...  ., +, TCSVT Jan. 2020 11-22 Field programmable gate arrays A Reconfigurable Architecture for Discrete Cosine Transform in Video Coding.  ...  A Memory-Efficient Hardware Architecture for Connected Component Labeling in Embedded System.  ... 
doi:10.1109/tcsvt.2020.3043861 fatcat:s6z4wzp45vfflphgfcxh6x7npu

Convolutional Neural Networks for Continuous QoE Prediction in Video Streaming Services

Tho Nguyen Duc, Chanh Minh Tran, Phan Xuan Tan, Eiji Kamioka
2020 IEEE Access  
Meanwhile, Temporal Convolutional Network (TCN), a variation of convolutional neural networks, has recently been proposed for sequence modeling tasks (e.g., speech enhancement), providing a superior prediction  ...  INDEX TERMS Convolutional neural networks, temporal convolutional network, quality of experience, video streaming. This work is licensed under a Creative Commons Attribution 4.0 License.  ...  video quality level.  ... 
doi:10.1109/access.2020.3004125 fatcat:kwcw7tfcincdrg2qnz6xwftvqi

Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars [article]

Alexandros Kouris, Stylianos I. Venieris, Michail Rizakis, Christos-Savvas Bouganis
2019 arXiv   pre-print
The need to recognise long-term dependencies in sequential data such as video streams has made Long Short-Term Memory (LSTM) networks a prominent Artificial Intelligence model for many emerging applications  ...  Our experiments on a state-of-the-art driving model for autonomous vehicle navigation demonstrate that the proposed approach can yield outputs with similar quality of result compared to a faithful LSTM  ...  Input frames for each video are first processed by a Fully-Convolutional Network (FCN) to encode the spatial features which are then fed to a trained LSTM model that predicts the probability distribution  ... 
arXiv:1905.00689v2 fatcat:wdx5cbijrfcifpf2hzeqmnx4hy

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning [article]

Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei
2019 arXiv   pre-print
In this paper, we present a novel design --- Temporal Deformable Convolutional Encoder-Decoder Networks (dubbed as TDConvED) that fully employ convolutions in both encoder and decoder networks for video  ...  Our model also capitalizes on temporal attention mechanism for sentence generation.  ...  Moreover, we additionally explore the temporal deformable convolutions and temporal attention mechanism to extend and utilize temporal dynamics across frames/clips, and eventually enhance the quality of  ... 
arXiv:1905.01077v1 fatcat:npbmvhxsk5eklc5akqvrwbz2ve

Learning to Compress Videos without Computing Motion [article]

Meixu Chen, Todd Goodall, Anjul Patney, Alan C. Bovik
2020 arXiv   pre-print
The LSTM-UNet is used in the FRN to learn space time differential representations of videos.  ...  The combined network is able to efficiently capture both temporal and spatial video information, making it highly amenable for our purposes.  ...  The C-LSTM is a convolutional version of the original LSTM, which replaces the matrix multiplication operation of the traditional LSTM with convolutions.  ... 
arXiv:2009.14110v2 fatcat:hm4qdgnz4fehvghdyyabcahf5e

FB-MSTCN: A Full-Band Single-Channel Speech Enhancement Method Based on Multi-Scale Temporal Convolutional Network [article]

Zehua Zhang, Lu Zhang, Xuyi Zhuang, Yukun Qian, Heng Li, Mingjiang Wang
2022 arXiv   pre-print
To solve this problem, this paper proposes a two-stage real-time speech enhancement model with extraction-interpolation mechanism for a full-band signal.  ...  After the two-stage enhancement, the enhanced full-band speech signal is restored by interval interpolation.  ...  Speech enhancement techniques are essential for removing noise interference to improve speech quality.  ... 
arXiv:2203.07684v1 fatcat:5myq7lujdbhwvmwiq3co3fpljq

A Cross-Cultural Study of Mathematical Achievement: from the Perspectives of One's Motivation and Problem-solving Style

Szu-Yu Chen, Su-Wei Lin
2019 International Journal of Science and Mathematics Education  
doi:10.1007/s10763-019-10011-6 fatcat:52vzw7ycojcrre7tzcolzegcmm