Quality-Gated Convolutional LSTM for Enhancing Compressed Video
[article]
2019
arXiv
pre-print
The past decade has witnessed great success in applying deep learning to enhance the quality of compressed video. ...
More importantly, due to the obvious quality fluctuation among compressed frames, higher quality frames can provide more useful information for other frames to enhance quality. ...
In this paper, the Quality-Gated Convolutional Long Short-Term Memory (QG-ConvLSTM) network is proposed for enhancing the quality of compressed video. ...
arXiv:1903.04596v3
fatcat:w5wilqhlsraexay4qsph3r6ezi
Fast Object Detection in Compressed Video
[article]
2019
arXiv
pre-print
To our best knowledge, the MMNet is the first work that investigates a deep convolutional detector on compressed videos. ...
The MMNet has two major advantages: 1) It significantly accelerates the procedure of feature extraction for compressed videos. ...
In this paper, we propose a fast and accurate object detection method for compressed videos. ...
arXiv:1811.11057v3
fatcat:nap2cuapizbhhn72noszqkg3ta
Impact of Video Compression and Multimodal Embedding on Scene Description
2019
Electronics
Hence, this paper analyzes the impact of video compression on scene description, and then proposes a simple network that is robust to compression artifacts. ...
In addition, a network cascading more encoding layers for efficient multimodal embedding is also proposed. ...
framework to enhance accurate sentence generation. ...
doi:10.3390/electronics8090963
fatcat:4wvqccujqnerjd6r3bzlgk4wk4
Machine Learning based Post Processing Artifact Reduction in HEVC Intra Coding
[article]
2019
arXiv
pre-print
To reduce those compression artifacts various Convolutional Neural Network (CNN) based post processing techniques have been experimented over recent years. ...
We designed a variable filter size Sub-layered Deeper CNN (SDCNN) architecture to improve filtering operation and incorporated large stride convolutional, deconvolution layers for further speed up. ...
We would also like to explore the Gated Recurrent Unit (GRU) LSTM network [20] and bidirectional LSTM [23], for the inter frame (P and B) predictions and investigate further. ...
arXiv:1912.13100v1
fatcat:rljhvmpfxze5bcreb54maokjsi
FPGA Implementation of Deep Leaning Model for Video Analytics
2022
Computers Materials & Continua
Convolutional neural networks are one of the most popular deep learning architectures, especially for image classification and video analytics. ...
This paper introduces a modern high-performance, energy-efficient Bat Pruned Ensembled Convolutional networks (BPEC-CNN) for processing the video in the hardware. ...
The enhancement model deployed can exhibit poor visual quality if the bin number decreases. Chen et al. [9] developed a pipeline model for depth estimation in stereo applications. ...
doi:10.32604/cmc.2022.019921
fatcat:awgdl2645nbmnmlc7ysxqy7vrm
Spatio-temporal video autoencoder with differentiable memory
[article]
2016
arXiv
pre-print
The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. ...
At each time step, the system receives as input a video frame, predicts the optical flow based on the current observation and the LSTM memory state as a dense transformation map, and applies it to the ...
ACKNOWLEDGMENTS We are greatly indebted to the Torch community for their efforts to maintain this great library, and especially to Nicholas Léonard for his helpful advice. ...
arXiv:1511.06309v5
fatcat:kkeh6puykja7fiptyf5pbtwfmi
Speaker-conditioned Target Speaker Extraction based on Customized LSTM Cells
[article]
2021
arXiv
pre-print
processing in the forget gate. ...
Experimental results for two-speaker mixtures using the Librispeech dataset show that this customization significantly improves the target speaker extraction performance compared to using standard LSTM ...
SDR is a common measure to evaluate the performance of source separation systems, while PESQ is a speech quality metric commonly used for speech enhancement systems. ...
arXiv:2104.04234v1
fatcat:7h7tllzp65cbpeio3f2b5ls56q
A Review of Video Object Detection: Datasets, Metrics and Methods
2020
Applied Sciences
An overview of the existing datasets for video object detection together with commonly used evaluation metrics is first presented. ...
The aim of this paper is to provide a review of these papers on video object detection. ...
Forget Gate, Input Gate and Output Gate operate the 3 × 3 convolutions followed by the activation function.
Figure 6. Temporal Conv LSTM architecture. ...
doi:10.3390/app10217834
fatcat:qyp2b5guovftplzmmmnec33bdm
2020 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 30
2020
IEEE transactions on circuits and systems for video technology (Print)
., +, TCSVT Nov. 2020 4020-4033
Enhancement of Noisy and Compressed Videos by Optical Flow and Non-Local Denoising. ...
., +, TCSVT Jan. 2020 11-22 Field programmable gate arrays A Reconfigurable Architecture for Discrete Cosine Transform in Video Coding. ...
A Memory-Efficient Hardware Architecture for Connected Component Labeling in Embedded System. ...
doi:10.1109/tcsvt.2020.3043861
fatcat:s6z4wzp45vfflphgfcxh6x7npu
Convolutional Neural Networks for Continuous QoE Prediction in Video Streaming Services
2020
IEEE Access
Meanwhile, Temporal Convolutional Network (TCN), a variation of convolutional neural networks, has recently been proposed for sequence modeling tasks (e.g., speech enhancement), providing a superior prediction ...
INDEX TERMS Convolutional neural networks, temporal convolutional network, quality of experience, video streaming. ...
video quality level. ...
doi:10.1109/access.2020.3004125
fatcat:kwcw7tfcincdrg2qnz6xwftvqi
Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars
[article]
2019
arXiv
pre-print
The need to recognise long-term dependencies in sequential data such as video streams has made Long Short-Term Memory (LSTM) networks a prominent Artificial Intelligence model for many emerging applications ...
Our experiments on a state-of-the-art driving model for autonomous vehicle navigation demonstrate that the proposed approach can yield outputs with similar quality of result compared to a faithful LSTM ...
Input frames for each video are first processed by a Fully-Convolutional Network (FCN) to encode the spatial features which are then fed to a trained LSTM model that predicts the probability distribution ...
arXiv:1905.00689v2
fatcat:wdx5cbijrfcifpf2hzeqmnx4hy
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
[article]
2019
arXiv
pre-print
In this paper, we present a novel design --- Temporal Deformable Convolutional Encoder-Decoder Networks (dubbed as TDConvED) that fully employ convolutions in both encoder and decoder networks for video ...
Our model also capitalizes on temporal attention mechanism for sentence generation. ...
Moreover, we additionally explore the temporal deformable convolutions and temporal attention mechanism to extend and utilize temporal dynamics across frames/clips, and eventually enhance the quality of ...
arXiv:1905.01077v1
fatcat:npbmvhxsk5eklc5akqvrwbz2ve
Learning to Compress Videos without Computing Motion
[article]
2020
arXiv
pre-print
The LSTM-UNet is used in the FRN to learn space time differential representations of videos. ...
The combined network is able to efficiently capture both temporal and spatial video information, making it highly amenable for our purposes. ...
The C-LSTM is a convolutional version of the original LSTM, which replaces the matrix multiplication operation of the traditional LSTM with convolutions. ...
arXiv:2009.14110v2
fatcat:hm4qdgnz4fehvghdyyabcahf5e
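The snippet in the entry above describes the C-LSTM as an LSTM whose matrix multiplications are replaced with convolutions. A minimal sketch of that idea follows, assuming single-channel feature maps, odd-sized kernels, and the common four-gate LSTM formulation. The names `conv2d_same`, `conv_lstm_step`, and `params` are hypothetical and are not taken from any of the listed papers, whose exact gate layouts may differ.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding ('same' size).
    Assumes an odd-sized kernel such as 3 x 3."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding by default
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(x, h, c, params):
    """One convolutional LSTM step on maps x, h, c of equal shape.
    params[name] = (kernel_for_x, kernel_for_h, bias) for each of
    'i' (input gate), 'f' (forget gate), 'o' (output gate), 'g' (candidate)."""
    def pre(name):
        wx, wh, b = params[name]
        # Convolutions take the place of the matrix products of a plain LSTM.
        return conv2d_same(x, wx) + conv2d_same(h, wh) + b

    i = sigmoid(pre("i"))
    f = sigmoid(pre("f"))
    o = sigmoid(pre("o"))
    g = np.tanh(pre("g"))
    c_next = f * c + i * g           # update the cell state
    h_next = o * np.tanh(c_next)     # expose the gated hidden state
    return h_next, c_next

# Usage: random 3 x 3 kernels applied to one 8 x 8 frame.
rng = np.random.default_rng(0)
params = {name: (rng.normal(size=(3, 3)), rng.normal(size=(3, 3)), 0.0)
          for name in "ifog"}
frame = rng.normal(size=(8, 8))
h0 = np.zeros((8, 8))
c0 = np.zeros((8, 8))
h1, c1 = conv_lstm_step(frame, h0, c0, params)
```

Because every gate is computed by a convolution, the hidden and cell states keep the spatial layout of the input frame, which is what makes this family of cells a natural fit for video.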
FB-MSTCN: A Full-Band Single-Channel Speech Enhancement Method Based on Multi-Scale Temporal Convolutional Network
[article]
2022
arXiv
pre-print
To solve this problem, this paper proposes a two-stage real-time speech enhancement model with extraction-interpolation mechanism for a full-band signal. ...
After the two-stage enhancement, the enhanced full-band speech signal is restored by interval interpolation. ...
Speech enhancement techniques are essential for removing noise interference to improve speech quality. ...
arXiv:2203.07684v1
fatcat:5myq7lujdbhwvmwiq3co3fpljq
A Cross-Cultural Study of Mathematical Achievement: from the Perspectives of One's Motivation and Problem-solving Style
2019
International Journal of Science and Mathematics Education
doi:10.1007/s10763-019-10011-6
fatcat:52vzw7ycojcrre7tzcolzegcmm