2,064 Hits in 3.6 sec

Performance evaluation of H.264/AVC decoding and visualization using the GPU

Bart Pieters, Dieter Van Rijsselbergen, Wesley De Neve, Rik Van de Walle, Andrew G. Tescher
2007 Applications of Digital Image Processing XXX  
This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results compare both GPU-enabled decoders, as well as a CPU-only decoder in terms of speed, complexity, and CPU requirements.  ...  Modern computers are typically equipped with powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations.  ...  ACKNOWLEDGEMENTS The research as described in this paper was funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), the Institute for the Promotion of Innovation by  ... 
doi:10.1117/12.733151 fatcat:hkt6kqirare3nc3hhje7k7fdd4

Parallel H.264/AVC Motion Compensation for GPUs Using OpenCL

Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink
2015 IEEE Transactions on Circuits and Systems for Video Technology (Print)
Motion compensation is one of the most compute-intensive parts in H.264/AVC video decoding. It exposes massive parallelism which can reap the benefit from Graphics Processing Units (GPUs).  ...  However, when the overheads of memory copy and OpenCL runtime are included, no speedup is gained at application level.  ...  This motivates the use of GPUs for accelerating video codecs. The motion compensation stage in H.264/AVC takes a significant proportion of decoding time [4].  ... 
doi:10.1109/tcsvt.2014.2344512 fatcat:w4ogur3kzbg2nbf23vp2imv4v4

GPU-based Graph Traversal on Compressed Graphs

Mo Sha, Yuchen Li, Kian-Lee Tan
2019 Proceedings of the 2019 International Conference on Management of Data - SIGMOD '19  
Graph processing on GPUs received much attention in the industry and the academia recently, as the hardware accelerator offers attractive potential for performance boost.  ...  However, the high-bandwidth device memory on GPUs has limited capacity that constrains the size of the graph to be loaded on chip.  ...  Massive number of cores and ultra memory bandwidth make GPUs a promising platform for accelerating graph processing.  ... 
doi:10.1145/3299869.3319871 dblp:conf/sigmod/ShaLT19 fatcat:uiqk5lypujbktczpcuuyiopp5q

GPU acceleration of the HEVC decoder inter prediction module

Diego F. de Souza, Aleksandar Ilic, Nuno Roma, Leonel Sousa
2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
To circumvent this issue, an efficient acceleration of the HEVC inter prediction decoding module is proposed, by offloading the involved workload to GPU devices.  ...  The inter prediction decoding is one of the most time consuming modules in modern video decoders, which may significantly limit their real-time capabilities.  ...  To provide the fully compliant HEVC real-time encoding/decoding, current research trends aim at accelerating the execution of particular modules by offloading their computations from the Central Processing  ... 
doi:10.1109/globalsip.2015.7418397 dblp:conf/globalsip/SouzaIRS15 fatcat:tktxntk7svfafats2w6hw7u7ru

GPU-assisted decoding of video samples represented in the YCoCg-R color space

Wesley De Neve, Dieter Van Rijsselbergen, Charles Hollemeersch, Jan De Cock, Stijn Notebaert, Rik Van de Walle
2005 Proceedings of the 13th annual ACM international conference on Multimedia - MULTIMEDIA '05  
Our results show that a significant speedup can be achieved by relying on the processing power of the GPU, relative to the CPU.  ...  To be more specific, high definition video (1080p), represented in the YCoCg-R color space, could be decoded to RGB at 30 Hz on a PC with an AMD Athlon XP 2800+ CPU, an AGP bus and an NVIDIA GeForce 6800  ...  The research activities that have been described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), the Institute for the Promotion of  ... 
doi:10.1145/1101149.1101248 dblp:conf/mm/NeveRH05 fatcat:2nenvrku5rg6dlg5lhvu7gymke

Accelerate video decoding with generic GPU

Guobin Shen, Guang-Ping Gao, Shipeng Li, Heung-Yeung Shum, Ya-Qin Zhang
2005 IEEE Transactions on Circuits and Systems for Video Technology (Print)
In this paper, we present our study on leveraging the GPU's graphics engine to accelerate the video decoding.  ...  By moving the whole motion compensation feedback loop of the decoder to the GPU, the CPU and GPU have been made to work in parallel in a pipelining fashion.  ...  In this section, we first explore the feasibility of GPU acceleration for video decoding and the constraints of GPU.  ... 
doi:10.1109/tcsvt.2005.846440 fatcat:htdvbzkzfjfz3a3ey4b36aj76i

Low-Latency Software Polar Decoders

Pascal Giard, Gabi Sarkis, Camille Leroux, Claude Thibeault, Warren J. Gross
2016 Journal of Signal Processing Systems  
Finally, we show that the energy efficiency of the proposed decoders is comparable to state-of-the-art software polar decoders.  ...  These proposed decoders have an order of magnitude lower latency and memory footprint compared to state-of-the-art decoders, while maintaining comparable throughput.  ...  Claude Thibeault is a member of ReSMiQ. Warren J. Gross is a member of ReSMiQ and SYTACom.  ... 
doi:10.1007/s11265-016-1157-y fatcat:ozsx2cobevbgtjbiio5qunur3u

Accelerating wavelet-based video coding on graphics hardware using CUDA

W.J. van der Laan, J.B.T.M. Roerdink, A.C. Jalba
2009 Proceedings of 6th International Symposium on Image and Signal Processing and Analysis
We have integrated our DWT into the Dirac Wavelet Video Codec (DWVC), of which the overlapped block motion compensation and frame arithmetic have been accelerated using CUDA as well.  ...  This transform, by means of the lifting scheme, can be performed in a memory and computation efficient way on modern, programmable GPUs, which can be regarded as massively parallel co-processors through  ...  Acknowledgements This research is part of the "VIEW: Visual Interactive Effective Worlds" program, funded by the Dutch National Science Foundation (NWO), project no. 643.100.501.  ... 
doi:10.1109/ispa.2009.5297658 fatcat:fb2fu2g5efcvhdfbsw35nulize

Accelerating JPEG Decompression on GPUs [article]

André Weißenberger, Bertil Schmidt
2021 arXiv pre-print
For GPU-accelerated computer vision and deep learning tasks, such as the training of image classification models, efficient JPEG decoding is essential due to limitations in memory bandwidth.  ...  Furthermore, it achieves a speedup of up to 3.4 over nvJPEG accelerated with the dedicated hardware JPEG decoder on an A100.  ...  [3] accelerated JPEG decoding on GPUs by parallelizing the IDCT step using CUDA. Sodsong et al.  ... 
arXiv:2111.09219v1 fatcat:xzn5ovus65cajgpermbb6hny4y

An Optimized Parallel IDCT on Graphics Processing Units [chapter]

Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink
2013 Lecture Notes in Computer Science  
In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL.  ...  By exploiting that most of the input data of the IDCT for real videos are zero-valued coefficients, a new compacted data representation is created that allows for several optimizations.  ...  Implementation of IDCT on GPU Our GPU implementation is based on an optimized CPU version of the H.264 decoder that, in turn, is based on FFmpeg [9].  ... 
doi:10.1007/978-3-642-36949-0_18 fatcat:ypddl63i5fgrtbdkmupgfxovoa

A GPU-based Branch-and-Bound algorithm using Integer–Vector–Matrix data structure

J. Gmys, M. Mezmaz, N. Melab, D. Tuyttens
2016 Parallel Computing  
The implementation on GPU is based on the Integer-Vector-Matrix (IVM) data structure which is used instead of a conventional linked-list to store and manage the pool of subproblems.  ...  Compared to a GPU-accelerated B&B based on a linked-list, the algorithm presented in this paper solves a set of standard flowshop instances on average 3.3 times faster.  ...  GPU-accelerated linked-list-based B&B [1] (GPU-LL), described in Subsection 1.3.  ... 
doi:10.1016/j.parco.2016.01.008 fatcat:zicfpfonsffnhdbqpszkjsa22u

Interleaved entropy coders [article]

Fabian Giesen
2014 arXiv pre-print
This allows for very efficient encoding and decoding on CPUs supporting superscalar execution or SIMD instructions, as well as GPU implementations.  ...  state that the encoder was in when writing those bits---all "buffering" of information is explicitly part of the coder state and identical between encoder and decoder.  ...  Acknowledgments Thanks to my colleagues Charles Bloom and Sean Barrett for reviewing earlier drafts of this paper and making valuable suggestions.  ... 
arXiv:1402.3392v1 fatcat:3b3t2mok4fhg5aohb5g7zrrkx4

Fast software polar decoders

Pascal Giard, Gabi Sarkis, Claude Thibeault, Warren J. Gross
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We also show that, for a similar error-correction performance, the throughput of polar decoders both surpasses that of LDPC decoders targeting general-purpose processors and is competitive with that of  ...  state-of-the-art software LDPC decoders running on graphic processing units.  ...  the fastest software GPU-based LDPC decoders.  ... 
doi:10.1109/icassp.2014.6855069 dblp:conf/icassp/GiardSTG14 fatcat:bj2rzfrtpre7tkdb7rgaehbnem

Parallelization of Variable Rate Decompression through Metadata

Lennart Noordsij, Steven van der Vlugt, Mohamed A. Bamakhrama, Zaid Al-Ars, Peter Lindstrom
2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
On a GPU, we achieve average decoding rates of up to 100 GiB/s. Our strategies allow the user to make a trade-off between decoding throughput and metadata size overhead.  ...  On a CPU, we achieve a near optimal decoding speedup and an overhead size which is consistently less than 0.04% of the compressed data size.  ...  This work was partially performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.  ... 
doi:10.1109/pdp50117.2020.00045 dblp:conf/pdp/NoordsijVBAL20 fatcat:kowe2t6aczdufjbsbazwa4fxhu

Parallel nonbinary LDPC decoding on GPU

Guohui Wang, Hao Shen, Bei Yin, Michael Wu, Yang Sun, Joseph R. Cavallaro
2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)
This paper proposes a massively parallel implementation of a nonbinary LDPC decoding accelerator based on a graphics processing unit (GPU) to achieve both great flexibility and scalability.  ...  We highlight the methodology to partition the decoding task to a heterogeneous platform consisting of the CPU and GPU.  ...  We partition the decoding algorithm into five OpenCL kernels, which are listed in Table II.  ... 
doi:10.1109/acssc.2012.6489229 dblp:conf/acssc/WangSYWSC12 fatcat:chinirbyujddlkkkcswnw2cwyy