Parallel nonbinary LDPC decoding on GPU

Guohui Wang, Hao Shen, Bei Yin, Michael Wu, Yang Sun, Joseph R. Cavallaro
2012 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)  
Nonbinary Low-Density Parity-Check (LDPC) codes are a class of error-correcting codes constructed over the Galois field GF(q) for q > 2.  ...  As extensions of binary LDPC codes, nonbinary LDPC codes can provide better error-correcting performance when the code length is short or moderate, but at a cost of higher decoding complexity.  ...  Due to its inherently massive parallelism, a nonbinary LDPC decoder is more suitable for a GPU implementation than for binary LDPC codes.  ... 
doi:10.1109/acssc.2012.6489229 dblp:conf/acssc/WangSYWSC12 fatcat:chinirbyujddlkkkcswnw2cwyy

LDPC Decoding on GPU for Mobile Device

Yiqin Lu, Weiyue Su, Jiancheng Qin
2016 Mobile Information Systems  
A flexible software LDPC decoder that exploits data parallelism for simultaneous multicode words decoding on the mobile device is proposed in this paper, supported by multithreading on OpenCL based graphics  ...  To realize efficient software LDPC decoding on the mobile device, the LDPC decoding feature on communication baseband chip should be replaced to save the cost and make it easier to upgrade decoder to be  ...  Parallel MSA LDPC Decoding on Mobile GPU MSA is an intensive processing, which should be processed in a high-performance specific computing engine, or in a highly parallel programmable device.  ... 
doi:10.1155/2016/7048482 fatcat:bo4qrh2urngqzpyofjlcnrzvwq

Combining Belief Propagation and Successive Cancellation List Decoding of Polar Codes on a GPU Platform [article]

Sebastian Cammerer, Benedikt Leible, Matthias Stahl, Jakob Hoydis, Stephan ten Brink
2016 arXiv   pre-print
The decoding performance of polar codes strongly depends on the decoding algorithm used, while also the decoder throughput and its latency mainly depend on the decoding algorithm.  ...  The proposed scheme combines excellent decoding performance and high throughput within the signal-to-noise ratio (SNR) region of interest.  ...  A similar BP algorithm can also be used for polar code decoding [9] and a correspondingly high throughput gain was observed in [10] on a GPU.  ... 
arXiv:1609.09358v3 fatcat:ai35px7pnfbxxpm3imkkjd3vge

Design Space Exploration of LDPC Decoders Using High-Level Synthesis

Joao Andrade, Nithin George, Kimon Karras, David Novo, Frederico Pratas, Leonel Sousa, Paolo Ienne, Gabriel Falcao, Vitor Silva
2017 IEEE Access  
Our prototype LDPC decoders developed using HLS tools obtain throughputs ranging from a few Mbits/s up to Gbits/s and latencies as low as 5 ms.  ...  Based on these results, we provide insights that will help users to select the most suitable model for designing LDPC decoder blocks using these HLS tools.  ...  Although CPUs and GPUs are often used to simulate these codes, the bulk of deployed LDPC decoders are in the form of dedicated very large scale integration (VLSI) devices [7] , [12] .  ... 
doi:10.1109/access.2017.2727221 fatcat:xkggvch3wfbezefdmd6nntpugq

Implementation of a High Throughput 3GPP Turbo Decoder on GPU

Michael Wu, Yang Sun, Guohui Wang, Joseph R. Cavallaro
2011 Journal of Signal Processing Systems  
In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of the processing power of GPU to offer fast Turbo decoding throughput.  ...  To improve efficiency of the decoder in the high SNR regime, we also present a low complexity early termination scheme based on average extrinsic LLR statistics.  ...  There are also a number of GPU based LDPC channel decoders [13] . Despite the popularity of Turbo codes, there are few existing Turbo decoder implementations on GPU [14, 15] .  ... 
doi:10.1007/s11265-011-0617-7 fatcat:plaoxwqkuvhrjbzv2s2nctoawa

Implementation of a 3GPP LTE turbo decoder accelerator on GPU

Michael Wu, Yang Sun, Joseph R. Cavallaro
2010 2010 IEEE Workshop On Signal Processing Systems  
The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm on GPU, e.g. finding a good way to parallelize workload across cores and allocate and use fast on-die  ...  This paper presents a 3GPP LTE compliant turbo decoder accelerator on GPU.  ...  In fact, a number of processing intensive communication algorithms have been implemented on GPU. GPU implementations of LDPC decoder are capable of real time throughput [9] .  ... 
doi:10.1109/sips.2010.5624788 fatcat:cdtvieyk25fpbkdtcf5lrnjxo4

Accelerating massive MIMO uplink detection on GPU for SDR systems

Kaipeng Li, Bei Yin, Michael Wu, Joseph R. Cavallaro, Christoph Studer
2015 2015 IEEE Dallas Circuits and Systems Conference (DCAS)  
Our GPU implementation exceeds 250 Mb/s detection throughput for a 128×16 antenna system.  ...  We present a reconfigurable GPU-based uplink detector for massive MIMO software-defined radio (SDR) systems.  ...  baseband algorithms, such as LDPC decoding [4] or turbo decoding [5] .  ... 
doi:10.1109/dcas.2015.7356600 fatcat:tu6mdagsmzf3xi22fsmktxzcfm

Implementation of a High Throughput Soft MIMO Detector on GPU

Michael Wu, Yang Sun, Siddharth Gupta, Joseph R. Cavallaro
2010 Journal of Signal Processing Systems  
We aim to show that a MIMO detector on a graphics processing unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs.  ...  Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver.  ...  To combat errors due to channel noise and fading, a channel decoder such as a low-density parity-check (LDPC) decoder is combined with a soft output MIMO detector at the receiver to maximize performance gain.  ... 
doi:10.1007/s11265-010-0523-4 fatcat:2np5q6eehjaulb7zoal4n7x34y

Implementation of a Fully-Parallel Turbo Decoder on a General-Purpose Graphics Processing Unit

An Li, Robert G. Maunder, Bashir M. Al-Hashimi, Lajos Hanzo
2016 IEEE Access  
However, this necessitates high processing throughputs in order for the turbo code to support real-time communications.  ...  As a benefit of its higher degree of parallelism, we show that our FPTD improves the higher processing throughput of the Log-BCJR turbo decoder by between 2.3 and 9.2 times, when employing a high-specification  ...  for achieving a high processing throughput.  ... 
doi:10.1109/access.2016.2586309 fatcat:cqeypibnrjaabe54xmpj3b2slq

Software-defined Radios: Architecture, State-of-the-art, and Challenges [article]

Rami Akeela, Behnam Dezfouli
2018 arXiv   pre-print
Progress in the SDR field has led to the escalation of protocol development and a wide spectrum of applications, with more emphasis on programmability, flexibility, portability, and energy efficiency,  ...  We also review existing SDR platforms and present an analytical comparison as a guide to developers. Finally, we recognize a few of the related research topics and summarize potential solutions.  ...  Examples of error correcting codes include Convolutional Codes, Turbo Codes, and Low Density Parity Check (LDPC) [30] .  ... 
arXiv:1804.06564v1 fatcat:ogkut4aibnfarbrvjkihdfiqnu

Parallel SUMIS soft detector for large MIMO systems on multicore and GPU

Carla Ramiro, M. Ángeles Simarro, Alberto Gonzalez, Antonio M. Vidal
2018 Journal of Supercomputing  
In this context, the use of High Performance Computing (HPC) systems, such as multicore CPUs and Graphics Processing Units (GPUs) has become attractive for efficient implementation of parallel signal  ...  A MIMO system with a very large number of antennas is a promising candidate technology for next generations of wireless systems.  ...  Fig. 1: BER as a function of SNR for N_T = N_R = 200 in (1 with the LDPC code of rate 1/2.  ... 
doi:10.1007/s11227-018-2403-9 fatcat:ktfpztckx5h2dkekppadikn27u

KiloCore: A 32-nm 1000-Processor Computational Array

Brent Bohnenstiehl, Aaron Stillmaker, Jon J. Pimentel, Timothy Andreas, Bin Liu, Anh T. Tran, Emmanuel Adeagbo, Bevan M. Baas
2017 IEEE Journal of Solid-State Circuits  
Compared with a variety of Intel i7s and Nvidia GPUs, the KiloCore at 1.1 V has geometric mean improvements of 4.3× higher throughput per area and 9.4× higher energy efficiency for AES encryption, 4095  ...  -b low-density parity-check decoding, 4096-point complex fast Fourier transform, and 100-B record sorting applications.  ...  The LDPC comparisons on an i7 [22] and GPU [23] implement (9216,4608) and (2304,1152) codes with row and column weights of 6,3 and 24,12, respectively, and perform five decoding iterations.  ... 
doi:10.1109/jssc.2016.2638459 fatcat:jec5eho5ubgydnewvgxmqsnqim

Implementation of a high-throughput low-latency polyphase channelizer on GPUs

Scott C Kim, Shuvra S Bhattacharyya
2014 EURASIP Journal on Advances in Signal Processing  
With graphics processing unit (GPU) technology, we propose a novel GPU-based polyphase channelizer architecture that delivers high throughput.  ...  This makes our approach and implementation particularly attractive for using GPUs as DSP accelerators for communication systems.  ...  GPU back-end receivers, which are responsible for channel decoding (e.g., using Turbo and LDPC decoders), are captured in [8, 9] .  ... 
doi:10.1186/1687-6180-2014-141 fatcat:iky7t6kux5ecbhg3gp7kvnytva

A Survey Of Baseband Architecture For Software Defined Radio

M. A. Fodha, H. Benfradj, A. Ghazel
2016 Zenodo  
This paper is a survey of recent works that propose a baseband processor architecture for software defined radio. A classification of different approaches is proposed.  ...  The performance of each architecture is also discussed in order to clarify the suitable approaches that meet software-defined radio constraints.  ...  The single core approach is not well adopted for SDR, which requires a high level of parallelism in order to process many functionalities such as FFT, coding, modulation.  ... 
doi:10.5281/zenodo.1126159 fatcat:mdhjmvmfafdarg7m6uezpuxiei

GPU accelerated computation of fast spectral transforms

Dusan Gajic, Radomir Stankovic
2011 Facta universitatis - series Electronics and Energetics  
We present a reformulation of fast algorithms which takes into account peculiar properties of transforms to make them suitable for the GPU implementation.  ...  Performance of the GPU implementations is compared with the classical C/C++ implementations for the central processing unit (CPU).  ...  Acknowledgment The authors are very grateful to the reviewers for the constructive comments that were useful in improving the presentation in this paper, as well as for pointing to interesting topics for  ... 
doi:10.2298/fuee1103483g fatcat:efavnkfx5zc2jjit2c2jm5tn7e