VLSI Implementation of Fully Parallel LTE Turbo Decoders

An Li, Luping Xiang, Taihai Chen, Robert G. Maunder, Bashir M. Al-Hashimi, Lajos Hanzo
2016 IEEE Access  
Turbo codes facilitate near-capacity transmission throughputs by achieving a reliable iterative forward error correction. However, owing to the serial data dependence imposed by the logarithmic Bahl-Cocke-Jelinek-Raviv algorithm, the limited processing throughputs of the conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of real-time communication schemes. Motivated by this, we recently proposed a floating-point fully parallel turbo decoder (FPTD) algorithm, which eliminates the serial data dependence, allowing parallel processing and hence significantly reducing the number of clock cycles required. In this paper, we conceive a technique for reducing the critical datapath of the FPTD, and we propose a novel fixed-point version as well as its very large scale integration (VLSI) implementation. We also propose a novel technique, which allows the FPTD to also decode shorter frames employing compatible interleaver patterns. We strike beneficial tradeoffs amongst the latency, core area, and energy consumption by investigating the minimum bit widths and techniques for message log-likelihood ratio scaling and state metric normalization. Accordingly, the design flow and design tradeoffs considered in this paper are also applicable to other fixed-point implementations of error correction decoders. We demonstrate that upon using Taiwan Semiconductor Manufacturing Company (TSMC) 65-nm low-power technology for decoding the longest long-term evolution frames (6144 b) received over an additive white Gaussian noise channel having E_b/N_0 = 1 dB, the proposed fixed-point FPTD VLSI achieves a processing throughput of 21.9 Gb/s and a processing latency of 0.28 µs. These results are 17.1 times superior to those of the state-of-the-art benchmarker. Furthermore, the proposed fixed-point FPTD VLSI achieves an energy consumption of 2.69 µJ/frame and a normalized core area of 5 mm²/Gb/s, which are 34% and 23% lower than those of the benchmarker, respectively.
INDEX TERMS Fully-parallel turbo decoder, VLSI design, LTE turbo code.
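The headline figures in the abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the decoder processes one 6144-bit frame at a time, so that throughput is frame length divided by latency; the function names and the derived per-bit energy and implied core area are illustrative values computed here, not results reported in the abstract.

```python
# Figures quoted in the abstract.
FRAME_BITS = 6144                    # longest LTE frame, in bits
LATENCY_S = 0.28e-6                  # processing latency, in seconds
ENERGY_PER_FRAME_J = 2.69e-6         # energy consumption per frame, in joules
NORMALIZED_AREA_MM2_PER_GBPS = 5.0   # normalized core area, in mm^2 per Gb/s


def throughput_gbps(frame_bits: int, latency_s: float) -> float:
    """Throughput in Gb/s, assuming one frame is decoded per latency period."""
    return frame_bits / latency_s / 1e9


def energy_per_bit_nj(energy_per_frame_j: float, frame_bits: int) -> float:
    """Energy per decoded bit in nJ (derived here, not quoted in the abstract)."""
    return energy_per_frame_j / frame_bits * 1e9


tp = throughput_gbps(FRAME_BITS, LATENCY_S)
e_bit = energy_per_bit_nj(ENERGY_PER_FRAME_J, FRAME_BITS)
# Implied core area under the same assumption (derived, not quoted).
implied_area_mm2 = NORMALIZED_AREA_MM2_PER_GBPS * tp

print(f"throughput        ~ {tp:.1f} Gb/s")
print(f"energy per bit    ~ {e_bit:.2f} nJ")
print(f"implied core area ~ {implied_area_mm2:.0f} mm^2")
```

Under this assumption the computed throughput agrees with the quoted 21.9 Gb/s, which suggests that the throughput and latency figures describe the same single-frame operating point.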
doi:10.1109/access.2016.2515719 fatcat:7ua5vo3xlrdanla77cbnlaj4ou