184 Hits in 9.2 sec

A Row-Parallel 8$\,\times\,$8 2-D DCT Architecture Using Algebraic Integer-Based Exact Computation

Arjuna Madanayake, Renato J. Cintra, Denis Onen, Vassil S. Dimitrov, Nilanka Rajapaksha, L. T. Bruton, Amila Edirisuriya
2012 IEEE transactions on circuits and systems for video technology (Print)  
An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final-reconstruction step (FRS) algorithms are proposed for the implementation of bivariate AI-encoded 2-D discrete cosine  ...  The architecture directly realizes an error-free 2-D DCT without using FRSs between row-column transforms, leading to an 8×8 2-D DCT which is entirely free of quantization errors in AI basis.  ...  This work extends the 8-point 1-D AI-based DCT architecture [37, 41, 42] into a fully-parallel time-multiplexed 2-D architecture for 8×8 data blocks.  ... 
doi:10.1109/tcsvt.2011.2181232 fatcat:2i3p7vizmfhanchkjoqmtt3hsq

BIVARIATE ALGEBRAIC INTEGER ENCODED ARAI ALGORITHM FOR EXACT COMPUTATION OF DCT

Sumi Thomas
2014 International Journal of Research in Engineering and Technology  
This algorithm realizes an error-free 2-D DCT without using Final Reconstruction Steps (FRS) between row-column transforms, leading to an 8×8 2-D DCT that is entirely free of quantization errors in AI basis  ...  integer coded computation. Bivariate Algebraic Integer (AI) encoded 2-D DCT algorithm ensures quantization noise free implementation of 2-D DCT.  ...  This paper extends the eight-point 1-D AI-based DCT architecture into a fully parallel time-multiplexed 2-D architecture for 8×8 data blocks [4] which is further extended to 512 × 512 image and its inverse  ... 
doi:10.15623/ijret.2014.0313008 fatcat:p4w7cx3n4be3tjplgdrskzdab4

A Single-Channel Architecture for Algebraic Integer-Based 8$\,\times\,$8 2-D DCT Computation

Amila Edirisuriya, Arjuna Madanayake, Renato J. Cintra, Vassil S. Dimitrov, Nilanka Rajapaksha
2013 IEEE transactions on circuits and systems for video technology (Print)  
The proposed architecture computes 8×8 2-D DCT transform based on the Arai DCT algorithm.  ...  An area efficient row-parallel architecture is proposed for the real-time implementation of bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing.  ...  Comparison with Other Architectures Conclusion An area efficient row-parallel architecture for 8×8 2-D DCT computation based on AI number representation leading to exact computations up to the FRS is  ... 
doi:10.1109/tcsvt.2013.2270397 fatcat:v2eaujrglnh4pjtw7bt4ibil6q

Computation of 2D $8\times 8$ DCT Based on the Loeffler Factorization Using Algebraic Integer Encoding

Diego F. Coelho, Sushmabhargavi Nimmalapalli, Vassil Dimitrov, Arjuna Madanayake, Renato J. Cintra, Arnaud Tisserand
2018 IEEE transactions on computers  
This paper proposes a computational method for 2D 8×8 DCT based on algebraic integers.  ...  The proposed algebraic integer architecture maintains error-free computations until an entire block of DCT coefficients having size 8×8 is computed, unlike algorithms in the literature which claim to be  ...  $C(X_{k,\cdot})$, $k = 2, 6$; return $X_{k,l}$  ...  different architectures: a row-parallel 8×8 2D DCT architecture using algebraic integer-based exact computation [22] and a single-channel architecture for algebraic  ... 
doi:10.1109/tc.2018.2837755 fatcat:hdy3j44yojd35dip7xu5fj3dly

Asynchronous Realization of Algebraic Integer-Based 2D DCT Using Achronix Speedster SPD60 FPGA

Nilanka Rajapaksha, Amila Edirisuriya, Arjuna Madanayake, Renato J. Cintra, Dennis Onen, Ihab Amer, Vassil S. Dimitrov
2013 Journal of Electrical and Computer Engineering  
Recently proposed algebraic-integer-(AI-) based discrete cosine transform (DCT) algorithms are analyzed in the presence of quantization, using the High Efficiency Video Coding (HEVC) standard.  ...  Results indicate a 31% improvement over the integer DCT in the number of transform coefficients having error within 1%.  ...  Figure 2: The 2D AI-DCT consists of an input section having a decimation structure, 1D 8-point AI-DCT block for column-wise DCTs, a real-time AI transpose buffer [16], four parallel 1D 8-point AI-DCT  ... 
doi:10.1155/2013/834793 fatcat:il567xqoefgcxd7h7lh2bhe6ea

Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions

Uma Sadhvi Potluri, Arjuna Madanayake, Renato J. Cintra, Fabio M. Bayer, Sunera Kulasekera, Amila Edirisuriya
2014 IEEE Transactions on Circuits and Systems Part 1: Regular Papers  
In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications.  ...  Video processing systems such as HEVC requiring low energy consumption needed for the multimedia market has led to extensive development in fast algorithms for the efficient approximation of 2-D DCT transforms  ...  All introduced implementations are sought to be fully parallel time-multiplexed 2-D architectures for 8×8 data blocks.  ... 
doi:10.1109/tcsi.2013.2295022 fatcat:s3p4j7kiife5fiplpgcvejb62i

Adaptive Approximated DCT Architectures for HEVC

Maurizio Masera, Maurizio Martina, Guido Masera
2016 IEEE transactions on circuits and systems for video technology (Print)  
The work shows the statistical analysis of the DCT usage and derives a pre-computation mechanism to adaptively skip rotations.  ...  Then, two 2D-DCT architectures are proposed: the first one is totally unfolded while the second one is folded.  ...  ACKNOWLEDGMENT The authors would like to thank the HPC@POLITO, a project of Academic Computing within the Department of Control and Computer Engineering at the Politecnico di Torino (http://www.hpc.polito.it  ... 
doi:10.1109/tcsvt.2016.2595320 fatcat:pidweuo5bzcrlcleov2mjypcxe

Fast Fourier-Based Phase Unwrapping on the Graphics Processing Unit in Real-Time Imaging Applications

Sam Jeught, Jan Sijbers, Joris Dirckx
2015 Journal of Imaging  
By executing the parallel implementation of a single-step Fourier-based phase unwrapping algorithm on the graphics processing unit of a standard graphics card, we were able to reduce the total processing  ...  A wide range of reconstruction algorithms has been developed to obtain the true, unwrapped phase by adding an integral multiple of 2π to each point of the wrapped grid.  ...  , in contrast, the parallel architecture of the graphics processing unit (GPU) clearly favors the matrix multiplication-based approach.  ... 
doi:10.3390/jimaging1010031 fatcat:opwlqu552rbgvi7aeepdt2vrce
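
The snippet above describes unwrapping as adding an integral multiple of 2π to each point of the wrapped grid. A minimal 1-D sketch of that idea follows. It is a plain sequential unwrapper, not the single-step Fourier-based GPU method of the paper, and the function name `unwrap_1d` and the test ramp are hypothetical choices made only for illustration. It assumes the true phase changes by less than π between neighbouring samples.

```python
import math

def unwrap_1d(wrapped):
    """Sequential 1-D unwrapping (illustrative, not the paper's method).

    Adds a running multiple of 2*pi so that consecutive output samples
    never differ by more than pi.
    """
    two_pi = 2.0 * math.pi
    out = [wrapped[0]]
    offset = 0.0
    for prev, cur in zip(wrapped, wrapped[1:]):
        delta = cur - prev
        # Whole number of 2*pi jumps hidden in this step (0 if no wrap).
        offset -= two_pi * round(delta / two_pi)
        out.append(cur + offset)
    return out

# Hypothetical test signal: a linear phase ramp, wrapped into (-pi, pi].
true_phase = [0.5 * k for k in range(40)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
recovered = unwrap_1d(wrapped)
```

Each sample depends on the previous one here, which is the kind of serial dependency a single-step, transform-based formulation avoids on parallel hardware.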

Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture

Dajiang LIU, Shouyi YIN, Chongyong YIN, Leibo LIU, Shaojun WEI
2012 IEICE transactions on information and systems  
A reconfigurable computing system is a class of parallel architecture with the ability to compute in hardware to increase performance, while retaining much of the flexibility of a software solution.  ...  Compared with the DFG-based optimization approach, the execution performances of 1-D Jacobi and matrix multiplication are improved by 28% and 48.47%.  ...  1-D reconfigurable architecture which is 9 times faster than UltraSPARC on image dithering; Rapid is a 1-D pipeline coarse grain processor which performs very close to its peak of 1.6 GOPS on 2-D DCT; and  ... 
doi:10.1587/transinf.e95.d.2898 fatcat:uuhngffr3bbbnfnhodqhzwbrda

Code Size and Accuracy-aware Synthesis of Fixed-point Programs for Matrix Multiplication

Matthieu Martel, Amine Najahi, Guillaume Revy
2014 Proceedings of the 4th International Conference on Pervasive and Embedded Computing and Communication Systems  
., 2007) , the Sung technique is used to suggest a number of linear algebra routines. This simulation based technique has two drawbacks: 1.  ...  d) 5: (u B , v B ), d B ← f indClosestPair(S B , d) 6: if d Ad B then 7: remove(u A , v A , S A ) 8: insert(u A ∪ v A , S A ) 9: else 10: remove(u B , v B , S B ) 11: insert(u B ∪ v B , S B ) 12:  ... 
doi:10.5220/0004884802040214 dblp:conf/peccs/MartelNR14 fatcat:qydmqgnsw5dctj5mparz3xzkbm

Iterative optimization in the polyhedral model

Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen, John Cavazos
2008 Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation - PLDI '08  
However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack  ...  of the target architecture.  ...  Using multidimensional schedules, a correct transformation (found using chunking [8]) is simply: $\theta_R(i, j) = (i, j)$ and $\theta_P(i, j) = (i + 2, j)$.  ... 
doi:10.1145/1375581.1375594 dblp:conf/pldi/PouchetBCC08 fatcat:p6z3qjhxl5cx7jwjvlu7cabgie

Iterative optimization in the polyhedral model

Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen, John Cavazos
2008 SIGPLAN notices  
However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack  ...  of the target architecture.  ...  Using multidimensional schedules, a correct transformation (found using chunking [8]) is simply: $\theta_R(i, j) = (i, j)$ and $\theta_P(i, j) = (i + 2, j)$.  ... 
doi:10.1145/1379022.1375594 fatcat:2g6qs7a4qzbixmf7mz3mw5gke4

Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach

Shuangde Fang, Chengyong Wu, Zidong Du, Yuntan Fang, Yuanjie Huang, Yang Chen, Lieven Eeckhout, Olivier Temam, Huawei Li, Yunji Chen
2014 ACM Transactions on Architecture and Code Optimization (TACO)  
In this article, we present a software framework for achieving performance portability by leveraging a generalized library-based approach.  ...  Using a set of benchmarks run on a real heterogeneous SoC composed of a multicore processor and a GPU, we show that the runtime overhead is fairly small at 5.1% for the GPU and 6.4% for the multi-core.  ...  Fig. 7. Library wrapper code for the CUDA version of DCT. Fig. 8. Runtime.  ... 
doi:10.1145/2608253 fatcat:ekgjnxiy6jdoxim3t2snisx2ly

A Rapid Prototyping Environment for Wireless Communication Embedded Systems

Bryan A. Jones, Joseph R. Cavallaro
2003 EURASIP Journal on Advances in Signal Processing  
The emulator is customized for dataflow dominant architectures especially focusing on telecommunication related applications.  ...  technologies required to meet this challenge include new types of programmable components that offer novel trade-offs between flexibility and efficiency, models for exchange of intellectual property, and computer  ...  This work was funded by DARPA (under the PAC/C program; SIA GSRC), the US Army Research Office, and the Berkeley Wireless Research Center supporting companies.  ... 
doi:10.1155/s111086570330304x fatcat:3ird7hyyzjeb5bmrqa6ztx7574

An Efficient and Stable Parallel Solution for Non-symmetric Toeplitz Linear Systems [chapter]

Pedro Alonso, José M. Badía, Antonio M. Vidal
2005 Lecture Notes in Computer Science  
There also exist parallel algorithms for shared memory computers [7, 8, 9] and, more recently, several parallel algorithms for distributed architectures have been proposed [10].  ...  On the other hand, one of our main goals is to offer efficient parallel algorithms for general purpose architectures, especially clusters of personal computers.  ...  P is said to be an optimal S-preconditioner of a given matrix A if $\|P - A\|_F = \min\{\|B - A\|_F : B \in \mathcal{A}_S\}$ (18), where $\mathcal{A}_S = \{S\,\mathrm{diag}(d)\,S : d \in \mathbb{R}^n\}$ is the algebra of matrices which are diagonalizable by the orthogonal  ... 
doi:10.1007/11403937_51 fatcat:rj5h5ijk3zbafcb65asqtggfce
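
The Frobenius-norm condition in the snippet above has a simple closed form when S is orthogonal and symmetric: the norm is invariant under S, so ‖S diag(d) S − A‖_F = ‖diag(d) − S A S‖_F, and the minimizer is d = diag(S A S). The sketch below checks this numerically. The snippet is truncated before it names the transform, so taking S to be the DST-I matrix is an assumption, and the helper names and the small Toeplitz test matrix are hypothetical.

```python
import math

def dst1(n):
    """Orthogonal, symmetric DST-I matrix (assumed choice of S)."""
    c = math.sqrt(2.0 / (n + 1))
    return [[c * math.sin(math.pi * (j + 1) * (k + 1) / (n + 1))
             for k in range(n)] for j in range(n)]

def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def frob(X, Y):
    """Frobenius norm of X - Y."""
    return math.sqrt(sum((x - y) ** 2
                         for rx, ry in zip(X, Y) for x, y in zip(rx, ry)))

def from_diag(S, d):
    """Build S diag(d) S."""
    n = len(d)
    D = [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(S, D), S)

def optimal_preconditioner(A):
    """Return S, d = diag(S A S), and P = S diag(d) S."""
    S = dst1(len(A))
    SAS = matmul(matmul(S, A), S)
    d = [SAS[i][i] for i in range(len(A))]
    return S, d, from_diag(S, d)

# Hypothetical non-symmetric tridiagonal Toeplitz test matrix.
n = 6
A = [[4.0 if i == j else 1.0 if j == i + 1 else -0.5 if i == j + 1 else 0.0
      for j in range(n)] for i in range(n)]
S, d, P = optimal_preconditioner(A)
```

Perturbing any single entry of `d` by ε raises the squared error by exactly ε², which is a quick way to confirm that `P` is the minimizer within this family.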