
Low Latency Digit-Recurrence Reciprocal and Square-Root Reciprocal Algorithm and Architecture

E. Antelo, T. Lang, P. Montuschi, A. Nannarelli
17th IEEE Symposium on Computer Arithmetic (ARITH'05)  
The reciprocal and square-root reciprocal operations are used in several applications. For these operations, we present algorithms that combine a digit-by-digit algorithm and one iteration of a quadratic-convergence approximation.  ...  The latter is implemented by a digit-recurrence, which uses the digits produced by the digit-by-digit part.  ...  This results in cycles  ...  The digit-by-digit algorithm for rcci and square root. reciprocal developed implementation is basically the reciprocal implementation with the of and reciprocal square-root, and before  ... 
doi:10.1109/arith.2005.29 dblp:conf/arith/AnteloLMN05 fatcat:45loncxdhfhmrdkqry4lf5wv5e

Page 1114 of IEEE Transactions on Computers Vol. 52, Issue 9 [page]

2003 IEEE Transactions on Computers  
A specific solution is given in Table 4. 5 CONCLUSIONS We have presented a reciprocal square-root algorithm by digit recurrence and selection by a staircase function.  ...  Lang, Division and Square Root: Digit Recurrence Algorithms and Implementations. Kluwer Academic, 1994 [7] M.D. Ercegovac, T. Lang, J.-M. Muller, and A.  ... 

Composite Iterative Algorithm and Architecture for q-th Root Calculation

Alvaro Vázquez, Javier D. Bruguera
2011 IEEE 20th Symposium on Computer Arithmetic  
The algorithm is based on an optimized implementation of $X^{1/q} = 2^{(1/q)\log_2(X)}$ by a sequence of parallel and/or overlapped operations: (1) reciprocal, (2) digit-recurrence logarithm, (3) left-to-right  ...  A detailed error analysis and two architectures are proposed, for low precision q and for higher precision q.  ...  This way, there is a number of algorithms and implementations for the two most frequent roots, the square root and the inverse square root calculation, including linear convergence digit-recurrence algorithms  ... 
doi:10.1109/arith.2011.16 dblp:conf/arith/VzquezB10 fatcat:q263el6ohzbxddhtjfwmhnirji
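
The identity quoted in this snippet can be checked numerically. The following is only a minimal software sketch of the identity itself, not the paper's hardware architecture: the function name `qth_root` is hypothetical, library calls stand in for the digit-recurrence logarithm, and the steps after (2), which the snippet elides, are replaced here by a plain multiplication and exponentiation.

```python
import math


def qth_root(x: float, q: int) -> float:
    """Approximate x**(1/q) for x > 0 using x^(1/q) = 2^((1/q) * log2(x))."""
    if x <= 0 or q == 0:
        raise ValueError("x must be positive and q must be nonzero")
    inv_q = 1.0 / q          # step (1): reciprocal of q
    log2_x = math.log2(x)    # step (2): logarithm (library call, not digit recurrence)
    # Remaining steps (assumed here): multiply, then evaluate the power of two.
    return 2.0 ** (inv_q * log2_x)
```

For example, `qth_root(27.0, 3)` returns a value close to 3, up to floating-point rounding.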

Radix-4 reciprocal square-root and its combination with division and square root

T. Lang, E. Antelo
2003 IEEE transactions on computers  
We present here a digit-recurrence algorithm and a radix-4 implementation.  ...  The implementations of reciprocal square root used  ...  We assume that the reader is familiar with these algorithms [6]. 2 RECIPROCAL SQUARE-ROOT ALGORITHM In this section, we develop the algorithm and the implementation of a low-radix reciprocal square-root  ... 
doi:10.1109/tc.2003.1228508 fatcat:ivfxdbksejgwhlfkyphhunolwm

High-speed double-precision computation of reciprocal, division, square root, and inverse square root

J.-A. Pineiro, J.D. Bruguera
2002 IEEE transactions on computers  
Two unfolded architectures are proposed: the first one computing only reciprocal and division operations, and the second one also including the computation of square root and inverse square root.  ...  A new method for the high-speed computation of double-precision floating-point reciprocal, division, square root, and inverse square root operations is presented in this paper.  ...  On one hand, digit-recurrence methods [4], [8], such as the SRT algorithm, result in small units, but their linear convergence sometimes leads to long latencies and makes them inadequate methods for  ... 
doi:10.1109/tc.2002.1146704 fatcat:uavk4w4ryja2dboij5ae7fwbhe

Division algorithms and implementations

S.F. Obermann, M.J. Flynn
1997 IEEE transactions on computers  
It is found that, for low-cost implementations where chip area must be minimized, digit recurrence algorithms are suitable.  ...  Division algorithms can be divided into five classes: digit recurrence, functional iteration, very high radix, table look-up, and variable latency.  ...  ACKNOWLEDGMENTS The authors would like to thank Nhon Quach and Grant McFarland for their helpful discussions and comments.  ... 
doi:10.1109/12.609274 fatcat:3ffbiptz7nan7knlqet7rvpnra

Review of Basic Classes of Dividers Based on Division Algorithm

Udayan S. Patankar, Ants Koel
2021 IEEE Access  
ACKNOWLEDGEMENT A preliminary patent is applied in Estonia based on the research work of developing a new algorithm for division. Application no-70390 date-June 2020.  ...  The SRT algorithm is one of the most popular of all the digit recurrence division algorithms to implement and one of the non-restoring digit recurrence algorithms.  ...  GPU and MIC have an advantage of parallel architecture for achieving low latency and execution time on account of the high area and complex controlling logic.  ... 
doi:10.1109/access.2021.3055735 fatcat:flnsfd2szvgavhkcop7nozrff4

Hardware Implementation of Single Iterated Multiplicative Inverse Square Root

Jun Luo, Qijun Huang, Hongwei Luo, Yue Zhi, Xiaoqiang Wang
2017 Elektronika ir Elektrotechnika  
It obtains more than 70 % of throughput improvement and almost 100 × higher precision over the inverse square root Intellectual Property (IP) from Altera.  ...  This paper presents hardware implementation of fixed-point single iterated multiplicative inverse square root.  ...  INTRODUCTION Arithmetic element functions (reciprocal, square root and inverse square root) are playing very important roles in digital signal processing, multimedia and scientific computing.  ... 
doi:10.5755/j01.eie.23.4.18717 fatcat:loora7zgsjaizjtyy3dygljcau

Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants

Leonid V. Moroz, Volodymyr V. Samotyy, Oleh Y. Horyachyy
2021 Computation  
root and/or reciprocal square root.  ...  Algorithms are given in C/C++ for single- and double-precision numbers in the IEEE 754 format for both square root and reciprocal square root functions.  ...  Acknowledgments: The authors would like to thank Andrii Malohlovets and Petro Rudyi for providing microcontrollers for testing, and Marta Romanytsia for translating the draft version of this manuscript  ... 
doi:10.3390/computation9020021 fatcat:gdlehndewrft5l6ys3sdcsu64u
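
For orientation, the sketch below shows the widely known baseline fast inverse square root for single-precision inputs: a bit-level initial guess followed by one Newton-Raphson step. It is not the modified method with switching magic constants from this paper. The constant `0x5F3759DF` is the classic published value, the function name `fast_inv_sqrt` is hypothetical, and Python is used here only for illustration (the paper gives its algorithms in C/C++).

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive, normal single-precision value x."""
    # Reinterpret the IEEE 754 single-precision bit pattern of x as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Classic magic-constant initial guess (baseline, not the paper's constants).
    i = (0x5F3759DF - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer back as a single-precision float.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step for f(y) = 1/y^2 - x.
    return y * (1.5 - 0.5 * x * y * y)
```

With a single refinement step the relative error of this baseline stays below a few tenths of a percent, which is the gap that improved constants aim to narrow.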

Floating Point Architecture Extensions for Optimized Matrix Factorization

A. Pedram, A. Gerstlauer, R. A. van de Geijn
2013 IEEE 21st Symposium on Computer Arithmetic  
We show how adding moderate complexity to the architecture greatly alleviates complexities in the algorithm.  ...  This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor.  ...  The kernel performance and utilization is low because of the dependencies and the latency of the inverse square-root operation.  ... 
doi:10.1109/arith.2013.21 dblp:conf/arith/PedramGG13 fatcat:v2nbca25rrfzvjc6m6n3kstpju

Radix-16 Combined Division and Square Root Unit

Alberto Nannarelli
2011 IEEE 20th Symposium on Computer Arithmetic  
Division and square root, based on the digit-recurrence algorithm, can be implemented in a combined unit.  ...  The latency of the unit is reduced by retiming and low power methods are applied as well.  ...  ACKNOWLEDGMENTS The author wishes to thank Tomás Lang for his suggestions and comments on the design of the unit.  ... 
doi:10.1109/arith.2011.30 dblp:conf/arith/Nannarelli10 fatcat:4gyv7xxqz5hc7ehvs5ywu43at4

Reciprocation, square root, inverse square root, and some elementary functions using small multipliers

Milos D. Ercegovac, Tomas Lang, Jean-Michel Muller, Arnaud Tisserand, Franklin T. Luk
1998 Advanced Signal Processing Algorithms, Architectures, and Implementations VIII  
Abstract: This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and, for some functions, a final  ...  We estimate the delay, the size/number of tables, and the size/number of multipliers and compare with other related methods.  ...  Cycle Time, Architecture, and Implementation." We thank the reviewers for their careful and helpful comments.  ... 
doi:10.1117/12.325713 fatcat:mplk6vibn5ginn3mkdvz2byvw4

Floating-Point Division Operator based on CORDIC Algorithm

Pongyupinpanich Surapong, Faizal Arya Samman
2013 ECTI Transactions on Computer and Information Technology  
Design and evaluation of a CORDIC (COordinate Rotation DIgital Computer) algorithm for a floating-point division operation is presented in this paper.  ...  A hardware architecture of CORDIC algorithm capable of processing broader input ranges is implemented and presented in this paper by using a pre-processing and a post-processing stage.  ...  A combined floating-point square-root and division operation can also be implemented by using a subtractive SRT (Sweeney, Robertson and Tocher) algorithm [8], which can be classified as a digit recurrence  ... 
doi:10.37936/ecti-cit.201371.54356 fatcat:du2eftzcfzaxznqtecaxbltajm

Faster floating-point square root for integer processors

Claude-Pierre Jeannerod, Herve Knochel, Christophe Monat, Guillaume Revy
2007 International Symposium on Industrial Embedded Systems  
This is illustrated with the square root function, whose implementation given here is faster by over 35% than the previously best one for such systems.  ...  This paper presents some work in progress on fast and accurate floating-point arithmetic software for ST200-based embedded systems.  ...  algorithms based on digit recurrence (see [6] for a detailed and comprehensive study).  ... 
doi:10.1109/sies.2007.4297353 dblp:conf/sies/JeannerodKMR07 fatcat:sqzxcdgu6bdvzon2e3jkypozki

Power dissipation challenges in multicore floating-point units

Wei Liu, Alberto Nannarelli
2010 ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors  
We compare the implementation of division in a Fused Multiply-Add (FMA) unit based on the Newton-Raphson approximation algorithm to the implementation in a dedicated digit-recurrence unit.  ...  The results show a significant reduction of energy in a typical scientific application when the division digit-recurrence unit is used.  ...  However, in GPUs where several FP-units are grouped in clusters (e.g. in [4]) it might be reasonable to include a digit-recurrence division (and square-root) unit in each cluster to reduce the power  ... 
doi:10.1109/asap.2010.5540986 dblp:conf/asap/LiuN10 fatcat:h4uce6ppcfgx7cetndiywadkfq
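
As background for the Newton-Raphson side of this comparison, the sketch below shows division by iterative reciprocal refinement in software. It is an illustration under stated assumptions, not the FMA-based design evaluated in the paper: the function name `nr_divide`, the iteration count, and the linear initial estimate `48/17 - 32/17 * m` (a common textbook choice for a divisor scaled into [0.5, 1)) are not taken from the snippet.

```python
import math


def nr_divide(n: float, d: float, iterations: int = 5) -> float:
    """Approximate n / d using a Newton-Raphson reciprocal of d."""
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if d < 0.0 else 1.0
    # Scale |d| to m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(d))
    # Linear initial estimate of 1/m (assumed textbook constants).
    y = 48.0 / 17.0 - (32.0 / 17.0) * m
    # Each step roughly squares the relative error: y <- y * (2 - m * y).
    for _ in range(iterations):
        y = y * (2.0 - m * y)
    # Undo the scaling: 1/|d| = (1/m) * 2**(-e).
    return sign * n * math.ldexp(y, -e)
```

The quadratic convergence is what makes this approach a natural fit for a multiplier-based unit, whereas a digit-recurrence unit retires a fixed number of quotient bits per step.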