High-performance floating point divide

A.A. Liddicoat, M.J. Flynn
Proceedings Euromicro Symposium on Digital Systems Design  
In modern processors, floating-point divide operations often take 20 to 25 clock cycles, about five times the latency of multiplication. High-performance divide units therefore typically use multiplicative algorithms with quadratic convergence. A divide unit based on the multiplicative Newton-Raphson iteration is proposed. This unit uses a higher-order Newton-Raphson reciprocal approximation to compute the quotient quickly, efficiently, and with high throughput. It achieves fast execution by computing the square, cube, and higher powers of the approximation directly, much faster than the traditional approach of serial multiplications. In addition, the second-, third-, and higher-order terms are computed simultaneously, further reducing the divide latency. Significant hardware reductions have been identified that lower the overall amount of computation and therefore reduce both the area required for implementation and the power consumed. The proposed hardware unit is designed to achieve the desired quotient precision in a single iteration, allowing the unit to be fully pipelined for maximum throughput.
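The arithmetic behind a higher-order reciprocal approximation can be sketched in software. This is a minimal illustration, not the authors' hardware design: it assumes a divisor b normalized to [1, 2) and a hypothetical 8-bit lookup-table seed x0, and the helper names reciprocal_seed, higher_order_divide, and newton_raphson_reciprocal are illustrative only. With seed error e = 1 - b*x0, the reciprocal expands as 1/b = x0(1 + e + e^2 + e^3 + ...), so truncating after e^k leaves a relative error of roughly e^(k+1). The sketch models only the numerics, not the parallel evaluation or its timing.

```python
import math


def reciprocal_seed(b: float, bits: int = 8) -> float:
    """Hypothetical lookup-table seed for 1/b, assuming b is normalized to [1, 2).

    b is truncated to `bits` fraction bits; using the midpoint of that interval
    bounds the relative seed error |1 - b*x0| by 2**-(bits + 1).
    """
    scale = 1 << bits
    b_trunc = math.floor(b * scale) / scale
    return 1.0 / (b_trunc + 0.5 / scale)


def higher_order_divide(a: float, b: float, order: int = 3, bits: int = 8) -> float:
    """Approximate a / b with a truncated higher-order reciprocal series.

    1/b = x0 / (1 - e) = x0 * (1 + e + e**2 + ...), where e = 1 - b*x0.
    Keeping terms up to e**order leaves a relative error of about e**(order + 1).
    """
    if not 1.0 <= b < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    x0 = reciprocal_seed(b, bits)
    e = 1.0 - b * x0            # relative error of the seed
    series = 1.0
    power = 1.0
    for _ in range(order):      # serial loop here; the proposed unit forms the powers directly
        power *= e
        series += power
    return a * x0 * series


def newton_raphson_reciprocal(b: float, iterations: int = 2, bits: int = 8) -> float:
    """Classic quadratic Newton-Raphson reciprocal: x <- x * (2 - b*x)."""
    x = reciprocal_seed(b, bits)
    for _ in range(iterations):
        x = x * (2.0 - b * x)   # each step squares the error; steps depend on each other
    return x


if __name__ == "__main__":
    for b in (1.0, 1.37, 1.999):
        print(b, higher_order_divide(1.0, b, order=3), 1.0 / b)
```

Two classic Newton-Raphson steps give x0(1 + e)(1 + e^2) = x0(1 + e + e^2 + e^3), so the order-3 series reaches the same precision as two dependent iterations. The abstract's latency argument rests on evaluating those powers directly and in parallel instead of through serial multiplications.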
doi:10.1109/dsd.2001.952327 dblp:conf/dsd/LiddicoatF01