A new math library

Nelson H. F. Beebe
<span title="2009-11-05">2009</span> <i title="Wiley"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/vsgxjj5nlzbdzbclpkkpf6mjby" style="color: black;">International Journal of Quantum Chemistry</a> </i> &nbsp;
Common misconceptions about computer arithmetic J Integer arithmetic is always exact J Integer overflows are caught J Floating-point arithmetic is fuzzy J Floating-point equality comparisons are unreliable J Floating-point precision and range are adequate for everyone J Rounding errors accumulate J Computers execute arithmetic code in the order and precision in which it is written J Underflows are harmless J Overflows are disastrous J Sign of zero does not matter J Arithmetic exceptions should
more &raquo; ... ause job termination 's Z1, Z3, and Z4 (1936-1945): 22-bit (Z1 and Z3) and 32-bit Z4 with exponent range of 2 ±63 ≈ 10 ±19 J Burks, Goldstine, and von Neumann (1946) argued against floating-point arithmetic J It is difficult today to appreciate that probably the biggest problem facing programmers in the early 1950s was scaling numbers so as to achieve acceptable precision from a fixed-point machine, Martin Campbell-Kelly (1980) J IBM mainframes from mid-1950s supplied floating-point arithmetic J IEEE 754 Standard (1985) proposed a new design for binary floating-point arithmetic that has since been widely adopted J IEEE 754 design first implemented in Intel 8087 coprocessor (1980) Historical flaws on some systems Floating-point arithmetic can make error analysis difficult, with behavior like this in some older designs: wraps to overflow, and vice versa J division replaced by reciprocal approximation and multiply J poor rounding practices increase cumulative rounding error 0 1 22 255 octuple J s is sign bit (0 for +, 1 for −) J exp is unsigned biased exponent field J smallest exponent: zero and subnormals (formerly, denormalized) 0 1 22 255 octuple J s is sign bit (0 for +, 1 for −) J exp is unsigned biased exponent field J smallest exponent: zero and subnormals (formerly, denormalized) J largest exponent: Infinity and NaN (Not a Number) 0 1 22 255 octuple J s is sign bit (0 for +, 1 for −) J exp is unsigned biased exponent field J smallest exponent: zero and subnormals (formerly, denormalized) J largest exponent: Infinity and NaN (Not a Number) J significand has implicit leading 1-bit in all but 80-bit format J s is sign bit (0 for +, 1 for −) J exp is unsigned biased exponent field J smallest exponent: zero and subnormals (formerly, denormalized) J largest exponent: Infinity and NaN (Not a Number) J significand has implicit leading 1-bit in all but 80-bit format J ±0, ±∞, signaling and quiet NaN IEEE 754 binary floating-point arithmetic J NaN from 0/0, ∞ − ∞, f (NaN), x op NaN, . . . J NaN = NaN is distinguishing property, but botched by 10% of compilers J ±∞ from big/small, including nonzero/zero J precisions in bits: 24, 53, 64, 113, 235 is distinguishing property, but botched by 10% of compilers J ±∞ from big/small, including nonzero/zero J precisions in bits: 24, 53, 64, 113, 235 J approximate precisions in decimal digits: 7, 15, 19, 34, 70 J approximate ranges (powers of 10): [−45, 38], [−324, 308], IEEE 754 binary floating-point arithmetic J nonstop computing model J five sticky flags record exceptions: underflow , overflow , zero divide , invalid , and inexact J four rounding modes: to-nearest-with-ties-to-even (default), to-plus-infinity , to-minus-infinity , and to-zero J traps versus exceptions IEEE 754 binary floating-point arithmetic J nonstop computing model J five sticky flags record exceptions: underflow , overflow , zero divide , invalid , and inexact J four rounding modes: to-nearest-with-ties-to-even (default), to-plus-infinity , to-minus-infinity , and to-zero J traps versus exceptions J fixups in trap handlers impossible on heavily-pipelined or parallel architectures (since IBM System/360 Model 91 in 1968) IEEE 754 binary floating-point arithmetic J nonstop computing model J five sticky flags record exceptions: underflow , overflow , zero divide , invalid , and inexact J four rounding modes: to-nearest-with-ties-to-even (default), to-plus-infinity , to-minus-infinity , and to-zero J traps versus exceptions J fixups in trap handlers impossible on heavily-pipelined or parallel architectures (since IBM System/360 Model 91 in 1968) J no language support for advanced features until 1999 ISO C Standard IEEE 754 binary floating-point arithmetic J nonstop computing model J five sticky flags record exceptions: underflow , overflow , zero divide , invalid , and inexact J four rounding modes: to-nearest-with-ties-to-even (default), to-plus-infinity , to-minus-infinity , and to-zero J traps versus exceptions J fixups in trap handlers impossible on heavily-pipelined or parallel architectures (since IBM System/360 Model 91 in 1968) J no language support for advanced features until 1999 ISO C Standard J some architectures implement only subsets (e.g., no subnormals, or only one rounding mode, or only one kind of NaN, or in embedded systems, neither Infinity nor NaN) IEEE 754 binary floating-point arithmetic J nonstop computing model J five sticky flags record exceptions: underflow , overflow , zero divide , invalid , and inexact J four rounding modes: to-nearest-with-ties-to-even (default), to-plus-infinity , to-minus-infinity , and to-zero J traps versus exceptions J fixups in trap handlers impossible on heavily-pipelined or parallel architectures (since IBM System/360 Model 91 in 1968) J no language support for advanced features until 1999 ISO C Standard J some architectures implement only subsets (e.g., no subnormals, or only one rounding mode, or only one kind of NaN, or in embedded systems, neither Infinity nor NaN) J some platforms have nonconforming rounding behavior Why the base matters J accuracy and run-time cost of conversion between internal and external (usually decimal) bases J effective precision varies when the floating-point representation uses a radix larger than 2 or 10 J reducing the exponent width makes digits available for increased precision J for a fixed number of exponent digits, larger bases provide a wider exponent range, and reduce incidence of rounding J for a fixed storage size, granularity (the spacing between successive representable numbers) increases as the base increases J in the absence of underflow and overflow, multiplication by a power of the base is an exact operation, and this feature is essential for many computations, in particular, for accurate elementary and special functions Consider evaluation of z = x /(2y ): J In the binary base, optimum form is z = x/(y + y). J In a nonbinary base, compute z = 0.5 × (x/y). These alternatives avoid introducing unnecessary additional rounding error, and the second sacrifices speed for accuracy. Base conversion problem J exact in one base may be inexact in others (e.g., decimal 0.9 is hexadecimal 0x1.cccccccccccccccccccccccc...p-1) J 5% sales-tax example: binary arithmetic: 0.70 × 1.05 = 0.734999999 . . . , which rounds to 0.73; correct decimal result 0.735 may round to 0.74 Goldberg (1967) and Matula (1968) showed how many digits needed for exact round-trip conversion J exact conversion may require many digits: more than 11 500 decimal digits for binary-to-decimal conversion of 128-bit format, J base-conversion problem not properly solved until 1990s J few (if any) languages guarantee accurate base conversion J Absent in most computers from mid-1960s to 2007 J IBM Rexx and NetRexx scripting languages supply decimal arithmetic with arbitrary precision (10 9 digits) and huge exponent range (10 ±999 999 999 ) J IBM decNumber library provides portable decimal arithmetic, and leads to hardware designs in IBM zSeries (2006) and PowerPC (2007) J GNU compilers implement low-level support in late 2006 J business processing traditionally require 18D fixed-point decimal, but COBOL 2003 mandates 32D, and requires floating-point as well J four additional rounding modes for legal/tax/financial requirements J integer, rather than fractional, coefficient means redundant representation, but allows emulating fixed-point arithmetic J quantization primitives can distinguish between 1, 1.0, 1.00, 1.000, etc. J trailing zeros significant: they change quantization Library problem J Need much more than ADD, SUB, MUL, and DIV operations J mathcw library provides full C99 repertoire, including printf and scanf families, plus hundreds more [but not functions of type complex] J code is portable across all current platforms, and several historical ones (PDP-10, VAX, S/360, . . . ) J supports six binary and four decimal floating-point datatypes J separate algorithms cater to base variations: 2, 8, 10, and 16 J pair-precision functions for even higher precision J fused multiply-add (FMA) via pair-precision arithmetic J programming languages: Fused multiply-add J a × b + c is a common operation in numerical computation (e.g., nested Horner polynomial evaluation and matrix/vector arithmetic) J fma(a,b,c) computes a × b + c with exact double-length product and addition with one rounding J fma(a,b,c) recovers error in multiplication: d ← fl(a * b) err ← fma(a,b,-d) a × b = fl(a * b) + err J fma() in some native hardware [IBM PowerPC (32-bit and 64-bit only), HP/Intel IA-64 (32-bit, 64-bit, 80-bit), and some HP PA-RISC and MIPS R8000] J fma() is a critical component of many algorithms for accurate computation Fused multiply-add J Markstein's book shows how fma() leads to accurate and compact elementary functions, as well as provably-correctly-rounded software division and square root J Nievergelt [TOMS 2003] proved that fma() leads to matrix arithmetic provably accurate to the penultimate digit J See fparith.bib for many other applications of fma() MathCW design goals J Complete C99 and POSIX support with many enhancements J Portable across past, present, and future platforms J Binary and decimal arithmetic fully supported J Ten floating-point formats, including single (7D), double (16D), quadruple (34D), and octuple precision (70D) J IEEE 754 (1985 and 2008) and 854 feature access J Free software and documentation under GNU licenses J Documented in manual pages and forthcoming treatise J Interactive access in hoc J Interfaces to Ada, C, C++, C#, Fortran, Java, Pascal J Replace native binary arithmetic in all scripting languages with high-precision decimal arithmetic 20 March 2009 20 / 25 MathCW design goals [cont.] J Separate data from code J Abstract data types: fp t and hp t, and FP() and FUNC() wrappers J Make algorithm files base-, precision-, and range-independent when feasible J No platform software configuration needed J Offer static, shared, fat (multi-architecture), and wrapper libraries J Provide high relative accuracy: target is two ulps (units in the last place), but exponential, log, root, and trigonometric families return results that are (almost) always correctly rounded (and much better than Intel IA-32 rounding of 80-bit to 64-bit results) J Provide exact function argument reduction [Payne/Hanek at DEC (1982) and Corbett at Berkeley (1983)] J
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1002/qua.22266">doi:10.1002/qua.22266</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/l6cujysyvncwtkl7iw4kezalzm">fatcat:l6cujysyvncwtkl7iw4kezalzm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20160806040123/http://www.math.utah.edu/~beebe/talks/2009/maa/maa.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/8d/f7/8df78b18b26a94a2237c0f1de9a14a2d44421f89.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1002/qua.22266"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> wiley.com </button> </a>