
Floating Point Types

The standard defines the following FP types:

Single Precision (4 consecutive bytes per number).

Useful for most short calculations.

Double Precision (8 consecutive bytes per number).

Most often used with scientific and engineering numerical computations.

Extended Precision (typically 10 consecutive bytes per number).

Useful for temporary storage of intermediate results in long calculations (e.g., compute a long inner product in extended precision, then convert the result back to double). There is also a single-extended format. The standard suggests that implementations support the extended format corresponding to the widest basic format supported; since all processors today allow for double precision, the double-extended format is the only one we discuss here.

Extended precision enables libraries to efficiently compute quantities to within about 0.5 ulp. For example, the result of x*y is correct to within 0.5 ulp, and the result of log(x) is nearly as accurate. Computing the logarithm is clearly a more involved operation than multiplication: the log library function performs all the intermediate computations in extended precision, then rounds the result to single or double precision, which avoids corrupting more digits and keeps the error close to 0.5 ulp. From the user's point of view this is transparent: the log function returns a result almost as accurate as a simple multiplication.
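The idea of accumulating an inner product in a wider format and rounding only once at the end can be sketched in Python. This is an illustration under stated assumptions, not a statement about any particular library: standard Python has no portable extended type, so here double precision plays the role of the wider format and single precision the role of the working format. The helper names to_single, dot_narrow and dot_wide are hypothetical.

```python
import struct

# Storage sizes of the two basic formats, in bytes.
SINGLE_BYTES = struct.calcsize("<f")  # 4
DOUBLE_BYTES = struct.calcsize("<d")  # 8


def to_single(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_narrow(xs, ys):
    """Inner product with every intermediate result rounded to single."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = to_single(acc + to_single(x * y))
    return acc


def dot_wide(xs, ys):
    """Inner product accumulated in double, rounded to single once."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return to_single(acc)


# One large term followed by four terms that are each below half an ulp
# of 1.0 in single precision (ulp = 2**-23).
xs = [1.0] + [2.0**-25] * 4
ys = [1.0] * 5

narrow = dot_narrow(xs, ys)  # each small term is rounded away
wide = dot_wide(xs, ys)      # the small terms add up to one single ulp

print(narrow == 1.0)
print(wide == 1.0 + 2.0**-23)
```

In this example the narrow accumulation loses every small term, while the wide accumulation returns the exact sum. The same reasoning applies one level up, with double as the working format and double-extended as the accumulator.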


Adrian Sandu 2001-08-26