A guard bit is useful when subtracting almost equal numbers.
Suppose
$a = (1.0)_2 \times 2^{0}$
and
$b = (1.11\ldots1)_2 \times 2^{-1}$,
with 23 1's after the binary point. Both $a$ and $b$ are single
precision floating point numbers. The mathematical result is
$a - b = (1.0)_2 \times 2^{-24}$. It is a floating point number also,
hence the numerical result should be identical to the
mathematical result,
$fl(a - b) = a - b$.
When we subtract the numbers, we align them by shifting $b$ one position
to the right. If computer registers are 24 bits long,
then we may have one of the following situations.
1. Shift $b$ and ``chop'' it to single precision format
(to fit the register), then subtract.
The result is $2^{-23}$, twice the mathematical value.
2. Shift $b$ and ``round'' it to single precision format
(to fit the register), then subtract.
The result is $0$, and all the meaningful information is lost.
3. Extend the registers with an extra guard bit.
When we shift $b$, the guard bit will hold the last (23rd) 1.
The subtraction is then performed in 25 bits.
The result is normalized, and is rounded back to 24 bits.
This result is $2^{-24}$, precisely the mathematical value.
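The three situations above can be checked with exact rational arithmetic. The sketch below is only an illustration, not a model of any particular hardware: it assumes the operands are 1 and 1 - 2^(-24) (the latter has 23 ones after the binary point once normalized), and the names `shift_b`, `chopped`, `rounded` and `guarded` are made up for this example.

```python
from fractions import Fraction

P = 24                                   # significand bits of single precision
a = Fraction(1)                          # (1.0)_2 x 2^0
b = Fraction(2**24 - 1, 2**24)           # (1.11...1)_2 x 2^-1, 23 ones after the point
exact = a - b                            # 2^-24, itself a floating point number

def shift_b(extra_bits, mode):
    """Align b to a's exponent in a register of P + extra_bits bits."""
    ulp = Fraction(1, 2 ** (P - 1 + extra_bits))   # weight of the last register bit
    n = b // ulp                                   # chop: drop the bits that do not fit
    if mode == "round" and b / ulp - n >= Fraction(1, 2):
        n += 1                                     # round to nearest instead of chopping
    return n * ulp

chopped = a - shift_b(0, "chop")     # situation 1: 24-bit register, chop
rounded = a - shift_b(0, "round")    # situation 2: 24-bit register, round
guarded = a - shift_b(1, "chop")     # situation 3: one guard bit, nothing is lost

print("chop :", chopped / exact, "times the exact value")
print("round:", rounded)
print("guard:", guarded == exact)
```

In situation 3 the 25-bit difference is already representable in 24 bits, so the final normalization and rounding step leaves it unchanged.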
Funny fact: Cray supercomputers lack the guard bit.
In practice, many processors do subtractions and additions in extended
precision, even if the operands are single or double precision.
This provides effectively 16 guard bits for these operations.
This does not come for free: additional hardware makes the processor
more expensive; besides, the longer the word the slower the arithmetic
operation is.
The following theorem (see David Goldberg, p. 160) shows the importance of
the additional guard digit.
Let $x$, $y$ be FP numbers in a FP system with base $\beta$ and $p$ mantissa digits;
- if we compute $x - y$ using $p$ digits, then the relative rounding
error in the result can be as large as $\beta - 1$ (i.e. all the digits are
corrupted!).
- if we compute $x - y$ using $p + 1$ digits, then the relative rounding
error in the result is less than $2\epsilon$.
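Both bounds can be observed on a small case. The sketch below assumes a decimal system with 3 mantissa digits and the operands 10.0 and 9.99, chosen here only for illustration. The helper `difference` is hypothetical, and `eps` follows Goldberg's definition of machine epsilon, (beta/2) * beta^(-p), which is also an assumption about the notation.

```python
from fractions import Fraction

def difference(x, y, beta, p, guard_digits):
    """x - y for x > y > 0, with y shifted to x's exponent and chopped
    to p + guard_digits digits before the subtraction."""
    e = 0                                    # exponent of x: beta**e <= x < beta**(e+1)
    while Fraction(beta) ** (e + 1) <= x:
        e += 1
    while Fraction(beta) ** e > x:
        e -= 1
    unit = Fraction(beta) ** (e - p + 1 - guard_digits)   # last digit position kept
    return x - (y // unit) * unit

beta, p = 10, 3
x, y = Fraction(10), Fraction(999, 100)      # 1.00 x 10^1 and 9.99 x 10^0
exact = x - y
eps = Fraction(beta, 2) * Fraction(beta) ** (-p)

for g in (0, 1):
    rel_err = abs(difference(x, y, beta, p, g) - exact) / exact
    print(f"{g} guard digit(s): relative error = {rel_err}")
```

Without a guard digit the computed difference is 0.1 instead of 0.01, so the relative error equals beta - 1. With one guard digit the difference is exact for these operands, well within the stated bound.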
Note that, although using an additional guard digit greatly improves accuracy,
it does not guarantee that the result will be exactly rounded
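(i.e. will obey the IEEE requirement). As an example consider
$a - b$ with $a = 1.10 \times 10^{2}$ and $b = 8.59 \times 10^{0}$ in our toy FP system (taken here as base 10 with 3 mantissa digits). In exact arithmetic,
$a - b = 1.0141 \times 10^{2}$,
which rounds to
$fl(a - b) = 1.01 \times 10^{2}$. With the guard digit
arithmetic, we first shift $b$ and chop it to 4 digits,
$\hat{b} = 0.085 \times 10^{2}$.
Now
$a - \hat{b} = 1.015 \times 10^{2}$
(calculation done with 4 mantissa digits).
When we round this number to the nearest (even) we obtain
$1.02 \times 10^{2}$,
a value different from the exactly rounded result.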
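The effect can be reproduced with Python's `decimal` module. The sketch below assumes a base 10 system with 3 mantissa digits and the operands 110 and 8.59; both are illustrative choices, and the variable names are made up for this example.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_EVEN

a, b = Decimal("110"), Decimal("8.59")               # assumed operands, 3 digits each

three = Context(prec=3, rounding=ROUND_HALF_EVEN)    # the toy format
four = Context(prec=4, rounding=ROUND_HALF_EVEN)     # toy format plus one guard digit

# Reference: the exact difference, rounded once to 3 digits.
exactly_rounded = three.subtract(a, b)

# One guard digit: align b with a (last kept place is 10^-1), chop,
# subtract with 4 digits, then round the result back to 3 digits.
b_shifted = b.quantize(Decimal("0.1"), rounding=ROUND_DOWN)
with_guard = three.plus(four.subtract(a, b_shifted))

print("exactly rounded :", exactly_rounded)
print("one guard digit :", with_guard)
```

Chopping 8.59 to 8.5 turns the difference into an exact tie (101.5), and the tie-breaking rule then moves the result away from the exactly rounded value.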
However, by introducing a second guard digit and a third, ``sticky'' bit,
the result is the same as if the difference were computed exactly and then
rounded (D. Goldberg, p. 177).
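A minimal sketch of how the sticky information can be modeled, assuming a base 10 system with 3 mantissa digits. The sticky bit is represented here as one extra digit set to 1 whenever nonzero digits were dropped during the shift. This is one possible model for illustration, not a description of an actual hardware design, and the helper names are hypothetical.

```python
from fractions import Fraction

BETA, P = 10, 3          # assumed toy system: base 10, 3 mantissa digits

def exponent(v):
    """Return e such that BETA**e <= v < BETA**(e+1), for v > 0."""
    e = 0
    while Fraction(BETA) ** (e + 1) <= v:
        e += 1
    while Fraction(BETA) ** e > v:
        e -= 1
    return e

def round_to_p(v):
    """Round v > 0 to P significant digits, ties to even."""
    ulp = Fraction(BETA) ** (exponent(v) - P + 1)
    return round(v / ulp) * ulp          # round() on a Fraction rounds half to even

def subtract(x, y, guard, sticky):
    """x - y for x > y > 0, using `guard` extra digits and an optional sticky digit."""
    unit = Fraction(BETA) ** (exponent(x) - P + 1 - guard)   # last kept position
    kept = (y // unit) * unit                                # shift y and chop
    if sticky and kept != y:
        kept += unit / BETA      # record that nonzero digits were dropped
    return round_to_p(x - kept)

x, y = Fraction(100), Fraction(501, 10000)    # 1.00 x 10^2 and 5.01 x 10^-2
print("exactly rounded        :", float(round_to_p(x - y)))
print("two guards, no sticky  :", float(subtract(x, y, 2, False)))
print("two guards, with sticky:", float(subtract(x, y, 2, True)))
```

For these operands two guard digits alone turn 99.9499 into the tie 99.95, which rounds up to 100; the sticky digit records the dropped 0.0001 and restores the exactly rounded result, 99.9.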
Adrian Sandu
2001-08-26