Comparing the Results of Floating-Point Subtractions

Given four single-precision IEEE-754 floating-point numbers, a, b, c, and d, we want to know whether |a - b| < |c - d|. Will the following code always give the same answer that we would get if we performed all operations with infinite precision?

bool AbsDiffLT(float a, float b, float c, float d)
{
    return fabsf(a - b) < fabsf(c - d);
}

Background

The C library absolute value functions do not introduce errors into their results:

If successful, returns the absolute value of arg (|arg|). The value returned is exact and does not depend on any rounding modes.

FABS Documentation at CPPReference.com

According to Goldberg, the IEEE-754 standard requires basic arithmetic operations to be exactly rounded:

The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded. That is, the result must be computed exactly and then rounded to the nearest floating-point number (using round to even).

What every computer scientist should know about floating-point

The round to even rule states that if the exact number falls midway, it is rounded to the nearest value with an even least significant digit. For example:

Example Value+11.5+12.5-11.5-12.5
Rounded Result+12.0+12.0-12.0-12.0

The default floating-point environment for MSVC sets the rounding mode to round to even:

When a process is initialized, the default floating point environment is set. This environment masks all floating point exceptions, sets the rounding mode to round to nearest (FE_TONEAREST), preserves subnormal (denormal) values, uses the default precision of significand (mantissa) for float, double, and long double values, and where supported, sets the infinity control to the default affine mode.

MSVC Documentation

There are several different floating-point behaviors that can be configured with MSVC’s /fp flag. By default MSVC compiles with /fp:precise, which preserves the default floating-point environment:

The compiler generates code intended to run in the default floating-point environment and assumes that the floating-point environment is not accessed or modified at runtime. That is, it assumes that the code does not unmask floating-point exceptions, read or write floating-point status registers, or change rounding modes.

MSVC Documentation

Failure Cases

If you subtract a very small number S from a very large number L, oftentimes the result will just be L because L – S is not representable and tiny deltas from L round to L. It is easy to construct two small, distinct numbers S1 < S2, that when subtracted from L, result in L because both L – S1 and L – S2 are not representable thus the result rounds to L. In these cases, our original function in question fails to produce a result consonant with computing the result with infinite precision, because fabsf(a - b) == fabsf(c - d).

The value std::numeric_limits<float>::epsilon() gives the difference between 1.0f and the next largest representable float. Note that 1.0f + std::numeric_limits<float>::epsilon() / 2.0f will round to 1.0f because of the aforementioned round to even rule. We can use this to construct test data that will fail our desired criteria as follows:

float fOnePlusEps = 1.0f + std::numeric_limits<float>::epsilon();
Assert(AbsDiffLT(fOnePlusEps, std::numeric_limits<float>::epsilon() / 4.0f,
                 fOnePlusEps, std::numeric_limits<float>::epsilon() / 8.0f)); // FAIL!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.