What Is a Subnormal Floating Point Number

What is a subnormal floating point number?

In the IEEE 754 standard, floating-point numbers are represented in binary scientific notation, x = M × 2^e. Here M is the mantissa and e is the exponent. Mathematically, you can always choose the exponent so that 1 ≤ M < 2.* However, since in the computer representation the exponent can only have a finite range, there are some numbers which are bigger than zero but smaller than 1.0 × 2^emin. Those numbers are the subnormals, or denormals.

Practically, the mantissa is stored without the leading 1, since there is always a leading 1, except for subnormal numbers (and zero). Thus the interpretation is: if the exponent field is non-minimal, there is an implicit leading 1, and if the exponent field is minimal (all zeros), there isn't, and the number is subnormal.

*) More generally, 1 ≤ M < B  for any base-B scientific notation.

Support for subnormal IEEE 754 floating point numbers on iOS ARM devices (iPhone 4)

Can one set the iOS system to provide support for subnormal numbers without asking the compiler to produce only full software floating-point code?

Yes. This can be achieved by setting the FZ bit in the FPSCR to zero:

static inline void DisableFZ()
{
    __asm__ volatile("vmrs r0, fpscr\n"
                     "bic r0, $(1 << 24)\n"
                     "vmsr fpscr, r0" : : : "r0");
}

Note that this can cause significant slowdowns in application performance when appreciable quantities of denormal values are encountered. You can (and should) restore the default floating-point state before making calls into any code that does not make an ABI guarantee to work properly in non-default modes:

static inline void RestoreFZ()
{
    __asm__ volatile("vmrs r0, fpscr\n"
                     "orr r0, $(1 << 24)\n"
                     "vmsr fpscr, r0" : : : "r0");
}

Please file a bug report to request that better documentation be provided for the modes of FP operation in iOS.

IEEE 754: rationale for format: subnormal and normal numbers

I checked the properties of both formats using a simplified example. For the sake of simplicity I use the formats 0.F × 10^-2 and 1.F × 10^-3, where F has 2 decimal digits and there is no ± sign.

Min (non-zero) / max values:

Format        Min value (non-zero)       Max value
0.F × 10^-2   0.01 × 10^-2 = 0.0001      0.99 × 10^-2 = 0.0099
1.F × 10^-3   1.00 × 10^-3 = 0.001       9.99 × 10^-3 = 0.00999

Here is the graphical representation:

[image: number line of the values representable in each format]

Here we see that below the value 0.001 format 1.F × 10^-3 cannot represent any smaller non-zero values, whereas format 0.F × 10^-2 still can. Here is the zoomed-in version:

[image: zoomed-in view of the region near zero]

Conclusion: from the graphical representation we see that, compared with format 1.F × 10^-3, format 0.F × 10^-2:

  1. gives more dynamic range: log10(max_value / min_value) is 1.99 vs 0.99
  2. gives less precision: fewer values can be represented: 100 vs 900

It seems that for subnormals IEEE 754 preferred more dynamic range at the cost of precision. Hence the format of subnormal numbers (in single precision) is ±0.F × 2^-126 and not ±1.F × 2^-127.

Does C99 assume that subnormal numbers are supported?

Does C99 assume that subnormal numbers are supported?

No. See 5.2.4.2.2. The language defines a model of a floating-point number, then defines what a subnormal floating-point number is within that model. Then an interface is established for detecting and working with subnormal floating-point numbers, and for how they are handled in corner cases, i.e. when exceptions are raised and when not.

That does not mean that the underlying architecture uses this model to represent floating-point numbers. The intention is to write the standard in an abstract way, providing an interface without mandating how it should be implemented. Note 16:


  The floating-point model is intended to clarify the description of each floating-point characteristic and does not require the floating-point arithmetic of the implementation to be identical.

If the implementation implements Annex F, then the floating types match the formats described in IEC 60559, so it will have subnormal numbers. This is recommended practice, but optional, and detectable with a macro; there is no requirement.

the presence of FP_SUBNORMAL classification macro

There may be more FP_[A-Z]* macros provided by the implementation for additional "kinds of floating-point values".

the fact that in IEEE 754 support of subnormal numbers is required

But C does not require IEEE 754 support.

Create denormalized (subnormal) floating point values in C++

You could use std::numeric_limits<T>::denorm_min() and its negation, -std::numeric_limits<T>::denorm_min(). It is just incidental that the produced denormalized values have a special characteristic (they are the smallest in magnitude). If you don't want that, multiply by some reasonably small integer value.

Why are denormal floating-point values slower to handle?

With IEEE-754 floating-point most operands encountered are normalized floating-point numbers, and internal data paths in processors are built for normalized operands. Additional exponent bits may be used for internal representations to keep floating-point operands normalized inside the data path at all times.

Any subnormal inputs therefore require additional work to first determine the number of leading zeros to then left shift the significand for normalization while adjusting the exponent. A subnormal result requires right shifting the significand by the appropriate amount and rounding may need to be deferred until after that has happened.

If solved purely in hardware, this additional work typically requires additional hardware and additional pipeline stages: one, maybe even two, additional clock cycles each for handling subnormal inputs and subnormal outputs. But the performance of typical CPUs is sensitive to the latency of instructions, and significant effort is expended to keep latencies low. The latency of an FADD, FMUL, or FMA instruction is typically between 3 and 6 cycles, depending on implementation and frequency targets.

Adding, say, 50% additional latency for the potential handling of subnormal operands is therefore unattractive, even more so because subnormal operands are rare for most use cases. Using the design philosophy of "make the common case fast, and the uncommon case functional" there is therefore a significant incentive to push the handling of subnormal operands out of the "fast path" (pure hardware) into a "slow path" (combination of existing hardware plus software).

I have participated in the design of floating-point units for x86 processors, and the common approach for handling subnormals is to invoke an internal micro-code level exception when these need to be handled. This subnormal handling may take on the order of 100 clock cycles. The most expensive part of that is typically not the execution of the fix-up code itself, but getting in and out of the microcode exception handler.

I am aware of specific use cases, for example particular filters in digital signal processing, where encountering subnormals is common. To support such applications at speed, many floating-point units support a non-standard flush-to-zero mode in which subnormal encodings are treated as zero.

Note that there are throughput-oriented processor designs with significant latency tolerance, in particular GPUs. I am familiar with NVIDIA GPUs, and best I can tell they handle subnormal operands without additional overhead and have done so for the past dozen years or so. Presumably this comes at the cost of additional pipeline stages, but the vendor does not document many of the microarchitectural details of these processors, so it is hard to know for sure. The following paper may provide some general insights into how different hardware designs handle subnormal operands, some with very little overhead:

E. M. Schwarz, M. Schmookler, and S. D. Trong, "FPU implementations with denormalized numbers," IEEE Transactions on Computers, Vol. 54, No. 7, July 2005, pp. 825-836.


