Extracting Mantissa and Exponent from Double in C#

Extracting mantissa and exponent from double in C#

The binary format shouldn't change - it would certainly be a breaking change to existing specifications. It's defined to be in IEEE754 / IEC 60559:1989 format, as Jimmy said. (C# 3.0 language spec section 1.3; ECMA 335 section 8.2.2). The code in DoubleConverter should be fine and robust.

For the sake of future reference, the relevant bit of the code in the example is:

public static string ToExactString (double d)
{
    // ...

    // Translate the double into sign, exponent and mantissa.
    long bits = BitConverter.DoubleToInt64Bits(d);
    bool negative = (bits & (1L << 63)) != 0;
    // Note that the shift is sign-extended, hence the mask with 0x7ff
    int exponent = (int) ((bits >> 52) & 0x7ffL);
    long mantissa = bits & 0xfffffffffffffL;

    // Subnormal numbers; exponent is effectively one higher,
    // but there's no extra normalisation bit in the mantissa
    if (exponent == 0)
    {
        exponent++;
    }
    // Normal numbers; leave exponent as it is but add extra
    // bit to the front of the mantissa
    else
    {
        mantissa = mantissa | (1L << 52);
    }

    // Bias the exponent. It's actually biased by 1023, but we're
    // treating the mantissa as m.0 rather than 0.m, so we need
    // to subtract another 52 from it.
    exponent -= 1075;

    if (mantissa == 0)
    {
        return negative ? "-0" : "0";
    }

    /* Normalize */
    while ((mantissa & 1) == 0)
    {   /* i.e., Mantissa is even */
        mantissa >>= 1;
        exponent++;
    }

    // ...
}

The comments made sense to me at the time, but I'm sure I'd have to think for a while about them now. After the very first part you've got the "raw" exponent and mantissa - the rest of the code just helps to treat them in a simpler fashion.
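The excerpt above is only part of a longer method, so it does not compile on its own. As a minimal sketch, the same steps can be wrapped in a helper that returns the three parts instead of a string. The class name `DoubleParts`, the method name `Decompose`, and the choice to reject NaN and infinities are assumptions made for this illustration; they are not part of DoubleConverter.

```csharp
using System;

public static class DoubleParts
{
    // Splits a finite double into (negative, mantissa, exponent) such that
    // |d| == mantissa * 2^exponent, with mantissa odd (or zero when d is zero).
    // Rejecting NaN and infinities is a choice made for this sketch only.
    public static (bool Negative, long Mantissa, int Exponent) Decompose(double d)
    {
        if (double.IsNaN(d) || double.IsInfinity(d))
            throw new ArgumentException("Value must be finite.", nameof(d));

        long bits = BitConverter.DoubleToInt64Bits(d);
        bool negative = bits < 0;                      // the sign is the top bit
        int exponent = (int)((bits >> 52) & 0x7ffL);   // 11-bit biased exponent
        long mantissa = bits & 0xfffffffffffffL;       // 52-bit stored fraction

        if (exponent == 0)
            exponent++;                // subnormal: no implicit leading bit
        else
            mantissa |= 1L << 52;      // normal: restore the implicit leading bit

        exponent -= 1075;              // bias of 1023, plus 52 for an integer mantissa

        if (mantissa == 0)
            return (negative, 0L, 0);

        while ((mantissa & 1) == 0)    // strip trailing zero bits
        {
            mantissa >>= 1;
            exponent++;
        }
        return (negative, mantissa, exponent);
    }
}
```

For example, `DoubleParts.Decompose(-6.0)` yields a negative sign, mantissa 3 and exponent 1, since 6 = 3 × 2^1.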

Mantissa Normalization of C# double

When you use BitConverter.DoubleToInt64Bits, it gives you the double value already encoded in IEEE 754 format. This means the significand is encoded with an implicit leading bit. (“Significand” is the preferred term for the fraction portion of a floating-point value and is used in IEEE 754. A significand is linear. A mantissa is logarithmic. “Mantissa” stems from the days when people had to use logarithms and paper and tables of functions to do crude calculations.) To recover the unencoded significand, you would have to restore the implicit bit.

That is not hard. Once you have separated the sign bit, the encoded exponent (as an integer), and the encoded significand (as an integer), then, for 64-bit binary floating-point:

  • If the encoded exponent is its maximum (2047) and the encoded significand is non-zero, the value is a NaN. There is additional information in the significand about whether the NaN is signaling or not and other user- or implementation-defined information.
  • If the encoded exponent is its maximum and the encoded significand is zero, the value is an infinity (+ or – according to the sign).
  • If the encoded exponent is zero, the implicit bit is zero, the actual significand is the encoded significand multiplied by 2^-52, and the actual exponent is one minus the bias (1023) (so -1022).
  • Otherwise, the implicit bit is one, the actual significand is the encoded significand first multiplied by 2^-52 and then added to one, and the actual exponent is the encoded exponent minus the bias (1023).

(If you want to work with integers and not have fractions for the significand, you can omit the multiplications by 2^-52 and add -52 to the exponent instead. In the last case, the significand is added to 2^52 instead of to one.)
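The four cases above, in the integer form described in the parenthetical, might be sketched in C# as follows. The names `DoubleKind`, `Ieee754` and `Decode` are hypothetical, and returning an exponent of 0 for NaN and infinity is a placeholder convention chosen for this sketch.

```csharp
using System;

public enum DoubleKind { NaN, Infinity, ZeroOrSubnormal, Normal }

public static class Ieee754
{
    // Decodes a 64-bit binary floating-point value so that, for finite values,
    // |value| == Significand * 2^Exponent with an integer Significand.
    public static (DoubleKind Kind, bool Negative, long Significand, int Exponent) Decode(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;
        int encodedExponent = (int)((bits >> 52) & 0x7FF);
        long encodedSignificand = bits & 0xFFFFFFFFFFFFFL;

        if (encodedExponent == 0x7FF)
        {
            // Maximum exponent: NaN if any significand bit is set, else infinity.
            var kind = encodedSignificand != 0 ? DoubleKind.NaN : DoubleKind.Infinity;
            return (kind, negative, encodedSignificand, 0);
        }

        if (encodedExponent == 0)
        {
            // Implicit bit is zero; exponent is 1 - 1023, minus 52 for integer scaling.
            return (DoubleKind.ZeroOrSubnormal, negative, encodedSignificand, 1 - 1023 - 52);
        }

        // Implicit bit is one: add 2^52 to the encoded significand.
        return (DoubleKind.Normal, negative,
                encodedSignificand | (1L << 52), encodedExponent - 1023 - 52);
    }
}
```

With this convention, `Ieee754.Decode(1.0)` reports a normal value with significand 2^52 and exponent -52.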

There is an alternative method that avoids BitConverter and the IEEE-754 encoding. If you can call the frexp routine from C#, it will return the fraction and exponent mathematically instead of as encodings. First, handle zeroes, infinities, and NaNs separately. Then use:

int exponent;
double fraction = frexp(value, &exponent);

This sets fraction to a value with magnitude in [½, 1) and exponent such that fraction × 2^exponent equals value. (Note that fraction still has the sign; you might want to separate that and use the absolute value.)

At this point, you can scale fraction as desired (and adjust exponent accordingly). To scale it so that it is an odd integer, you could multiply it by two repeatedly until it has no fractional part.
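C# has no built-in frexp, so the following is only a rough stand-in, assuming a runtime that provides `Math.ILogB` and `Math.ScaleB` (.NET Core 3.0 or later). The names `FrexpSketch`, `Frexp` and `ToOddInteger` are made up for this illustration; the second method shows the repeated doubling described above.

```csharp
using System;

public static class FrexpSketch
{
    // Rough analogue of C's frexp for finite, non-zero inputs: returns a
    // fraction with magnitude in [0.5, 1) and sets exponent so that
    // fraction * 2^exponent == value. Other inputs are rejected here.
    public static double Frexp(double value, out int exponent)
    {
        if (value == 0 || double.IsNaN(value) || double.IsInfinity(value))
            throw new ArgumentException("Handle zero, infinity and NaN separately.", nameof(value));

        exponent = Math.ILogB(value) + 1;       // ILogB gives floor(log2(|value|))
        return Math.ScaleB(value, -exponent);   // exact scaling by a power of two
    }

    // Doubles the fraction until it has no fractional part, adjusting the
    // exponent to match. For a fraction in [0.5, 1) the result is an odd integer.
    public static double ToOddInteger(double fraction, ref int exponent)
    {
        while (fraction != Math.Floor(fraction))
        {
            fraction *= 2;
            exponent--;
        }
        return fraction;
    }
}
```

For instance, `Frexp(-6.0, out e)` gives -0.75 with `e` equal to 3, and `ToOddInteger` then turns that pair into -3 with exponent 1.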

Wasted exponent bit in C# double representation

There are no wasted bits in a double.

Let's sort out your confusion. How do we turn a double from bits into a mathematical value? Let's assume the double is not zero, infinity, negative infinity, NaN or a denormal, because those all have special rules.

The crux of your confusion is mixing up decimal quantities with binary quantities. In this answer, binary quantities are written out as strings of 0s and 1s; all other numbers are decimal.

We take the 52 bits of the mantissa and we put them after 1. So in your example, that would be

1.0111001101101101001001001000010101110011000100100011

That's a binary number. So 1 + 0/2 + 1/4 + 1/8 + 1/16 + 0/32 ...

Then we take the 11 bits of the exponent, treat that as an 11 bit unsigned integer, and subtract 1023 from that value. So in your example we have 10000000100 which is the unsigned integer 1028. Subtract 1023, and we get 5.

Now we shift the "decimal place" (ha ha) by 5 places:

101110.01101101101001001001000010101110011000100100011

Note that this is equivalent to multiplying by 2^5. It is not multiplying by 10^5!

And now we multiply the whole thing by 1 if the sign bit is 0, and -1 if the sign bit is 1. So the final answer is

101110.01101101101001001001000010101110011000100100011

Let's see an example with a negative exponent.

Suppose the exponent had been 01111111100. That's 1020 as an unsigned integer. Subtract 1023. We get -3, so we would shift three places to the left, and get:

0.0010111001101101101001001001000010101110011000100100011

Let's see an example with a large exponent. What if the exponent had been 11111111100 ?

Work it out. That's 2044 in decimal. Subtract 1023. That's 1021. So this number would be the extremely large number that you get when multiplying 1.0111001101101101001001001000010101110011000100100011 by 2^1021.

So the value of that double is exactly equal to

32603055608669827528875188998863283395233949199438288081243712122350844851941321466156747022359800582932574058697506453751658312301708309704448596122037141141297743099124156580613023692715652869864010740666615694378079258090383719888417882332809291228958035810952632190230935024250237637887765563383983636480

Which is approximately 3.26030556 × 10^307.

Is that now clear?
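To check the worked examples by machine, one option is to assemble the double from its three bit fields and compare it with the value given by the rule above. This is a sketch only: `BitFieldExample`, `FromFields` and `FromFormula` are hypothetical names, and the formula version covers normal numbers only.

```csharp
using System;

public static class BitFieldExample
{
    // Assembles a double from its sign bit, 11 exponent bits and 52 fraction
    // bits, the latter two given as strings of 0s and 1s.
    public static double FromFields(int signBit, string exponentBits, string fractionBits)
    {
        long sign = (long)signBit << 63;
        long exponent = Convert.ToInt64(exponentBits, 2) << 52;
        long fraction = Convert.ToInt64(fractionBits, 2);
        return BitConverter.Int64BitsToDouble(sign | exponent | fraction);
    }

    // Evaluates (1 + fraction / 2^52) * 2^(exponent - 1023), negated when the
    // sign bit is 1. Valid for normal numbers only.
    public static double FromFormula(int signBit, string exponentBits, string fractionBits)
    {
        double significand = 1.0 + Convert.ToInt64(fractionBits, 2) / 4503599627370496.0; // 2^52
        int exponent = Convert.ToInt32(exponentBits, 2) - 1023;
        double magnitude = significand * Math.Pow(2.0, exponent);
        return signBit == 0 ? magnitude : -magnitude;
    }
}
```

Calling both methods with sign 0, exponent bits 10000000100 and the 52 fraction bits from the example should produce the same double, roughly 46.43.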


If this subject interests you, here's some further reading:

Code to decode a double into its parts:

https://ericlippert.com/2015/11/30/the-dedoublifier-part-one/

A simple arbitrary-precision rational:

https://ericlippert.com/2015/12/03/the-dedoublifier-part-two/

Code to turn a double into its exact rational:

https://ericlippert.com/2015/12/07/the-dedoublifier-part-three/

Representation of floats:

https://blogs.msdn.microsoft.com/ericlippert/2005/01/10/floating-point-arithmetic-part-one/

How Benford's Law is used to minimize representation errors:

https://blogs.msdn.microsoft.com/ericlippert/2005/01/13/floating-point-and-benfords-law-part-two/

What algorithm do we use to display floats as decimal quantities?

https://blogs.msdn.microsoft.com/ericlippert/2005/01/17/fun-with-floating-point-arithmetic-part-three/

What happens when you try to compare for equality floats of different precision levels?

https://blogs.msdn.microsoft.com/ericlippert/2005/01/18/fun-with-floating-point-arithmetic-part-four/

What properties of standard arithmetic fail to hold in floating point?

https://blogs.msdn.microsoft.com/ericlippert/2005/01/20/fun-with-floating-point-arithmetic-part-five/

How are infinities and divisions by zero represented?

https://blogs.msdn.microsoft.com/ericlippert/2009/10/15/as-timeless-as-infinity/

Extract double number value with exponent from text

Yep, I'd say regular expressions are your friend here:

var match = Regex.Match(input, @"[0-9.]+e[-+][0-9]+");

Or you can prevent matching multiple decimal points with the following (the last one will be treated as the "proper" one):

@"\b[0-9]+(\.[0-9]+)e[-+][0-9]+\b"

Edit: Here's a more complete one that will allow optional exponents and will also allow for the decimal point to be at the start of the number:

@"[\d]*\.?[\d]+(e[-+][\d]+)?"
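As a minimal sketch of how the last pattern could be used, the matched text can be handed to `double.Parse` with the invariant culture so that the decimal point is read consistently. The class name `NumberExtractor` and the method `ExtractFirst` are invented for this example.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Pattern from the answer above: optional integer part, optional decimal
    // point, at least one digit, then an optional signed exponent.
    private static readonly Regex NumberPattern =
        new Regex(@"[\d]*\.?[\d]+(e[-+][\d]+)?");

    // Returns the first number found in the input, or null when there is none.
    public static double? ExtractFirst(string input)
    {
        Match match = NumberPattern.Match(input);
        if (!match.Success)
            return null;

        // InvariantCulture makes "." the decimal separator on every machine.
        return double.Parse(match.Value, CultureInfo.InvariantCulture);
    }
}
```

Note that in .NET `\d` also matches non-ASCII digits, so `[0-9]` is the stricter choice if the input may contain them. The pattern as written accepts only a lowercase `e` and does not capture a leading minus sign.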

Decomposing Double into sign, mantissa and exponent in swift: check my work

As far as I can tell this should be easy but it doesn't work,

The FloatingPoint protocol, as defined at https://github.com/apple/swift/blob/master/stdlib/public/core/FloatingPoint.swift.gyb , to which the Double struct conforms, already declares an exponent and a significand property (on lines 207 and 231 respectively).

The Double struct, at https://github.com/apple/swift/blob/master/stdlib/public/core/FloatingPointTypes.swift.gyb , defines the exponent on line 389 and the significand on line 398.

How to get the mantissa and exponent of a double in a power of 10?

The thing about frexp is that if you are using IEEE 754 binary floating-point (or PDP-11 floating-point), it simply reveals the representation of the floating-point value. As such, the decomposition is exact, and it is bijective: you can reconstitute the argument from the significand and exponent you have obtained.

No equivalent reason exists for highlighting a power-of-ten decomposition function. Actually, defining such a function would pose a number of technical hurdles:

  • Not all powers of ten are representable exactly as double: 0.1 and 10^23 aren't.
  • For the base-ten-significand part, do you want a value that arrives close to the argument when multiplied by 10^e, or by the double-precision approximation of 10^e?
  • Some floating-point values may have several equally valid decompositions.
  • Some floating-point values may have no decomposition.

If these accuracy and canonicity aspects do not matter to you, use e = floor(log10(abs(x))) for the base-10-exponent (or ceil for a convention more like the PDP-11-style frexp), and x / pow(10, e) for the base-10-significand. If it matters to you that the significand is between 1 and 10, you had better force this property by clamping.
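A quick-and-dirty version of that recipe might look like the sketch below. It is approximate by design: `Math.Log10`, `Math.Pow` and the division all round, and it is not meant for values near the ends of the double range, where 10^e itself overflows or underflows. The names `Base10Sketch` and `Decompose`, the single corrective step used to keep the significand in range, and the handling of zero, NaN and infinity are all assumptions of this example.

```csharp
using System;

public static class Base10Sketch
{
    // Approximate decomposition x ≈ Significand * 10^Exponent with
    // 1 <= |Significand| < 10. Neither part is exact in general.
    public static (double Significand, int Exponent) Decompose(double x)
    {
        // Convention chosen for this sketch: pass special values through.
        if (x == 0 || double.IsNaN(x) || double.IsInfinity(x))
            return (x, 0);

        int exponent = (int)Math.Floor(Math.Log10(Math.Abs(x)));
        double significand = x / Math.Pow(10, exponent);

        // Force the significand into [1, 10) in case rounding pushed it out.
        if (Math.Abs(significand) >= 10)
        {
            significand /= 10;
            exponent++;
        }
        else if (Math.Abs(significand) < 1)
        {
            significand *= 10;
            exponent--;
        }

        return (significand, exponent);
    }
}
```

For example, `Base10Sketch.Decompose(12345.678)` gives an exponent of 4 and a significand close to 1.2345678.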


NOTE: If what you want is to convert x to decimal, the problem has been studied in depth. The first step of a correctly rounded conversion to decimal is not to compute a base-ten significand as a floating-point number (not in all cases anyway; for some inputs this can be an acceptable shortcut) because in the general case, this step would introduce approximations that are not admissible for a correctly rounded conversion function. For a quick-and-dirty conversion-to-decimal routine (in the case you have access to log10 but not to sprintf), the approach above may be sufficient.


