How to Extract the Mantissa of a Double

How can I extract the mantissa of a double?

In <math.h>

double frexp (double value, int *exp)

decomposes VALUE into a mantissa and an exponent.

double ldexp (double value, int exp)

does the reverse.

To get an integer value, you have to multiply the result of frexp by FLT_RADIX raised to the power DBL_MANT_DIG (both are available in <float.h>). To store that in an integer variable, you also need to find an adequate type (often a 64-bit type).

If you want to handle the 128-bit long double that some implementations provide, you need the C99 function frexpl to do the splitting, and then you probably don't have an integer type wide enough to store the full result.

How to get the sign, mantissa and exponent of a floating point number

I think it is better to use unions to do the casts, it is clearer.

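#include <stdio.h>

/* The order of bit-fields is implementation-defined; this layout matches
   common little-endian targets with a 32-bit IEEE 754 float. */
typedef union {
    float f;
    struct {
        unsigned int mantissa : 23;
        unsigned int exponent : 8;
        unsigned int sign : 1;
    } parts;
} float_cast;

int main(void) {
    float_cast d1 = { .f = 0.15625f };
    printf("sign = %x\n", (unsigned)d1.parts.sign);
    printf("exponent = %x\n", (unsigned)d1.parts.exponent);
    printf("mantissa = %x\n", (unsigned)d1.parts.mantissa);
    return 0;
}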

Example based on http://en.wikipedia.org/wiki/Single_precision
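If you would rather not depend on how the compiler orders bit-fields, the same three fields can be pulled out with shifts and masks. This is only a sketch: the struct and function names are invented for illustration, and it assumes float is a 32-bit IEEE 754 binary32 value.

```c
#include <stdint.h>  /* uint32_t */
#include <string.h>  /* memcpy */

/* Hypothetical container for the three raw fields of a binary32 float. */
struct float_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, still biased by 127 */
    uint32_t mantissa;  /* 23 explicit bits, no implicit leading 1 */
};

/* Assumes sizeof(float) == 4 and IEEE 754 binary32 encoding. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);       /* copy the object representation */
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For 0.15625f this yields sign 0, exponent 0x7c and mantissa 0x200000, the values from the Wikipedia example.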

Extracting mantissa and exponent from double in C#

The binary format shouldn't change - it would certainly be a breaking change to existing specifications. It's defined to be in IEEE754 / IEC 60559:1989 format, as Jimmy said. (C# 3.0 language spec section 1.3; ECMA 335 section 8.2.2). The code in DoubleConverter should be fine and robust.

For the sake of future reference, the relevant bit of the code in the example is:

public static string ToExactString (double d)
{


    // Translate the double into sign, exponent and mantissa.
    long bits = BitConverter.DoubleToInt64Bits(d);
    // The sign is the top bit. Note that >> on a long is sign-extending,
    // hence the 0x7ff mask on the exponent below.
    bool negative = (bits & (1L << 63)) != 0;
    int exponent = (int) ((bits >> 52) & 0x7ffL);
    long mantissa = bits & 0xfffffffffffffL;

    // Subnormal numbers; exponent is effectively one higher,
    // but there's no extra normalisation bit in the mantissa
    if (exponent==0)
    {
        exponent++;
    }
    // Normal numbers; leave exponent as it is but add extra
    // bit to the front of the mantissa
    else
    {
        mantissa = mantissa | (1L << 52);
    }

    // Bias the exponent. It's actually biased by 1023, but we're
    // treating the mantissa as m.0 rather than 0.m, so we need
    // to subtract another 52 from it.
    exponent -= 1075;

    if (mantissa == 0)
    {
        return negative ? "-0" : "0";
    }

    /* Normalize */
    while((mantissa & 1) == 0)
    {    /* i.e., Mantissa is even */
        mantissa >>= 1;
        exponent++;
    }


}

The comments made sense to me at the time, but I'm sure I'd have to think for a while about them now. After the very first part you've got the "raw" exponent and mantissa - the rest of the code just helps to treat them in a simpler fashion.
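For comparison with the C answers elsewhere on this page, here is a rough C rendering of the same steps. It is a sketch and not the DoubleConverter code: the function name exact_parts is invented, memcpy stands in for BitConverter.DoubleToInt64Bits, and it assumes a finite IEEE 754 binary64 input (infinities and NaNs are not handled).

```c
#include <stdint.h>  /* uint64_t */
#include <string.h>  /* memcpy */

/* Hypothetical helper: writes an odd integer mantissa and an exponent with
   |d| == mantissa * 2^exponent, and returns nonzero if d is negative.
   Assumes d is finite and encoded as IEEE 754 binary64. */
int exact_parts(double d, uint64_t *mantissa, int *exponent)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);

    int negative = (int)(bits >> 63);
    int e = (int)((bits >> 52) & 0x7FF);
    uint64_t m = bits & 0xFFFFFFFFFFFFFULL;

    if (e == 0)
        e++;                 /* subnormal: no implicit leading bit */
    else
        m |= 1ULL << 52;     /* normal: restore the implicit leading bit */

    e -= 1075;               /* bias 1023 plus 52 to treat m as an integer */

    if (m == 0) {            /* positive or negative zero */
        *mantissa = 0;
        *exponent = 0;
        return negative;
    }

    while ((m & 1) == 0) {   /* strip trailing zero bits */
        m >>= 1;
        e++;
    }

    *mantissa = m;
    *exponent = e;
    return negative;
}
```

With this normalisation 0.15625 comes out as 5 * 2^-5 and -3.0 as a negative 3 * 2^0.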

Bitwise splitting the mantissa of an IEEE 754 double? How to access bit structure

This is straightforward, if a bit esoteric.

Step 1 is to access the individual bits of a float or double. There are a number of ways of doing this, but the commonest are to use a char * pointer, or a union. For our purposes today let's use a union. [There are subtleties to this choice, which I'll address in a footnote.]

union doublebits {
double d;
uint64_t bits;
};

union doublebits x;
x.d = 1234.6565;

So now x.bits lets us access the bits and bytes of our double value as a 64-bit unsigned integer. First, we could print them out:

printf("bits: %llx\n", (unsigned long long)x.bits);

This prints

bits: 40934aa04189374c

and we're on our way.

The rest is "simple" bit manipulation.
We'll start by doing it the brute-force, obvious way:

int sign = x.bits >> 63;
int exponent = (x.bits >> 52) & 0x7ff;
long long mantissa = x.bits & 0xfffffffffffff;

printf("sign = %d, exponent = %d, mantissa = %llx\n", sign, exponent, mantissa);

This prints

sign = 0, exponent = 1033, mantissa = 34aa04189374c

and these values exactly match the bit decomposition you showed in your question, so it looks like you were right about the number 1234.6565.

What we have so far are the raw exponent and mantissa values.
As you know, the exponent is offset, and the mantissa has an implicit leading "1", so let's take care of those:

exponent -= 1023;
mantissa |= 1ULL << 52;

(Actually this isn't quite right. Soon enough we're going to have to address some additional complications having to do with denormalized numbers, and infinities and NaNs.)

Now that we have the true mantissa and exponent, we can do some math to recombine them, to see if everything is working:

double check = (double)mantissa * pow(2, exponent);

But if you try that, it gives the wrong answer, and it's because of a subtlety that, for me, is always the hardest part of this stuff: Where is the decimal point in the mantissa, really?
(Actually, it's not a "decimal point", anyway, because we're not working in decimal. Formally it's a "radix point", but that sounds too stuffy, so I'm going to keep using "decimal point", even though it's wrong. Apologies to any pedants whom this rubs the wrong way.)

When we did mantissa * pow(2, exponent) we assumed a decimal point, in effect, at the right end of the mantissa, but really, it's supposed to be 52 bits to the left of that (where that number 52 is, of course, the number of explicit mantissa bits). That is, our hexadecimal mantissa 0x134aa04189374c (with the leading 1 bit restored) is actually supposed to be treated more like 0x1.34aa04189374c. We can fix this by adjusting the exponent, subtracting 52:

double check = (double)mantissa * pow(2, exponent - 52);
printf("check = %f\n", check);

So now check is 1234.6565 (plus or minus some roundoff error). And that's the same number we started with, so it looks like our extraction was correct in all respects.

But we have some unfinished business, because for a fully general solution, we have to handle "subnormal" (also known as "denormalized") numbers, and the special representations inf and NaN.

These wrinkles are controlled by the exponent field. If the exponent (before subtracting the bias) is exactly 0, this indicates a subnormal number, that is, one whose mantissa is not in the normal range of (decimal) 1.00000 to 1.99999. A subnormal number does not have the implicit leading "1" bit, and the mantissa ends up being in the range from 0.00000 to 0.99999. (This also ends up being the way the ordinary number 0.0 has to be represented, since it obviously can't have that implicit leading "1" bit!)

On the other hand, if the exponent field has its maximum value (that is, 2047, or 2^11 - 1, for a double) this indicates a special marker. In that case, if the mantissa is 0, we have an infinity, with the sign bit distinguishing between positive and negative infinity. Or, if the exponent is max and the mantissa is not 0, we have a "not a number" marker, or NaN. The specific nonzero value in the mantissa can be used to distinguish between different kinds of NaN, like "quiet" and "signaling" ones, although it turns out the particular values that might be used for this aren't standard, so we'll ignore that little detail.

(If you're not familiar with infinities and NaNs, they're what IEEE-754 says that certain operations are supposed to return when the proper mathematical result is, well, not an ordinary number. For example, sqrt(-1.0) returns NaN, and 1./0. typically gives inf. There's a whole set of IEEE-754 rules about infinities and NaNs, such as that atan(inf) returns π/2.)

The bottom line is that instead of just blindly tacking on the implicit 1 bit, we have to check the exponent value first, and do things slightly differently depending on whether the exponent has its maximum value (indicating specials), an in-between value (indicating ordinary numbers), or 0 (indicating subnormal numbers):

if(exponent == 2047) {
    /* inf or NaN */
    if(mantissa != 0)
        printf("NaN\n");
    else if(sign)
        printf("-inf\n");
    else printf("inf\n");
} else if(exponent != 0) {
    /* ordinary value */
    mantissa |= 1ULL << 52;
} else {
    /* subnormal */
    exponent++;
}

exponent -= 1023;

That last adjustment, adding 1 to the exponent for subnormal numbers, reflects the fact that subnormals are "interpreted with the value of the smallest allowed exponent, which is one greater" (per the Wikipedia article on subnormal numbers).
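To check the whole case analysis in one place, the sketch below decodes the three fields and rebuilds the value from them. The function name decode_double is invented for illustration; it uses memcpy rather than the union, uses ldexp in place of pow, and assumes an IEEE 754 binary64 double plus the C99 NAN and INFINITY macros. Link with -lm where needed.

```c
#include <math.h>    /* ldexp, NAN, INFINITY */
#include <stdint.h>  /* uint64_t */
#include <string.h>  /* memcpy */

/* Hypothetical helper: splits d into sign, exponent and mantissa, applies
   the special-case rules, and returns the value the fields stand for.
   Assumes IEEE 754 binary64. */
double decode_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);

    int sign = (int)(bits >> 63);
    int exponent = (int)((bits >> 52) & 0x7FF);
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;
    double magnitude;

    if (exponent == 2047) {
        /* inf or NaN */
        magnitude = (mantissa != 0) ? NAN : INFINITY;
    } else {
        if (exponent != 0)
            mantissa |= 1ULL << 52;   /* ordinary value: implicit 1 */
        else
            exponent++;               /* subnormal: smallest exponent */

        /* Bias of 1023, plus 52 because the radix point sits 52 bits
           from the right end of the integer mantissa. */
        magnitude = ldexp((double)mantissa, exponent - 1023 - 52);
    }
    return sign ? -magnitude : magnitude;
}
```

For every finite input the returned value should compare equal to the argument, including subnormals such as ldexp(1.0, -1074); NaN inputs come back as a NaN, though not necessarily with the same payload.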

I said this was all "straightforward, if a bit esoteric", but as you can see, while extracting the raw mantissa and exponent values is indeed pretty straightforward, interpreting what they actually mean can be a challenge!


If you already have raw exponent and mantissa numbers, going back in the other direction — that is, constructing a double value from them — is just about as straightforward:

sign = 1;
exponent = 1024;
mantissa = 0x921fb54442d18;

x.bits = ((uint64_t)sign << 63) | ((uint64_t)exponent << 52) | mantissa;

printf("%.15f\n", x.d);

This answer is getting too long, so for now I'm not going to delve into the question of how to construct appropriate exponent and mantissa numbers from scratch for an arbitrary real number. (Me, I usually do the equivalent of x.d = atof(the number I care about), and then use the techniques we've been discussing so far.)


Your original question was about "bitwise splitting", which is what we've been discussing. But it's worth noting that there's a much more portable way to do all this, if you don't want to muck around with raw bits, and if you don't want/need to assume that your machine uses IEEE-754. If you just want to split a floating-point number into a mantissa and an exponent, you can use the standard library frexp function:

int exp;
double mant = frexp(1234.6565, &exp);
printf("mant = %.15f, exp = %d\n", mant, exp);

This prints

mant = 0.602859619140625, exp = 11

and that looks right, because 0.602859619140625 × 2^11 = 1234.6565 (approximately). (How does it compare to our bitwise decomposition? Well, our mantissa was 0x34aa04189374c, or 0x1.34aa04189374c, which in decimal is 1.20571923828125, which is twice the mantissa that frexp just gave us. But our exponent was 1033 - 1023 = 10, which is one less, so it comes out in the wash: 1.20571923828125 × 2^10 = 0.602859619140625 × 2^11 = 1234.6565.)

There's also a function ldexp that goes in the other direction:

double x2 = ldexp(mant, exp);
printf("%f\n", x2);

This prints 1234.656500 again.


Footnote: When you're trying to access the raw bits of something, as of course we've been doing here, there are some lurking portability and correctness questions having to do with something called strict aliasing. Strictly speaking, and depending on who you ask, you may need to use an array of unsigned char as the other part of your union, not uint64_t as I've been doing here. And there are those who say that you can't portably use a union at all, that you have to use memcpy to copy the bytes into a completely separate data structure, although I think they're talking about C++, not C.
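For reference, the memcpy route mentioned in the footnote might look like the sketch below. The two function names are invented for illustration, and the code assumes that double and uint64_t are both 8 bytes wide with the same byte order.

```c
#include <stdint.h>  /* uint64_t */
#include <string.h>  /* memcpy */

/* Hypothetical helper: returns the bit pattern of d.
   Assumes sizeof(double) == sizeof(uint64_t). */
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical helper: the reverse direction, same assumption. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Compilers generally turn a fixed-size memcpy like this into a plain register move, so the extra call usually costs nothing at run time.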

Convert double to integer mantissa and exponents

#include <cmath>    // For frexp and scalbn.
#include <iomanip>  // For fixed and setprecision.
#include <iostream> // For cout.
#include <limits>   // For properties of floating-point format.

int main(void)
{
    double value = 0.15625;

    // Separate value into significand in [.5, 1) and exponent.
    int exponent;
    double significand = std::frexp(value, &exponent);

    // Scale significand by number of digits in it, to produce an integer.
    // (std::scalbn is the standard replacement for the obsolete scalb.)
    significand = std::scalbn(significand, std::numeric_limits<double>::digits);

    // Adjust exponent to compensate for scaling.
    exponent -= std::numeric_limits<double>::digits;

    // Set stream to print significand in full.
    std::cout << std::fixed << std::setprecision(0);

    // Output triple with significand, base, and exponent.
    std::cout << "(" << significand << ", "
        << std::numeric_limits<double>::radix << ", " << exponent << ")\n";
}

Sample output:

(5629499534213120, 2, -55)

(If the value is zero, you might wish to force the exponent to zero, for aesthetic reasons. Mathematically, any exponent would be correct.)

How to get the mantissa and exponent of a double in a power of 10?

The thing about frexp is that if you are using IEEE 754 binary floating-point (or PDP-11 floating-point), it simply reveals the representation of the floating-point value. As such, the decomposition is exact, and it is bijective: you can reconstitute the argument from the significand and exponent you have obtained.

No equivalent reason exists for highlighting a power-of-ten decomposition function. Actually, defining such a function would pose a number of technical hurdles: Not all powers of ten are representable exactly as double: 0.1 and 10^23 aren't. For the base-ten-significand part, do you want a value that arrives close to the argument when multiplied by 10^e, or by the double-precision approximation of 10^e? Some floating-point values may have several equally valid decompositions. Some floating-point values may have no decomposition.

If these accuracy and canonicity aspects do not matter to you, use e = floor(log10(fabs(x))) for the base-10-exponent (or ceil for a convention more like the PDP-11-style frexp), and x / pow(10, e) for the base-10-significand. (In C, use fabs rather than abs here: abs takes an int and would truncate x.) If it matters to you that the significand is between 1 and 10, you had better force this property by clamping.


NOTE: If what you want is to convert x to decimal, the problem has been studied in depth. The first step of a correctly rounded conversion to decimal is not to compute a base-ten significand as a floating-point number (not in all cases anyway; for some inputs this can be an acceptable shortcut) because in the general case, this step would introduce approximations that are not admissible for a correctly rounded conversion function. For a quick-and-dirty conversion to decimal routine (in the case you have access to log10 but not to sprintf), it may be sufficient.

Extract binary value of mantissa

After const float mantissa = frexp(d, &exponent);, the value in mantissa is a number. It is not a decimal numeral. You can convert it to binary with:

float s = fabs(mantissa)*2;
std::cout << "The significand is " << (int) s << ".";
s = s - (int) s;
while (0 != s)
{
    s *= 2;
    std::cout << (int) s;
    s = s - (int) s;
}
std::cout << ".\n";

For the sample value you show, this prints “The significand is 1.11011010111.”

The code simply multiplies the value by two to move the next bit “above” the decimal point. Then converting it to int produces that bit. Then the bit is removed and the process is repeated until all significant digits have been extracted.
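The same loop can be written in plain C so that it fills a string instead of printing. This is a sketch under stated assumptions: the name significand_bits is invented, the input must be finite, and the caller must supply a buffer of at least 64 bytes, which is enough for the 53 significand bits of an IEEE 754 double plus the point and terminator.

```c
#include <math.h>  /* frexp, fabs */

/* Hypothetical helper: writes the significand of d as a binary numeral,
   for example "1.01", into buf. Assumes d is finite and buf has room for
   at least 64 characters. */
void significand_bits(double d, char *buf)
{
    int exponent;
    double s = fabs(frexp(d, &exponent)) * 2;  /* in [1, 2), or 0 if d == 0 */
    int digit = (int)s;

    *buf++ = (char)('0' + digit);   /* the bit before the radix point */
    *buf++ = '.';
    s -= digit;

    while (s != 0) {
        s *= 2;                     /* move the next bit above the point */
        digit = (int)s;
        *buf++ = (char)('0' + digit);
        s -= digit;                 /* remove the bit just emitted */
    }
    *buf = '\0';
}
```

For 0.15625 the buffer ends up holding "1.01". Each doubling and subtraction is exact in binary floating point, so the loop stops after at most 52 fraction bits.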


