Floating Point to Binary Value(C++)

Use a union and std::bitset:

#include <iostream>
#include <bitset>
#include <climits>

int main()
{
    union
    {
        float input;  // assumes sizeof(float) == sizeof(int)
        int   output;
    } data;

    data.input = 2.25125;

    std::bitset<sizeof(float) * CHAR_BIT> bits(data.output);
    std::cout << bits << std::endl;

    // or
    std::cout << "BIT 4: " << bits[4] << std::endl;
    std::cout << "BIT 7: " << bits[7] << std::endl;
}

A bitset is not an array, but you can access individual bits with the [] operator as if it were one.

Output

$ ./bits
01000000000100000001010001111011
BIT 4: 1
BIT 7: 0

Obtaining bit representation of a float in C

There are a large number of ways to accomplish this. Understand that what you are really trying to do is simply output the bits in memory that make up a float, which in virtually all x86-type implementations is stored in IEEE-754 single-precision floating-point format. On x86 that is 32 bits of data, which is what allows a 'peek' at the bits while casting the float to unsigned (both are 32 bits, and bit operations are defined for the unsigned type). For implementations other than x86, or even on x86 itself, a better choice than unsigned is the exact-width type uint32_t provided by stdint.h. There can be no ambiguity in size that way.

Now, the cast itself isn't technically the problem; it is the access of the value through dereferencing the different type (a.k.a. type-punning) where you run afoul of the strict-aliasing rule (Section 6.5 (7) of the C11 Standard). A union of the float and uint32_t types gives you a valid way of looking at the float bits through an unsigned-type window. (You are looking at the same bits either way; it's just how you access them and tell the compiler how they should be interpreted.)

That said, you can glean good information from all of the answers here. You can write functions to access and store the bit representation of float values in a string for later use, or output the bit values to the screen. As an exercise in playing with floating-point values a year or so back, I wrote a little function to output the bits in an annotated way that allows easy identification of the sign, biased exponent, and mantissa. You can adapt it or another of the answers' routines to handle your needs. The short example is:

#include <stdio.h>
#include <stdint.h>
#include <limits.h> /* for CHAR_BIT */

/** formatted output of ieee-754 representation of float */
void show_ieee754 (float f)
{
    union {
        float f;
        uint32_t u;
    } fu = { .f = f };
    int i = sizeof f * CHAR_BIT;

    printf ("  ");
    while (i--)
        printf ("%d ", (fu.u >> i) & 0x1);

    putchar ('\n');
    printf (" |- - - - - - - - - - - - - - - - - - - - - - "
            "- - - - - - - - - -|\n");
    printf (" |s|      exp      |                  mantissa"
            "                   |\n\n");
}

int main (void) {

    float f = 3.14159f;

    printf ("\nIEEE-754 Single-Precision representation of: %f\n\n", f);
    show_ieee754 (f);

    return 0;
}

Example Use/Output

$ ./bin/floatbits

IEEE-754 Single-Precision representation of: 3.141590

  0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 0 0 0 0
 |- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
 |s|      exp      |                  mantissa                   |

Look things over and let me know if you have any questions.

Convert floating point number 1864.78 to binary and IEEE format

You have two different conversion routines for converting the integer and fractional parts to binary. You understand how to convert 1864 to binary, but you have problems converting .78 to binary. Note: you must convert the actual fraction held in memory for the float 1864.78, which is 1864.780029, giving the fraction 0.780029, not 0.78. That appears to be where your "rounding" confusion is coming from.

To convert a fraction to its binary representation, multiply the fraction by 2; if the result is greater than or equal to 1, the next bit of your binary representation is 1, otherwise it is 0. Whenever the result reached 1, subtract 1 and repeat with the remaining fraction, until you have exhausted the fraction or reached the limit of precision in question. For example:

number   : 1864.78
float : 1864.780029 (actual nearest representation in memory)
integer : 1864
fraction : 0.780029

2 * 0.780029 = 1.560059 => integer part (1) fraction (0.560059) => '1'
2 * 0.560059 = 1.120117 => integer part (1) fraction (0.120117) => '1'
2 * 0.120117 = 0.240234 => integer part (0) fraction (0.240234) => '0'
2 * 0.240234 = 0.480469 => integer part (0) fraction (0.480469) => '0'
2 * 0.480469 = 0.960938 => integer part (0) fraction (0.960938) => '0'
2 * 0.960938 = 1.921875 => integer part (1) fraction (0.921875) => '1'
2 * 0.921875 = 1.843750 => integer part (1) fraction (0.843750) => '1'
2 * 0.843750 = 1.687500 => integer part (1) fraction (0.687500) => '1'
2 * 0.687500 = 1.375000 => integer part (1) fraction (0.375000) => '1'
2 * 0.375000 = 0.750000 => integer part (0) fraction (0.750000) => '0'
2 * 0.750000 = 1.500000 => integer part (1) fraction (0.500000) => '1'
2 * 0.500000 = 1.000000 => integer part (1) fraction (0.000000) => '1'

Note how the floating-point fractional value tends to zero rather than reaching your limit of digits. If you attempt to convert 0.78 (which is not the exact fraction stored for 1864.78 in a 32-bit floating-point value), you will reach a different conversion at the 12th bit.

Once you have converted your fractional part to binary, you can continue with conversion into IEEE-754 single-precision format, e.g.:

decimal  : 11101001000
fraction : 110001111011
sign bit : 0

The normalization for the biased exponent is:

 11101001000.110001111011  =>  1.1101001000110001111011

unbiased exponent:  10
exponent bias:    +127
                  ----
biased exponent:   137
binary exponent:   10001001

Conversion to 'hidden bit' format to form the mantissa (drop the leading 1, then pad with trailing zeros to 23 bits):

1.1101001000110001111011  =>  1101001000110001111011

Then use the sign bit + excess 127 exponent + mantissa to form the IEEE-754 single precision representation:

IEEE-754 Single Precision Floating Point Representation

  0 1 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0
 |- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
 |s|      exp      |                  mantissa                   |

Look it over and let me know if you have further questions. If you wanted a simple routine to fill a character array with the resulting conversion, you could do something similar to the following to convert a floating point fraction part to binary:

#define MANTISSA 23
...

/** return string containing binary representation of fraction
 *  The function takes a float as an argument and computes the
 *  binary representation of the fractional part of the float.
 *  On success, the function returns a nul-terminated string
 *  containing the binary value, or NULL otherwise. The conversion
 *  is limited to the length of your MANTISSA (23-bits for single
 *  precision, 52-bits for double precision). You must ensure
 *  you provide a buffer for 's' of at least MANTISSA + 1 bytes.
 */
char *fpfrc2bin (char *s, float fvalue)
{
    /* obtain fractional value from fvalue */
    float fv = fvalue >= 1.0 ? fvalue - (int)fvalue : fvalue;
    char *p = s;
    unsigned char it = 0;

    while (fv > 0 && it < MANTISSA)
    {   /* convert fraction */
        fv = fv * 2.0;
        *p++ = ((int)fv) ? '1' : '0';
        *p = 0;     /* nul-terminate */
        fv = ((int)fv >= 1) ? fv - 1.0 : fv;
        it++;
    }

    return s;
}

Floating point to binary conversion

Actually, the bitset constructor accepts unsigned long in C++03 and unsigned long long in C++11.
Now, as for storing a float in a bitset, this does the trick on most compilers:

float f = 0.0f;
cin >> f;
bitset<32> my_bit(*(uint32_t*)&f); // my_bit? What kind of a name is that, anyway?..

Note that the cast technically violates the strict-aliasing rule; copying the bytes into a uint32_t with memcpy first is the fully well-defined way to do it.

How do I display the binary representation of a float or double?

C/C++ is easy.

union ufloat {
    float f;
    unsigned u;
};

union ufloat u1;  /* in C++ you can drop the 'union' keyword here */
u1.f = 0.3f;

Then you just output u1.u.

Doubles just as easy.

union udouble {
    double d;
    unsigned long long u;  /* unsigned long may be only 32 bits */
};

because doubles are 64 bits.

Java is a bit easier: use Float.floatToRawIntBits() combined with Integer.toBinaryString(), and Double.doubleToRawLongBits() combined with Long.toBinaryString().

What is the binary format of a floating point number used by C++ on Intel based systems?

Floating-point format is determined by the processor, not the language or compiler. These days almost all processors (including all Intel desktop machines) either have no floating-point unit or have one that complies with IEEE 754. You get two or three different sizes (Intel with SSE offers 32, 64, and 80 bits) and each one has a sign bit, an exponent, and a significand. The number represented is usually given by this formula:

sign * (2**(E-k)) * (1 + S / (2**k'))

where k' is the number of bits in the significand and k is a constant around the middle range of exponents. There are special representations for zero (plus and minus zero) as well as infinities and other "not a number" (NaN) values.

There are definite quirks; for example, the fraction 1/10 cannot be represented exactly as a binary IEEE standard floating-point number. For this reason the IEEE standard also provides for a decimal representation, but this is used primarily by handheld calculators and not by general-purpose computers.

Recommended reading: David Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic

C Provide float with binary bits produces the wrong number

The value 0x41C80000 is an integer constant with the decimal value 1103626240. In your code, you are casting this value to a float, which converts the number rather than reinterpreting its bits, giving you this result:

x = 1103626240.000000

A solution for this can be made using a union, which reinterprets those same 32 bits as a float (here, 25.0):

union uint_to_float {
    unsigned int u;
    float f;
};

union uint_to_float u2f;
u2f.u = 0x41C80000;
printf ("x = %f\n", u2f.f);

EDIT:
As mentioned by @chux, using uint32_t from stdint.h instead of unsigned int is a better solution.


