Floating Point to Binary Value (C++)
Use union and bitset:
#include <iostream>
#include <bitset>
#include <climits>

int main()
{
    union
    {
        float input;   // assumes sizeof(float) == sizeof(int)
        int output;
    } data;

    data.input = 2.25125;

    std::bitset<sizeof(float) * CHAR_BIT> bits(data.output);
    std::cout << bits << std::endl;

    // or
    std::cout << "BIT 4: " << bits[4] << std::endl;
    std::cout << "BIT 7: " << bits[7] << std::endl;
}
A std::bitset is not an array, but you can access its individual bits with the [] operator as if it were one.
Output
$ ./bits
01000000000100000001010001111011
BIT 4: 1
BIT 7: 0
Obtaining bit representation of a float in C
There are a large number of ways to accomplish this. Understand that what you are really trying to do is simply output the bits in memory that make up a float, which in virtually all x86-type implementations are stored in IEEE-754 single-precision floating-point format. On x86 that is 32 bits of data, which is what allows a 'peek' at the bits while treating the float as unsigned (both are 32 bits, and bit operations are defined for the unsigned type). For implementations other than x86, or even on x86 itself, a better choice than unsigned is the exact-width type uint32_t provided by stdint.h. There can be no ambiguity in size that way.
Now, the cast itself isn't technically the problem; it is the access of the value through dereferencing the different type (a.k.a. type-punning) where you run afoul of the strict-aliasing rule (Section 6.5 (7) of the C11 Standard). The union of the float and uint32_t types gives you a valid way of looking at the float bits through an unsigned-type window. (You are looking at the same bits either way; it's just how you access them and tell the compiler how they should be interpreted.)
That said, you can glean good information from all of the answers here. You can write functions to access and store the bit representation of the float values in a string for later use, or output the bit values to the screen. As an exercise in playing with floating-point values a year or so back, I wrote a little function to output the bits in an annotated way that allows easy identification of the sign, normalized exponent, and mantissa. You can adapt it or another answer's routine to handle your needs. The short example is:
#include <stdio.h>
#include <stdint.h>
#include <limits.h>     /* for CHAR_BIT */

/** formatted output of ieee-754 representation of float */
void show_ieee754 (float f)
{
    union {
        float f;
        uint32_t u;
    } fu = { .f = f };
    int i = sizeof f * CHAR_BIT;

    printf ("  ");
    while (i--)
        printf ("%d ", (fu.u >> i) & 0x1);
    putchar ('\n');

    printf (" |- - - - - - - - - - - - - - - - - - - - - - "
            "- - - - - - - - - -|\n");
    printf (" |s|      exp      |                  mantissa"
            "                   |\n\n");
}

int main (void) {

    float f = 3.14159f;

    printf ("\nIEEE-754 Single-Precision representation of: %f\n\n", f);
    show_ieee754 (f);

    return 0;
}
Example Use/Output
$ ./bin/floatbits
IEEE-754 Single-Precision representation of: 3.141590
0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 0 0 0 0
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
 |s|      exp      |                  mantissa                   |
Look things over and let me know if you have any questions.
Convert floating point number 1864.78 to binary and IEEE format
You have two different conversion routines for converting the integer and fractional parts to binary. You understand how to convert 1864 to binary, but you have problems converting .78 to binary. Note: you must convert the actual fraction held in memory for the float 1864.78, which is 1864.780029, or fraction 0.780029, not 0.78. That is where your "rounding" confusion is coming from.
To convert a fraction to its binary representation, you multiply the fraction by 2. If the resulting number has an integer part of 1, the binary representation of that bit is 1; if not, it is 0. If the integer part is 1, you subtract 1 from the number and repeat until you have exhausted the number or reached the limit of precision in question. For example:
number : 1864.78
float : 1864.780029 (actual nearest representation in memory)
integer : 1864
fraction : 0.780029
2 * 0.780029 = 1.560059 => integer part (1) fraction (0.560059) => '1'
2 * 0.560059 = 1.120117 => integer part (1) fraction (0.120117) => '1'
2 * 0.120117 = 0.240234 => integer part (0) fraction (0.240234) => '0'
2 * 0.240234 = 0.480469 => integer part (0) fraction (0.480469) => '0'
2 * 0.480469 = 0.960938 => integer part (0) fraction (0.960938) => '0'
2 * 0.960938 = 1.921875 => integer part (1) fraction (0.921875) => '1'
2 * 0.921875 = 1.843750 => integer part (1) fraction (0.843750) => '1'
2 * 0.843750 = 1.687500 => integer part (1) fraction (0.687500) => '1'
2 * 0.687500 = 1.375000 => integer part (1) fraction (0.375000) => '1'
2 * 0.375000 = 0.750000 => integer part (0) fraction (0.750000) => '0'
2 * 0.750000 = 1.500000 => integer part (1) fraction (0.500000) => '1'
2 * 0.500000 = 1.000000 => integer part (1) fraction (0.000000) => '1'
Note how the floating-point fractional value tends to zero rather than reaching your limit of digits. If you attempt to convert 0.78 (which is not capable of exact representation as the fraction of 1864.78 in a 32-bit floating-point value) you will reach a different conversion in the 12th bit.
Once you have converted your fractional part to binary, you can continue with conversion into IEEE-754 single precision format. e.g.:
integer  : 11101001000
fraction : 110001111011
sign bit : 0
The normalization for the biased exponent is:
11101001000.110001111011 => 1.1101001000110001111011
unbiased exponent: 10
exponent bias: 127
__________________+____
biased exponent: 137
binary exponent: 10001001
Conversion to 'hidden bit' format to form the mantissa (drop the leading 1, then pad with a trailing 0 to fill 23 bits):
1.1101001000110001111011 => 11010010001100011110110
Then use the sign bit + excess 127 exponent + mantissa to form the IEEE-754 single precision representation:
IEEE-754 Single Precision Floating Point Representation
0 1 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
 |s|      exp      |                  mantissa                   |
Look it over and let me know if you have further questions. If you wanted a simple routine to fill a character array with the resulting conversion, you could do something similar to the following to convert a floating point fraction part to binary:
#define MANTISSA 23
...
/** return string containing binary representation of fraction
 *  The function takes a float as an argument and computes the
 *  binary representation of the fractional part of the float.
 *  The function returns the null-terminated string containing
 *  the binary value. The conversion is limited to the length of
 *  your MANTISSA (23 bits for single precision, 52 bits for
 *  double precision). You must ensure you provide a buffer for
 *  's' of at least MANTISSA + 1 bytes.
 */
char *fpfrc2bin (char *s, float fvalue)
{
    /* obtain fractional value from fvalue */
    float fv = fvalue > 1.0 ? fvalue - (int)fvalue : fvalue;
    char *p = s;
    unsigned char it = 0;

    while (fv > 0 && it < MANTISSA)
    {   /* convert fraction */
        fv = fv * 2.0;
        *p++ = ((int)fv) ? '1' : '0';
        *p = 0;                          /* nul-terminate */
        fv = ((int)fv >= 1) ? fv - 1.0 : fv;
        it++;
    }

    return s;
}
Floating point to binary conversion
Actually, bitset
constructor accepts unsigned long
in C++ 03 and unsigned long long
in C++ 11.
Now, as for storing float
in a bitset
, this should do the trick:
float f = 0.0f;
cin >> f;
bitset<32> my_bit(*(uint32_t*)&f); // my_bit? What kind of a name is that, anyway?..
Note that this cast technically violates the strict-aliasing rule; copying the bytes with memcpy into a uint32_t first is the fully defined alternative.
How do I display the binary representation of a float or double?
C/C++ is easy.
union ufloat {
    float f;
    unsigned u;
};

ufloat u1;
u1.f = 0.3f;
Then you just output u1.u.
Doubles are just as easy, because doubles are 64-bit:
union udouble {
    double d;
    unsigned long u;
};
Java is a bit easier: use Float.floatToRawIntBits() combined with Integer.toBinaryString(), and Double.doubleToRawLongBits() combined with Long.toBinaryString().
What is the binary format of a floating point number used by C++ on Intel based systems?
Floating-point format is determined by the processor, not the language or compiler. These days almost all processors (including all Intel desktop machines) either have no floating-point unit or have one that complies with IEEE 754. You get two or three different sizes (Intel with SSE offers 32, 64, and 80 bits) and each one has a sign bit, an exponent, and a significand. The number represented is usually given by this formula:
sign * (2**(E-k)) * (1 + S / (2**k'))
where k' is the number of bits in the significand and k is a constant around the middle range of exponents. There are special representations for zero (plus and minus zero) as well as infinities and other "not a number" (NaN) values.
There are definite quirks; for example, the fraction 1/10 cannot be represented exactly as a binary IEEE standard floating-point number. For this reason the IEEE standard also provides for a decimal representation, but this is used primarily by handheld calculators and not by general-purpose computers.
Recommended reading: David Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic
C Provide float with binary bits produces the wrong number
The value 0x41C80000, in hex, is an integer that has the value 1103626240 in decimal. In your code, you are casting this value to a float, which gives you this result:
x = 1103626240.000000
A solution for this can be made using a union:
union uint_to_float {
    unsigned int u;
    float f;
};

union uint_to_float u2f;
u2f.u = 0x41C80000;
printf("x = %f\n", u2f.f);
EDIT: As mentioned by @chux, using uint32_t from stdint.h instead of unsigned int is a better solution.