Sign Changes When Going from Int to Float and Back

Sign changes when going from int to float and back

Your program is invoking undefined behavior because of an overflow in the conversion from floating-point to integer. What you see is only the usual symptom on x86 processors.

The float value nearest to 2147483584 is exactly 2^31. The conversion from integer to floating-point usually rounds to the nearest representable value, which can be up, and is up in this case. To be specific, the behavior when converting from integer to floating-point is implementation-defined; most implementations define rounding as being “according to the FPU rounding mode”, and the FPU's default rounding mode is round-to-nearest.

Then, while converting from the float representing 2^31 to int, an overflow occurs. This overflow is undefined behavior. Some processors raise an exception, others saturate. The IA-32 instruction cvttsd2si typically generated by compilers happens to always return INT_MIN in case of overflow, regardless of whether the float is positive or negative.

You should not rely on this behavior even if you know you are targeting an Intel processor: when targeting x86-64, compilers can emit, for the conversion from floating-point to integer, sequences of instructions that take advantage of the undefined behavior to return results other than what you might otherwise expect for the destination integer type.
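
One way to stay clear of the undefined behavior is to check the range before converting. The following is only a minimal sketch, assuming a 32-bit int and an IEEE 754 single-precision float; the helper name float_to_int_checked is made up for illustration and does not come from the original answer.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: converts f to int only when the truncated value
 * fits, so the undefined out-of-range conversion is never executed.
 * Assumes 32-bit int and IEEE 754 binary32 float. */
static bool float_to_int_checked(float f, int *out)
{
    assert(out != NULL);

    /* INT_MIN is -2^31, which float represents exactly, so
     * -(float)INT_MIN is exactly 2^31. (float)INT_MAX would round up
     * to 2^31 as well, which is why the upper bound is exclusive. */
    const float lower = (float)INT_MIN;
    const float upper = -(float)INT_MIN;

    /* Written so that NaN fails the range test too. */
    if (!(f >= lower && f < upper)) {
        return false;
    }
    *out = (int)f; /* in range: truncates toward zero */
    return true;
}
```

Under those assumptions, passing (float)2147483584 makes the helper return false instead of performing the overflowing conversion.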

Error on converting from unsigned int to float

The number 4294967295 in float (32-bit IEEE 754) is represented as follows:

0      10011111   00000000000000000000000
sign   exponent   mantissa
(+1)   (2^32)     (1.0)

The rule for converting it back to an integer (or long in this case) is:

sign * (2^exponent) * mantissa

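The result is 4294967296, which fits in a long long but is too big for a 32-bit unsigned int, so the conversion to unsigned int typically yields 0. Strictly speaking, converting an out-of-range floating-point value to an integer type is undefined behavior in C and C++, so that 0 is not guaranteed.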

Note that the underlying problem is the limited precision of float for large numbers: for example, 4294967295 and 4294967200 are both stored with the same bits when converted to float.
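
The bit pattern can be inspected directly. This is a rough sketch, assuming float is a 32-bit IEEE 754 value and that conversions round to nearest; float_bits and bits_after_conversion are illustrative names, not part of the original answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Returns the raw bit pattern of a float. memcpy is the well-defined
 * way to reinterpret the bytes in C. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Converts an unsigned 32-bit integer to float and returns the bits of
 * the result. The conversion rounds to the nearest representable float
 * (assumed default rounding mode). */
static uint32_t bits_after_conversion(uint32_t u)
{
    float f = (float)u;
    return float_bits(f);
}
```

With those assumptions, both 4294967295u and 4294967200u produce 0x4F800000, which is sign 0, exponent 10011111, and an all-zero mantissa.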

Why there is loss of value when converting from int to float in the below code?

The float type uses the same number of bits as int (32 bits) to represent floating-point numbers over a much larger range than int covers with integers alone.

This causes a loss of precision, since not every int value can be represented exactly by a float. Only 23 bits store the fraction (plus one sign bit), while the other 8 store the exponent. Together with the implicit leading 1, that gives a float 24 significant bits.

If you assign this int value to a double, there won't be any loss of precision, since double has 64 bits, and 52 of them are used to represent the fraction.

Here's a more detailed explanation:

The binary representation of 123456789 as an int is :

00000111 01011011 11001101 00010101

A single precision floating point number is constructed from its 32 bits using the following formula :

(-1)^sign * 1.b22 b21 ... b0 * 2^(e-127)

Where sign is the left most bit (b31). b22 to b0 are the fraction bits, and bits b30 to b23 make the exponent e.

Therefore, when you convert the int 123456789 to float, you can only use the following 25 bits (the sign bit plus 24 significant bits):

00000111 01011011 11001101 00010101
-    --- -------- -------- -----

We can safely get rid of any leading zeroes (except for the sign bit) and any trailing zeroes. That leaves the 3 least significant bits (101), which do not fit and must be dropped. We can either subtract 5 to get 123456784:

00000111 01011011 11001101 00010000
-    --- -------- -------- -----

or add 3 to get 123456792:

00000111 01011011 11001101 00011000
-    --- -------- -------- -----

Obviously adding 3 gives a better approximation.
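
The round trip can be checked with a small sketch. It assumes an IEEE 754 single-precision float, round-to-nearest, and no excess precision in float expressions; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Converts i to float and back. The assert keeps the conversion back to
 * int32_t in range: values of 2147483584 and above round up to 2^31,
 * which int32_t cannot hold. */
static int32_t roundtrip_via_float(int32_t i)
{
    assert(i < 2147483584);
    float f = (float)i;  /* rounds to 24 significant bits */
    return (int32_t)f;
}

/* Same round trip through double, whose 53 significant bits hold every
 * 32-bit integer exactly. */
static int32_t roundtrip_via_double(int32_t i)
{
    double d = (double)i; /* exact */
    return (int32_t)d;
}
```

Under those assumptions, roundtrip_via_float(123456789) gives 123456792, while roundtrip_via_double(123456789) returns the original value.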

Functions: changing variable from int to float

First question

It's only valid for integers (so square(5) gives me 25, but square(5.0) gives me 'TypeError: range() integer end argument expected, got float').

Answer

Because the range function is defined to take only integer arguments. See the Python documentation.

Example:

>>> range(5)
[0, 1, 2, 3, 4]

>>> range(5.0)
TypeError: range() integer end argument expected, got float.

>>> range("5")
TypeError: range() integer end argument expected, got str.

>>> range(0, 5)
[0, 1, 2, 3, 4]

>>> range(0, 5.0)
TypeError: range() integer end argument expected, got float.

Second question

How can I get this valid for floats or negative numbers?

Answer

I don't know what "valid" means. It depends on what you are trying to do. So please comment or update your question.

How to convert int to float in C?

Integer division truncates, so (50/100) results in 0. You can cast to float (better double) or multiply by 100.0 (for double precision, 100.0f for float precision) first:

double percentage;
// ...
percentage = 100.0*number/total;
// percentage = (double)number/total * 100;

or

float percentage;
// ...
percentage = (float)number/total * 100;
// percentage = 100.0f*number/total;

Since floating point arithmetic is not associative, the results of 100.0*number/total and (double)number/total * 100 may be slightly different (the same holds for float), but it's extremely unlikely to influence the first two places after the decimal point, so it probably doesn't matter which way you choose.
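
To make the difference concrete, here is a small sketch that puts the truncating and the floating-point versions side by side. The function names are made up for illustration, and total is assumed to be nonzero.

```c
#include <assert.h>

/* Integer-only version: number / total is an integer division, so any
 * quotient below 1 is truncated to 0 before the multiplication. */
static int percentage_truncated(int number, int total)
{
    assert(total != 0);
    return number / total * 100;
}

/* Floating-point version: 100.0 is a double, so number is converted to
 * double before the multiplication and the division is done in double. */
static double percentage(int number, int total)
{
    assert(total != 0);
    return 100.0 * number / total;
}
```

With number = 50 and total = 100, the first function returns 0 and the second returns 50.0.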

C++ int float casting

Integer division occurs first, and only then is the integer result converted to float. If the true quotient is less than 1 in magnitude, it ends up as 0.

You'll want to cast the operands to float before dividing, e.g.

float m = static_cast<float>(a.y - b.y) / static_cast<float>(a.x - b.x);

Java Widening Conversion

I think the problem is the interpretation of the phrase "significant figure".

Whether you have

1234567890 US-dollars

or

1234567936 US-dollars

is not that significant of a difference.

But if you instead had

   1234567 US-dollars

that would be a very significant difference.

1234567890
^        ^
|         \
|          less significant
|
very significant

Thus, an approximation of

1234567890

to 7 decimal significant figures is something between (approximately)

1234567390

and (approximately)

1234568390

In this case, it turns out to be roughly

1234567936
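
The size of the error can be measured directly. The sketch below is written in C rather than Java, on the assumption that float is IEEE 754 single precision with round-to-nearest, which is also the rounding Java specifies for its int-to-float widening; the function name is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

/* Returns (value after int -> float conversion) minus the original
 * value. The subtraction is done in 64-bit integers so that a float
 * result of 2^31 does not overflow. */
static int64_t float_conversion_error(int32_t i)
{
    float f = (float)i; /* rounds to 24 significant bits */
    return (int64_t)f - (int64_t)i;
}
```

Under those assumptions, float_conversion_error(1234567890) is 46 (the float holds 1234567936), while a value with few enough digits such as 1234567 converts with an error of 0.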

Precision of multiplication by 1.0 and int to float conversion

No.

If i is sufficiently large that int(float(i)) != i (assuming float is IEEE-754 single precision, i = 0x1000001 suffices to exhibit this) then this is false, because multiplication by 1.0f forces a conversion to float, which changes the value even though the subsequent multiplication does not.

However, if i is a 32-bit integer and double is IEEE-754 double, then it is true that int(i*1.0) == i.


Just to be totally clear, multiplication by 1.0f is exact. It's the conversion from int to float that may not be.
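
Both cases can be tried out with a short sketch. It assumes a 32-bit int, IEEE 754 float and double, round-to-nearest, and no excess precision in float expressions; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

/* True when i is unchanged by i * 1.0f followed by a conversion back.
 * The assert keeps the conversion back to int32_t in range. */
static bool survives_times_one_float(int32_t i)
{
    assert(i < 2147483584);
    float f = (float)i * 1.0f; /* the conversion rounds, the multiply is exact */
    return (int32_t)f == i;
}

/* Same check with a double constant: i is converted to double, which
 * represents every 32-bit integer exactly. */
static bool survives_times_one_double(int32_t i)
{
    double d = i * 1.0;
    return (int32_t)d == i;
}
```

With i = 0x1000001 the float version reports false and the double version reports true, under the assumptions stated above.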


