Why Does Int(Float(Int.max)) Give Me an Error?

Why does Int(Float(Int.max)) give me an error?

There aren't enough bits in the mantissa of a Double or Float to accurately represent 19 significant digits, so you are getting a rounded result.

If you print the Float using String(format:) you can see a more accurate representation of the value of the Float:

let a = Int.max
print(a) // 9223372036854775807
let b = Float(a)
print(String(format: "%.1f", b)) // 9223372036854775808.0

So the value represented by the Float is 1 larger than Int.max.
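This is exactly why Int(Float(Int.max)) errors: converting the Float back to Int traps at runtime, because 2^63 is one more than Int can hold. A minimal sketch of a safe alternative, using the failable Int(exactly:) initializer:

let f = Float(Int.max)         // rounds up to 2^63
// Int(f) would trap at runtime: 2^63 overflows Int
print(Int(exactly: f) as Any)  // nil, because the rounded value doesn't fit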


Many values will be converted to the same Float value. The question becomes: how much would you have to reduce Int.max before it results in a different Double or Float value?

Starting with Double:

var y = Int.max

while Double(y) == Double(Int.max) {
y -= 1
}

print(Int.max - y) // 512

So with Double, the last 512 Ints all convert to the same Double.
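This matches the spacing of representable Doubles in that range, which you can check with the nextDown property (a quick sketch):

let top = Double(Int.max)  // exactly 2^63 after rounding
print(top - top.nextDown)  // 1024.0, the gap between Doubles just below 2^63
// Any Int within half that gap (512) of 2^63 rounds to the same Double.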

Float has fewer bits to represent the value, so there are more values that all map to the same Float. Switching to decrements of 1000 so that it runs in a reasonable time:

var y = Int.max

while Float(y) == Float(Int.max) {
y -= 1000
}

print(Int.max - y) // 274877907000

So, your expectation that a Float could accurately represent a specific Int was misplaced.
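The same spacing check explains that number (a quick sketch):

let topF = Float(Int.max)                    // exactly 2^63 as a Float
print(Double(topF) - Double(topF.nextDown))  // 549755813888.0, the gap below 2^63
// Half that gap is 274877906944; stepping by 1000 overshoots slightly, hence 274877907000.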


Follow-up question from the comments:

If float does not have enough bits to represent Int.max, how is it able to represent a number one larger than that?

Floating point numbers are represented in two parts: mantissa and exponent. The mantissa represents the significant digits (in binary) and the exponent represents the power of 2. As a result, a floating point number can exactly express a power of 2 by having a mantissa of 1 with an exponent that represents the power.

Numbers that are not exact powers of 2 may have a binary pattern that contains more digits than can be represented in the mantissa. This is the case for Int.max (which is 2^63 - 1) because in binary that is 111111111111111111111111111111111111111111111111111111111111111 (63 1's). A Float, which is 32 bits, cannot store a mantissa which is 63 bits, so it has to be rounded or truncated. In the case of Int.max, rounding up by 1 results in the value
1000000000000000000000000000000000000000000000000000000000000000 (a 1 followed by 63 0's). Starting from the left, there is only 1 significant bit to be represented by the mantissa (the trailing 0's come for free), so this number is a mantissa of 1 and an exponent of 63.
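You can confirm this from Swift itself, since Float exposes both parts (a quick check):

let f = Float(Int.max)
print(f.significand)  // 1.0
print(f.exponent)     // 63, so the stored value is exactly 1 × 2^63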

See @MartinR's answer for an explanation of what Java is doing.

Why is PyCharm giving me a warning when using the max function with int and float vars as arguments

The problem is probably introduced by you explicitly annotating types. Type hints can be added in Python and are supported by PyCharm as well, of course, but they aren't required as part of the language.

For example:

i: int = 1
a: float = 0.1
print(max(i, a))

This will show a PyCharm warning on a in print(max(i, a)).

And because PyCharm can infer the type, this will show the same warning:

i = 1
a = 0.1
print(max(i, a))

This on the other hand won't:

items = [1, 0.1]
print(max(items))

The reason for this is that PyCharm knows that Python's built-in max() function will fail if incompatible types are passed into it. For example, max(1, '2') will cause a TypeError at runtime. That's why you get a warning if you pass in several arguments of varying types: PyCharm knows it may become a problem, and it flags the first argument that doesn't match the types of the preceding ones.

The reason the list doesn't give you the same warning is that PyCharm only looks at the types of the call's arguments (a single list in this case). It doesn't know that max() will return the max of the elements inside that list; it cannot determine this from how the max() function is defined in the libraries, even though it may be obvious to you and me.

You can avoid the warning, if you know it's not a problem, by wrapping the arguments in an iterable like a list or tuple, by converting the integers to float, or by explicitly suppressing the inspection. Or by taking a look at your code and deciding whether you really should be comparing ints and floats.

i: int = 1
a: float = 0.1
print(max([i, a]))       # wrapped in a list: no warning
print(max(float(i), a))  # both arguments are floats: no warning
# noinspection PyTypeChecker
print(max(i, a))         # warning explicitly suppressed

Note that the final 'solution' is specific to PyCharm; the rest should give you good results in any editor.

Error when converting varchar(max) to int or float

Presumably, the problem is that some values are not in a numeric format. Try this instead:

select (case when isnumeric(value) = 1 then cast(value as float) end)
from table

This converts all the numbers to float, and puts NULLs in the remaining fields.

If you want to see the values that are causing problems, use this:

select value
from table
where isnumeric(value) = 0 and value is not null

Catching an error from a non-throwing method

You can use the init(exactly:) initializer. It will not throw an error, but it will return nil if the value is too large:

guard let value = Int(exactly: pow(Double(1000000000), Double(10))) else {
    // error handling: 1e90 is far outside Int's range, so init(exactly:) returns nil
    return
}
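A few more illustrative cases of init(exactly:) (a sketch; the values are arbitrary):

print(Int(exactly: 3.0) as Any)             // Optional(3): integral and in range
print(Int(exactly: 3.5) as Any)             // nil: not an integer
print(Int(exactly: Float(Int.max)) as Any)  // nil: rounds to 2^63, one past Int.max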

Bizarre floating-point behavior with vs. without extra variables, why?

You are converting out-of-range double values to unsigned long long. The behavior of this is undefined in standard C++, and Visual C++ appears to handle it particularly badly in SSE2 mode: it leaves a number on the FPU stack, eventually overflowing it and making later code that uses the FPU fail in really interesting ways.

A reduced sample is

double d = 1E20;
unsigned long long ull[] = { d, d, d, d, d, d, d, d };  // undefined: 1e20 exceeds ULLONG_MAX
if (floor(d) != floor(d)) abort();  // should never fire, yet it does once the FPU stack overflows

This aborts if ull has eight or more elements, but passes if it has up to seven.

The solution is not to convert floating point values to an integer type unless you know that the value is in range.

4.9 Floating-integral conversions [conv.fpint]

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [ Note: If the destination type is bool, see 4.12. -- end note ]

The rule that out-of-range values wrap when converted to an unsigned type only applies if the value is already of some integer type.

For whatever it's worth, though, this doesn't seem intentional, so even though the standard permits this behaviour, it may still be worth reporting it as a bug.

Python - is there a function to return the maximum integer but not float from a list

If I understand correctly what you want to do, this should work:

def return_max_times_len(l):
    lst_int = [i for i in l if isinstance(i, int)]  # keep only the ints
    if len(lst_int) > 0:
        return len(l) * max(lst_int)
    return ""

For example, return_max_times_len([1, 2.5, 3]) returns 9 (3 elements times a maximum int of 3).

What is the maximum value for an int32?

It's 2,147,483,647 (that is, 2^31 - 1). Easiest way to memorize it is via a tattoo.
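If you'd rather skip the tattoo, any language will tell you directly; in Swift, for instance:

print(Int32.max)  // 2147483647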

Converting Int to Float loses precision for large numbers in Swift

This is due to the way the floating-point format works. A Float is a 32-bit floating-point number, stored in the IEEE 754 format, which is basically scientific notation, with some bits allocated to the value, and some to the exponent (in base 2), as this diagram from the single-precision floating-point number Wikipedia article shows:

[Diagram: 32-bit float format, with 1 sign bit, 8 exponent bits, and 23 fraction bits]

So the actual number is represented as

(sign) * (value) * (2 ^ (exponent))
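Swift exposes those three parts directly, so you can read them back from any value (a quick sketch):

let x: Float = 6.5
print(x.sign, x.significand, x.exponent)  // plus 1.625 2, i.e. +1.625 × 2^2 = 6.5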

Because the number of bits allocated to actually storing an integer value (24, counting the implicit leading bit) is smaller than the number of bits allocated for this in a 32-bit integer (all 32), in order to make room for the exponent, the less significant digits of large numbers will be sacrificed, in exchange for the ability to represent a vastly larger range of magnitudes (a 32-bit integer can only represent values in the range -2^31 to 2^31 - 1).

Some rough testing indicates that every integer up to and including 16777216 (2 ^ 24) can be represented exactly in a 32-bit float, while larger integers will be rounded to the nearest multiple of some power of 2.
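You can watch that boundary in action (a quick check):

print(Int(Float(16_777_215)))  // 16777215, exact
print(Int(Float(16_777_216)))  // 16777216, exact (2^24)
print(Int(Float(16_777_217)))  // 16777216, rounded: the first integer a Float can't hold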

Note that this isn't specific to Swift. This floating-point format is a standard format used in almost every programming language. Here's the output I get from LLDB with plain C:

[Screenshot of an LLDB session showing the same rounding with a plain C float]

If you need higher precision, use a Double. Double-precision floats use 64 bits of memory and have a 53-bit significand, so they can represent far more integers exactly.
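Double eventually hits the same wall, though: every integer up to 2^53 is exact, and then collisions begin (a quick check):

print(Double(9_007_199_254_740_992) == Double(9_007_199_254_740_993))  // true: 2^53 and 2^53 + 1 collide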


