Converting Int to Float Loses Precision for Large Numbers in Swift


This is due to the way the floating-point format works. A Float is a 32-bit floating-point number stored in the IEEE 754 single-precision format, which is essentially scientific notation in base 2: one bit for the sign, 8 bits for the exponent, and 23 bits for the significand (the value part), as this diagram from the single-precision floating-point Wikipedia article shows:

[Diagram: 32-bit float format – 1 sign bit, 8 exponent bits, 23 fraction bits]

So the actual number is represented as

(-1)^sign * significand * 2^exponent

For example, 6.5 is stored as +1.625 * 2^2.

Because the number of bits used for storing the integer digits of a value (24, counting the implicit leading 1 bit) is smaller than the number of bits a normal 32-bit integer has for this (all 32), the less significant digits of large numbers are sacrificed to make room for the exponent. In exchange, a Float can represent a far larger range of magnitudes, whereas a 32-bit Int can only represent integers in the range -2^31 to 2^31 - 1.

Every integer up to and including 16777216 (2^24) can be represented exactly in a 32-bit Float, while larger integers are rounded to the nearest representable value, a multiple of some power of 2.
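
You can verify the boundary yourself; here is a minimal sketch:

let exact: Float = 16_777_216              // 2^24, the largest value before integer gaps appear
print(exact + 1 == exact)                  // true: 16777217 rounds back down to 16777216
print(Float(exactly: 16_777_217) == nil)   // true: 16777217 has no exact Float representation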

Note that this isn't specific to Swift. This floating-point format is a standard used in almost every programming language, so you get exactly the same behavior from plain C in LLDB.

If you need more precision, use a Double. Double-precision floats use 64 bits of memory and have a 53-bit significand, so every integer up to 2^53 can be represented exactly.

Converting String to Double/Float loses precision for large numbers in Swift 5

If you want to keep the full precision of your value, you need to use the Decimal type, and make sure to create it with its string initializer:



let value = "0.0000335651599321165"
if let decimal = Decimal(string: value) {
    print(decimal)
}

This will print:

0.0000335651599321165


edit/update:

When displaying your value to the user with a fixed number of fraction digits, you can use NumberFormatter, and you can choose a rounding mode as well:



extension Formatter {
    // Shared instance, so the formatter isn't recreated on every call
    static let number = NumberFormatter()
}

extension Numeric {
    func fractionDigits(min: Int = 6, max: Int = 6, roundingMode: NumberFormatter.RoundingMode = .halfEven) -> String {
        Formatter.number.minimumFractionDigits = min
        Formatter.number.maximumFractionDigits = max
        Formatter.number.roundingMode = roundingMode
        Formatter.number.numberStyle = .decimal
        return Formatter.number.string(for: self) ?? ""
    }
}


let value = "0.0000335651599321165"
if let decimal = Decimal(string: value) {
    print(decimal.fractionDigits()) // "0.000034\n"
}
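
For instance, to truncate instead of rounding half-even, you could call the extension above with hypothetical arguments like these:

if let decimal = Decimal(string: "0.0000335651599321165") {
    print(decimal.fractionDigits(max: 7, roundingMode: .down)) // "0.0000335"
}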

Why does converting an integer string to float and double produce different results?

Take a look at the floating-point converter at https://www.h-schmidt.net/FloatConverter/IEEE754.html. It shows you the bits stored when you enter a number, in binary and hex representation, and it also gives you the error due to conversion. The issue is with the way the number gets represented in the IEEE 754 standard: for this value, the conversion error of the single-precision representation indeed comes out to be -1.

In fact, any number in the range 77777772 to 77777780 is stored as 77777776, because in this range only multiples of 8 are representable in the 24-bit mantissa.
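
You can reproduce this in Swift; 77777777 needs 27 significant bits, more than a Float's 24, while a Double holds it easily. A small sketch:

let f = Float("77777777")!   // nearest representable Float is 77777776
let d = Double("77777777")!  // exact: well within Double's 53-bit significand
print(Int(f))                // 77777776
print(Int(d))                // 77777777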

Loss of precision in float subtraction with Swift

You should do the calculation with integers to avoid the floating-point precision issue, so convert the float to an integer first.

Is the following code what you want?

import Foundation // for pow()

// Greatest common divisor via the Euclidean algorithm
func gcd(_ m: Int, _ n: Int) -> Int {
    var (m, n) = (m, n)
    if m < n {
        (m, n) = (n, m)
    }
    if n == 0 {
        return m
    } else if m % n == 0 {
        return n
    } else {
        return gcd(n, m % n)
    }
}

// Converts a decimal quantity such as 1.4 into a fraction string such as "1 2/5"
func fractionize(_ quantity: Float) -> String {
    var quantity = quantity
    var i = 0
    // Scale by 10 until no fractional part remains
    while quantity.truncatingRemainder(dividingBy: 1) != 0 {
        quantity = quantity * 10
        i += 1
    }

    var numerator = Int(quantity)
    var denominator = Int(pow(Double(10), Double(i)))

    // Reduce the fraction to lowest terms
    let divisor = gcd(numerator, denominator)
    numerator /= divisor
    denominator /= divisor

    // Split off the whole-number part, if any
    var wholeNumber = 0
    if numerator > denominator {
        wholeNumber = numerator / denominator
        numerator -= denominator * wholeNumber
    }

    if wholeNumber > 0 {
        return "\(wholeNumber) \(numerator)/\(denominator)"
    } else {
        return "\(numerator)/\(denominator)"
    }
}

print(fractionize(0.4)) // 2/5
print(fractionize(1.4)) // 1 2/5
print(fractionize(2.4)) // 2 2/5
print(fractionize(0.5)) // 1/2
print(fractionize(0.7)) // 7/10

C# strange precision lost int to float and backwards

This was a comment to a now deleted answer:

The integer 28218681 can be written in binary as 1101011101001010100111001. Note that 25 digits are needed. Single precision has only 24 bits for its "mantissa" (including the implicit leading 1 bit). 24 is less than 25. Precision is lost. A single-precision representation "remembers" a number only by its leading 24 binary digits. That corresponds to roughly 7-8 decimal figures. Roughly. The integer 28218681 has just 8 figures, so the problem arises.
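
The same effect shows up in Swift, where Float is the same binary32 format. A minimal sketch:

let n = 28218681     // needs 25 significant bits
let f = Float(n)     // rounded to the nearest value with a 24-bit significand
print(Int(f))        // 28218680 – the last binary digit is lost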


The lesson learned is: use a type that is "wide" enough to give the desired precision. For example, a double-precision number can hold the first ~16 decimal figures of a number.

This is not related to the discussion on whether to use a binary or a decimal format. Note that if the asker had used decimal32 instead of binary32 (another name for float), they would have had the exact same issue: decimal32 stores only 7 significant decimal digits, and 28218681 has 8.

String to Double conversion loses precision in Swift

First off: you don't! What you have encountered here is called floating-point inaccuracy. Computers cannot store every number precisely; 2.4 cannot be stored losslessly in a binary floating-point type.

Secondly: since floating point is always an issue, and you are dealing with money here (I guess you are trying to store 2.4 francs), your number one solution is: don't use floating-point numbers. Use the NSNumber you get from the formatter's number(from:) method and do not try to get a Double out of it.

Alternatively, shift the decimal point by multiplying, and store the value as an Int.

The first solution might look something like this:

// myDouble is the NSNumber? returned by the formatter's number(from:)
if let num = myDouble {
    let value = NSDecimalNumber(decimal: num.decimalValue)
    let output = value.multiplying(by: NSDecimalNumber(value: 10))
    print(output)
}
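
The integer alternative might look like this: a sketch that assumes the amount is francs and stores it as whole centimes:

let input = "2.4"
if let amount = Decimal(string: input) {
    let centimes = NSDecimalNumber(decimal: amount * 100).intValue
    print(centimes) // 240 – exact, with no floating-point error involved
}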

Safe conversion of Float to Int?

Int(exactly:) might be what you are looking for:

Creates an integer from the given floating-point value, if it can be represented exactly.

If the value passed as source is not representable exactly, the result is nil.

Example:

let x = 123e20
if let i = Int(exactly: x) {
    print(i)
} else {
    print("not representable")
}

This will also fail if the floating point number is not integral, so you might want to round it before the conversion:

let x = 12.3
if let i = Int(exactly: x.rounded(.towardZero)) {
    print(i) // 12
}

Rounding toward zero is what Int(x) would do; you can pick whatever rounding rule you need.

Float division and casting in Swift

First, you should use Double and not Float. Float gives you very limited precision; why you would want 6 digits of precision with Float when you could have 15 digits with Double is hard to understand.

Second, a compiler does exactly what you tell it to do.

Float(sum) / Float(numbers.count)

takes the integer sum, takes the integer numbers.count, converts both to Float, and divides. Division of Floats in this case gives a result of 3.5.

Float(sum / numbers.count)

divides the integer sum by the integer numbers.count. Division of integers gives an integer result, which is the integer quotient disregarding any remainder; 21 / 6 equals 3 with a remainder of 3. So the result of the division is 3, which you then convert to the Float 3.0.
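
Here is the difference side by side, using a hypothetical numbers array whose sum is 21 and whose count is 6:

let numbers = [1, 2, 3, 4, 5, 6]
let sum = numbers.reduce(0, +)                    // 21

let average1 = Float(sum) / Float(numbers.count)  // 3.5 – convert first, then divide
let average2 = Float(sum / numbers.count)         // 3.0 – integer division happens first
print(average1, average2)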

converting really large int to double, loss of precision on some computer

Let's say that the floating point number uses N bits of storage.

Now, let us assume that this float can precisely represent every integer that an N-bit integer type can. Since the N-bit integer requires all N of its bits to distinguish all of its values, the float would need all N of its bits for the same job.

But a floating-point number must also be able to represent fractional numbers, and if all of the bits are used to represent the integers, there are zero bits left to represent any fraction. This is a contradiction, and we must conclude that the assumption is erroneous: a float cannot precisely represent all integers of an equally sized integer type.

Since there must be non-representable integers in the range of an N-bit integer, converting such an integer to a floating-point number of N bits may lose precision, if the converted value happens to be one of the non-representable ones.


Now, since a floating-point number can represent a subset of the rational numbers, some of those representable values are indeed integers. In particular, the IEEE 754 spec guarantees that a binary double-precision floating-point number can represent all integers up to 2^53. This property is a direct consequence of the length of the mantissa (52 explicit bits plus an implicit leading 1).

Therefore it is not possible to lose precision when converting a 32-bit integer to a double on a system that conforms to IEEE 754.
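
A 64-bit integer, on the other hand, can exceed the 53-bit limit. A sketch using Swift's Double(exactly:):

let big: Int64 = (1 << 53) + 1      // 9007199254740993 needs 54 significant bits
print(Double(exactly: big) == nil)  // true – no exact Double representation exists
print(Int64(Double(big)))           // 9007199254740992 – the +1 was rounded away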


More technically, the floating-point unit of the x86 architecture actually uses an 80-bit extended floating-point format, which is designed to represent all 64-bit integers precisely and can be accessed using the long double type.
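
Swift exposes that extended format as Float80 on Intel machines, so the claim can be checked directly; a sketch (Float80 is unavailable on ARM):

#if arch(x86_64) || arch(i386)
print(Double(exactly: Int64.max) == nil)    // true – 63 bits do not fit in a 53-bit significand
print(Float80(exactly: Int64.max) != nil)   // true – the 64-bit significand holds every Int64
#endif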


