Swift Difference Between Double and Float64

The headers, found by pressing Command+Shift+O in Xcode and searching for Float64, say:

/// A 64-bit floating point type.
public typealias Float64 = Double
/// A 32-bit floating point type.
public typealias Float32 = Float

and

Base floating point types 

Float32 32 bit IEEE float: 1 sign bit, 8 exponent bits, 23 fraction bits
Float64 64 bit IEEE float: 1 sign bit, 11 exponent bits, 52 fraction bits
Float80 80 bit MacOS float: 1 sign bit, 15 exponent bits, 1 integer bit, 63 fraction bits
Float96 96 bit 68881 float: 1 sign bit, 15 exponent bits, 16 pad bits, 1 integer bit, 63 fraction bits

Note: These are fixed size floating point types, useful when writing a floating
point value to disk. If your compiler does not support a particular size
float, a struct is used instead.
Use one of the NCEG types (e.g. double_t) or an ANSI C type (e.g. double) if
you want a floating point representation that is natural for any given
compiler, but might be a different size on different compilers.

As a general rule, unless you're writing code that depends on the binary representation, you should use the standard Float and Double names. But if you are writing something where binary compatibility is needed (e.g. writing/parsing binary Data to be exchanged with some other platform), then you can use the type names that carry the bit width, e.g. Float32, Float64 or Float80.
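
For instance, here is a minimal sketch of that use case: writing a Float64 into Data with an explicit byte order so another platform can read it back. The little-endian choice and the variable names are just illustrative assumptions, not a prescribed wire format:

import Foundation

// Float64 == Double, but the explicit name documents the wire size.
let value: Float64 = 3.14159

// Write: take the exact 64-bit pattern, fix the byte order, copy into Data.
let bits = value.bitPattern.littleEndian
let data = withUnsafeBytes(of: bits) { Data($0) }        // exactly 8 bytes

// Read it back on the other side:
var restoredBits: UInt64 = 0
_ = withUnsafeMutableBytes(of: &restoredBits) { data.copyBytes(to: $0) }
let restored = Float64(bitPattern: UInt64(littleEndian: restoredBits))
print(restored == value)                                  // true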

What are Float32 (alias of Float) and Float64 (alias of Double) for?

Float64 is a type alias for Double, so they are the same type. The same with Float32 and Float.

The names Float and Double (or float and double) are older names, used in many other languages, including C, C++, Java and C#.

[In some languages on some (exotic or very very old) platforms, they are not even 32 and 64 bit. But that doesn't matter for Swift.]

The newer names, Float32, Float64 and Float80, tell you the exact size of the type and are therefore a little clearer and less ambiguous, but for long-time programmers they take some getting used to. Note that Float80 is only available when targeting x86 processors, where the hardware has an 80-bit extended format; it does not exist on ARM.
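
You can convince yourself of the aliasing with a quick check in a playground (a minimal sketch):

print(Float32.self == Float.self, Float64.self == Double.self)   // true true
print(MemoryLayout<Float32>.size, MemoryLayout<Float64>.size)    // 4 8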

Why does converting an integer string to float and double produce different results?

Take a look at the floating-point converter at https://www.h-schmidt.net/FloatConverter/IEEE754.html. When you enter a number, it shows you the stored bits in binary and hex representation, and also gives you the error introduced by the conversion. The issue is the way the number gets represented in the IEEE 754 single-precision format: a 32-bit Float has only 24 significand bits, so it cannot store every eight-digit integer exactly, and for this value the error indeed comes out to be -1.

In fact, any number in the range 77777772 to 77777780 is stored as 77777776, because the representable Float values in that region are 8 apart; the 24-bit significand cannot resolve anything finer. A Double, with its 53-bit significand, stores 77777777 exactly, which is why converting the same string to Float and to Double gives different results.
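
A minimal sketch of that difference, assuming the value in question was 77777777:

let s = "77777777"
if let f = Float(s), let d = Double(s) {
    // Float's 24-bit significand can only resolve multiples of 8 in this range,
    // so the string is rounded to the nearest representable value, 77777776.
    print(f == 77777776)   // true: off by -1
    // Double's 53-bit significand represents 77777777 exactly.
    print(d == 77777777)   // true
}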

Double vs Float80 speed in Swift

While it's true that the x87 FPU operates internally at 80-bit "extended" precision (at least by default; this is configurable, and in fact 32-bit builds following the macOS ABI set the internal precision to 64 bits), binaries targeting x86-64 no longer use x87 FPU instructions. All x86 chips that implement the 64-bit long mode extension also support SSE2 (in fact, this was required by the AMD64 specification), so a 64-bit binary can always assume SSE2 support. That is what is used to implement floating-point operations, because it is much more efficient and easier for a compiler to optimize.

Even 32-bit builds in the modern era assume SSE2 as a baseline, and certainly on the Mac, since SSE2 was introduced with the Pentium 4, which predates Apple's switch to Intel x86 chips: every x86 chip ever used in an Apple machine supports SSE2.

So no, you aren't going to see any performance improvement by using an 80-bit extended precision type. You weren't going to see any performance improvement from x87 instructions, even if they were generated by the compiler. And you certainly aren't going to see any performance improvement on x86-64, because SSE2 supports a maximum of 64-bit precision in hardware. Any 80-bit precision operations are going to have to be implemented in software, or force a smart compiler to emit x87 instructions, which means you don't benefit from any of the nice features and tangible performance improvements of SSE2.
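
If you want to see this yourself, here is a rough micro-benchmark sketch (compile with optimizations; the loop body and iteration count are arbitrary illustrations, not a rigorous benchmark):

import Foundation

// Float80 only exists on Intel targets, so guard with an architecture check.
#if arch(x86_64) || arch(i386)
func time(_ label: String, _ body: () -> Void) {
    let start = Date()
    body()
    print("\(label): \(Date().timeIntervalSince(start)) s")
}

time("Double") {
    var total: Double = 0
    for i in 1...10_000_000 { total += 1.0 / Double(i) }   // SSE2 hardware arithmetic
    print(total)                                           // print so the work isn't optimized away
}

time("Float80") {
    var total: Float80 = 0
    for i in 1...10_000_000 { total += 1.0 / Float80(i) }  // falls back to x87 (or software) arithmetic
    print(total)
}
#endif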

More precision than Double in Swift

Yes, there is! There is Float80 exactly for that: it stores 80 bits (10 bytes) of floating-point data, and you can use it like any other floating-point type. Note that Swift has Float32, Float64 and Float80, where Float32 is just a typealias for Float and Float64 is one for Double.
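
For instance (assuming an Intel target, since Float80 is not available on ARM):

#if arch(x86_64) || arch(i386)
let third80: Float80 = 1.0 / 3.0
let third64: Double  = 1.0 / 3.0
print(third80)   // prints more significant digits than the Double below
print(third64)
#endif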

Rounding a Double value to x number of decimal places in Swift

You can use Swift's round function to accomplish this.

To round a Double to 3 digits of precision, first multiply it by 1000, round it, and divide the rounded result by 1000:

import Foundation

let x = 1.23556789
let y = Double(round(1000 * x) / 1000)
print(y)  // 1.236

Unlike any kind of printf(...) or String(format: ...) solutions, the result of this operation is still of type Double.

EDIT:

Regarding the comments that it sometimes does not work, please read this: What Every Programmer Should Know About Floating-Point Arithmetic
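
If you need this in more than one place, the same idea generalizes to an arbitrary number of decimal places. A small sketch; the extension and the name rounded(toPlaces:) are my own, not a standard library API:

import Foundation

extension Double {
    /// Rounds to the given number of decimal places using the same
    /// multiply-round-divide trick as above.
    func rounded(toPlaces places: Int) -> Double {
        let factor = pow(10.0, Double(places))
        return (self * factor).rounded() / factor
    }
}

print(1.23556789.rounded(toPlaces: 3))   // 1.236

The caveat from the EDIT above still applies: the result is the nearest representable Double, not an exact decimal.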

What's the difference between using CGFloat and float?

As @weichsel stated, CGFloat is just a typedef for either float or double. You can see for yourself by Command-double-clicking on "CGFloat" in Xcode — it will jump to the CGBase.h header where the typedef is defined. The same approach is used for NSInteger and NSUInteger as well.

These types were introduced to make it easier to write code that works on both 32-bit and 64-bit without modification. However, if all you need is float precision within your own code, you can still use float if you like — it will reduce your memory footprint somewhat. Same goes for integer values.

I suggest you invest the modest time required to make your app 64-bit clean and try running it as such, since most Macs now have 64-bit CPUs and Snow Leopard is fully 64-bit, including the kernel and user applications. Apple's 64-bit Transition Guide for Cocoa is a useful resource.
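
From Swift, you can check which underlying type CGFloat wraps on your platform (a small sketch, assuming an Apple platform where CoreGraphics is available):

import CoreGraphics

// CGFloat wraps Double on 64-bit platforms and Float on 32-bit ones.
print(CGFloat.NativeType.self)      // Double on a 64-bit build
print(MemoryLayout<CGFloat>.size)   // 8 on 64-bit, 4 on 32-bit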

Why not use Double or Float to represent currency?

Because floats and doubles cannot accurately represent the base-10 multiples that we use for money. This isn't specific to Swift or to Java; it affects any programming language that uses base-2 floating-point types.

In base 10, you can write 10.25 as 1025 × 10⁻² (an integer times a power of 10). IEEE 754 floating-point numbers are different, but a very simple way to think about them is to multiply by a power of two instead. For instance, you could be looking at 164 × 2⁻⁴ (an integer times a power of two), which is also equal to 10.25. That's not how the numbers are represented in memory, but the math implications are the same.
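
Swift lets you inspect this decomposition directly, which may help make the "integer times a power of two" picture concrete (a small sketch):

let x = 10.25
print(x.significand, x.exponent)   // 1.28125 3   (10.25 == 1.28125 × 2³ == 164 × 2⁻⁴)
print(x == 164 * 0.0625)           // true (0.0625 is exactly 2⁻⁴)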

Even in base 10, this notation cannot accurately represent most simple fractions. For instance, you can't represent 1/3: the decimal representation is repeating (0.3333...), so there is no finite integer that you can multiply by a power of 10 to get 1/3. You could settle on a long sequence of 3's and a small exponent, like 333333333 × 10⁻¹⁰, but it is not accurate: if you multiply that by 3, you won't get 1.

However, for the purpose of counting money, at least for countries whose money is valued within an order of magnitude of the US dollar, usually all you need is to be able to store multiples of 10⁻², so it doesn't really matter that 1/3 can't be represented.

The problem with floats and doubles is that the vast majority of money-like numbers don't have an exact representation as an integer times a power of 2. In fact, the only multiples of 0.01 between 0 and 1 (which are significant when dealing with money because they're integer cents) that can be represented exactly as an IEEE-754 binary floating-point number are 0, 0.25, 0.5, 0.75 and 1. All the others are off by a small amount. As an analogy to the 0.333333 example, if you take the floating-point value for 0.01 and you multiply it by 10, you won't get 0.1. Instead you will get something like 0.099999999786...

Representing money as a double or float will probably look good at first as the software rounds off the tiny errors, but as you perform more additions, subtractions, multiplications and divisions on inexact numbers, errors will compound and you'll end up with values that are visibly not accurate. This makes floats and doubles inadequate for dealing with money, where perfect accuracy for multiples of base 10 powers is required.
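
A quick sketch of how that error compounds; the exact digits you see will vary, but the total will not be exactly 10000:

// Adding one cent a million times with Double: each addition rounds,
// and the tiny errors accumulate instead of cancelling.
var total: Double = 0.0
for _ in 0..<1_000_000 {
    total += 0.01
}
print(total)              // slightly off from 10000.0
print(total == 10_000.0)  // false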

A solution that works in just about any language is to use integers instead and count cents. For instance, 1025 would be $10.25. Several languages also have built-in types to deal with money: Java has the BigDecimal class, Rust has the rust_decimal crate, C# has the decimal type, and Swift has Decimal in Foundation.
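
A minimal Swift sketch of both approaches, integer cents and Foundation's Decimal (the variable names are just for illustration):

import Foundation

// Binary floating point can't store 0.1, 0.2 or 0.3 exactly:
let sum: Double = 0.1 + 0.2
print(sum == 0.3)                     // false
print(sum)                            // 0.30000000000000004

// Workaround 1: count in integer cents.
let priceInCents = 1025               // $10.25
print(priceInCents * 3)               // 3075 cents, i.e. $30.75, exact

// Workaround 2: Foundation's Decimal stores base-10 digits exactly.
// (Initialize from strings so the values never pass through a Double.)
let a = Decimal(string: "0.1")!
let b = Decimal(string: "0.2")!
print(a + b == Decimal(string: "0.3")!)   // true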


