Is Using Double Faster Than Float

Is using double faster than float?

There isn't a single "Intel CPU", especially in terms of which operations are optimized relative to others, but most of them, at the CPU level (specifically within the FPU), are such that the answer to your question:

are double operations just as fast or
faster than float operations for +, -,
*, and /?

is "yes" -- within the CPU, except for division and sqrt which are somewhat slower for double than for float. (Assuming your compiler uses SSE2 for scalar FP math, like all x86-64 compilers do, and some 32-bit compilers depending on options. Legacy x87 doesn't have different widths in registers, only in memory (it converts on load/store), so historically even sqrt and division were just as slow for double).

For example, Haswell has a divsd throughput of one per 8 to 14 cycles (data-dependent), but a divss (scalar single) throughput of one per 7 cycles. x87 fdiv is 8 to 18 cycle throughput. (Numbers from https://agner.org/optimize/. Latency correlates with throughput for division, but is higher than the throughput numbers.)
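
If you want to see this on your own machine, a dependent-division loop like the sketch below exposes the gap. Note that it measures division latency rather than the throughput figures above, and the numbers will vary with CPU and compiler flags; it is a rough illustration, not a rigorous benchmark.

    // Rough sketch: time a chain of dependent divisions for float vs. double.
    // Measures latency (each division waits for the previous result), so the
    // ratio will differ from the throughput figures quoted above.
    #include <chrono>
    #include <cstdio>

    template <typename T>
    static double time_divisions(T x) {
        const int iters = 100'000'000;
        T acc = x;
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i)
            acc = acc / static_cast<T>(1.0000001) + static_cast<T>(1e-9);
        auto t1 = std::chrono::steady_clock::now();
        std::printf("(checksum %g)\n", static_cast<double>(acc)); // keep acc live
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main() {
        std::printf("float : %.3f s\n", time_divisions(1.5f));
        std::printf("double: %.3f s\n", time_divisions(1.5));
    }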

The float versions of many library functions like logf(float) and sinf(float) will also be faster than log(double) and sin(double), because they have many fewer bits of precision to get right. They can use polynomial approximations with fewer terms to reach full precision for float than for double.
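
As a small C++ illustration (the exact speed gap depends on the math library), keeping a float argument in the single-precision routines avoids both the conversions and the wider-precision polynomials:

    #include <cmath>

    // Single-precision path: with a float argument, the <cmath> overloads
    // resolve to the float versions (the equivalent of logf/sinf in C).
    float via_float(float x) {
        return std::log(x) + std::sin(x);
    }

    // Double-precision path: widening to double runs the more expensive
    // double routines, then the result is narrowed back to float.
    float via_double(float x) {
        double xd = x;
        return static_cast<float>(std::log(xd) + std::sin(xd));
    }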


However, taking up twice the memory for each number clearly implies heavier load on the cache(s) and more memory bandwidth to fill and spill those cache lines from/to RAM; the time you care about performance of a floating-point operation is when you're doing a lot of such operations, so the memory and cache considerations are crucial.

@Richard's answer points out that there are also other ways to perform FP operations (the SSE / SSE2 instructions; good old MMX was integers-only), especially suitable for simple ops on lots of data ("SIMD", single instruction / multiple data), where each vector register can pack 4 single-precision floats or only 2 double-precision ones, so this effect will be even more marked.
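
A minimal intrinsics sketch (x86 only, SSE/SSE2, remainder elements omitted) makes the width difference concrete: one 128-bit register holds four floats but only two doubles, so the float loop processes twice as many elements per instruction.

    #include <xmmintrin.h>  // SSE:  __m128  (4 x float)
    #include <emmintrin.h>  // SSE2: __m128d (2 x double)

    void add_floats(const float* a, const float* b, float* out, int n) {
        for (int i = 0; i + 4 <= n; i += 4) {        // 4 elements per iteration
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
        }
    }

    void add_doubles(const double* a, const double* b, double* out, int n) {
        for (int i = 0; i + 2 <= n; i += 2) {        // only 2 elements per iteration
            __m128d va = _mm_loadu_pd(a + i);
            __m128d vb = _mm_loadu_pd(b + i);
            _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
        }
    }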

In the end, you do have to benchmark, but my prediction is that for reasonable (i.e., large;-) benchmarks, you'll find advantage to sticking with single precision (assuming of course that you don't need the extra bits of precision!-).

double or float, which is faster?

Depends on what the native hardware does.

  • If the hardware is (or is like) x86 with legacy x87 math, float and double are both extended (for free) to an internal 80-bit format, so both have the same performance (except for cache footprint / memory bandwidth).

  • If the hardware implements both natively, like most modern ISAs (including x86-64 where SSE2 is the default for scalar FP math), then usually most FPU operations are the same speed for both. Double division and sqrt can be slower than float, as well as of course being significantly slower than multiply or add. (Float being smaller can mean fewer cache misses. And with SIMD, twice as many elements per vector for loops that vectorize).

  • If the hardware implements only double, then float will be slower if conversion to/from the native double format isn't free as part of float-load and float-store instructions.

  • If the hardware implements float only, then emulating double with it will cost even more time. In this case, float will be faster.

  • If the hardware implements neither, both have to be implemented in software. In this case, both will be slow, but double will be slightly slower (more load and store operations at the least).

The quote you mention is probably referring to the x86 platform, where the first case was given. But this doesn't hold true in general.

Also beware that x * 3.3 + y for float x,y will trigger promotion to double for both variables. This is not the hardware's fault, and you should avoid it by writing 3.3f to let your compiler make efficient asm that actually keeps numbers as floats if that's what you want.
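
A hypothetical example of the difference the literal suffix makes:

    // 3.3 is a double literal: x and y are promoted, the arithmetic is done
    // in double, and the result is narrowed back to float on return.
    float scale_bad(float x, float y) {
        return x * 3.3 + y;
    }

    // With the f suffix everything stays in single precision (and a vectorized
    // loop over such code can use the packed-single instructions).
    float scale_good(float x, float y) {
        return x * 3.3f + y;
    }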

Are doubles faster than floats in C#?

The short answer is, "use whichever precision is required for acceptable results."

Your one guarantee is that operations performed on floating point data are done in at least the highest precision member of the expression. So multiplying two floats is done with at least the precision of float, and multiplying a float and a double would be done with at least double precision. The standard states that "[floating-point] operations may be performed with higher precision than the result type of the operation."

Given that the JIT for .NET attempts to leave your floating point operations in the precision requested, we can take a look at documentation from Intel for speeding up our operations. On the Intel platform your floating point operations may be done in an intermediate precision of 80 bits, and converted down to the precision requested.

From Intel's guide to C++ Floating-point Operations¹ (sorry, I only have the dead-tree version), they mention:

  • Use a single precision type (for example, float) unless the extra precision obtained through double or long double is required. Greater precision types increase memory size and bandwidth requirements.
    ...
  • Avoid mixed data type arithmetic expressions

That last point is important, as you can slow yourself down with unnecessary casts to/from float and double, which result in JIT'd code that forces the x87 to convert to and from its 80-bit intermediate format between operations!
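
Here is a C++ sketch of the kind of mixing the guide warns about (the same pattern applies in C#); note that a double accumulator is sometimes a deliberate precision choice, the point is only to be aware of the conversions it introduces:

    // Mixed types: the double constant and double accumulator force every
    // float element to be widened on each iteration (and, on x87, extra
    // conversion traffic around the 80-bit intermediate format).
    float sum_mixed(const float* v, int n) {
        double acc = 0.0;
        for (int i = 0; i < n; ++i)
            acc += v[i] * 0.5;
        return static_cast<float>(acc);
    }

    // One type throughout: no per-element conversions.
    float sum_float(const float* v, int n) {
        float acc = 0.0f;
        for (int i = 0; i < n; ++i)
            acc += v[i] * 0.5f;
        return acc;
    }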

1. Yes, it says C++, but the C# standard plus knowledge of the CLR lets us know the information for C++ should be applicable in this instance.

Float vs Double Performance

On x86 processors using the legacy x87 FPU, at least, float and double will each be converted to a 10-byte real for processing. The x87 doesn't have separate processing units for the different floating-point types it supports.

The age-old advice that float is faster than double applied 100 years ago when most CPUs didn't have built-in FPUs (and few people had separate FPU chips), so most floating-point manipulation was done in software. On these machines (which were powered by steam generated by the lava pits), it was faster to use floats. Now the only real benefit to floats is that they take up less space (which only matters if you have millions of them).

Is float slower than double? Does 64 bit program run faster than 32 bit program?

The classic x86 architecture uses the floating-point unit (FPU) to perform floating-point calculations. The FPU performs all calculations in its internal registers, which have 80-bit precision each. Every time you attempt to work with float or double, the variable is first loaded from memory into an internal register of the FPU. This means that there is absolutely no difference in the speed of the actual calculations, since in any case the calculations are carried out with full 80-bit precision. The only thing that might be different is the speed of loading the value from memory and storing the result back to memory. Naturally, on a 32-bit platform it might take longer to load/store a double as compared to a float. On a 64-bit platform there shouldn't be any difference.

Modern x86 architectures support extended instruction sets (SSE/SSE2) with new instructions that can perform the very same floating-point calculations without involving the "old" FPU instructions. However, again, I wouldn't expect to see any difference in calculation speed for float and double. And since these modern platforms are 64-bit ones, the load/store speed is supposed to be the same as well.

On a different hardware platform the situation could be different. But normally a smaller floating-point type should not provide any performance benefits. The main purpose of smaller floating-point types is to save memory, not to improve performance.

Edit: (To address @MSalters comment)
What I said above applies to fundamental arithmetical operations. When it comes to library functions, the answer will depend on several implementation details. If the platform's floating-point instruction set contains an instruction that implements the functionality of the given library function, then what I said above will normally apply to that function as well (that would normally include functions like sin, cos, sqrt). For other functions, whose functionality is not immediately supported in the FP instruction set, the situation might prove to be significantly different. It is quite possible that float versions of such functions can be implemented more efficiently than their double versions.

Why are doubles preferred over floats?

In my opinion the answers so far don't really get the right point across, so here's my crack at it.

The short answer is C++ developers use doubles over floats:

  • To avoid premature optimization when they don't understand the performance trade-offs well ("they have higher precision, why not?" is the thought process)
  • Habit
  • Culture
  • To match library function signatures
  • To match simple-to-write floating point literals (you can write 0.0 instead of 0.0f)

It's true a double may be as fast as a float for a single computation, because most FPUs have a wider internal representation than either the 32-bit float or the 64-bit double.

However, that's only a small piece of the picture. Nowadays, per-operation optimizations don't mean anything if you're bottlenecked on cache/memory bandwidth.

Here is why some developers seeking to optimize their code should look into using 32-bit floats over 64-bit doubles:

  • They fit in half the memory, which is like having all your caches be twice as large (big win!!!)
  • If you really care about performance you'll use SSE instructions. SSE instructions that operate on floating point values have different instructions for 32-bit and 64-bit floating point representations. The 32-bit versions can fit 4 values in the 128-bit register operands, but the 64-bit versions can only fit 2 values. In this scenario you can likely double your FLOPS by using floats over doubles, because each instruction operates on twice as much data.

In general, there is a real lack of knowledge of how floating point numbers really work in the majority of developers I've encountered. So I'm not really surprised most developers blindly use double.

In what situations is it better to use a float over a double in Java?

Since your question is mostly about performance, this article presents you with some specific calculations (keep in mind though that this article is specific to neural networks, and your calculations may be completely different to what they're doing in the article): http://web.archive.org/web/20150310213841/http://www.heatonresearch.com/content/choosing-between-java%E2%80%99s-float-and-double

Some of the relevant material from the link is reproduced here:

Both double and float can support relatively large numbers. The upper
and lower range are really not a consideration for neural networks.
Float can handle numbers between 1.40129846432481707e-45 to
3.40282346638528860e+38...Basically, float can handle about 7 decimal places. A double can handle about 16 decimal places.

Matrix multiplication is one of the most common mathematical
operations for neural network programming. By no means is it the only
operation, but it will provide a good benchmark. The following program
will be used to benchmark a double.

Skipping all the code, the table on the website shows that for a 100x100 matrix multiplication, they have a gain in performance of around 10% if they use doubles. For a 500x100 matrix multiplication, the performance loss because of using doubles is around 7%. And for a 1000x1000 matrix multiplication, that loss is around 17%.

For the small 100x100 matrix switching to float may actually decrease
performance. As the size of the matrix increases, the percent gain
increases. With a very large matrix the performance gain increases to
17%. 17% is worth considering.
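
The article's benchmark is in Java; below is a minimal C++ sketch of the same idea, timing a naive n x n matrix multiply for float and double. The matrix sizes and loop order are illustrative, and once the matrices no longer fit in cache the result is dominated by memory traffic, which is exactly where float's smaller footprint pays off.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    template <typename T>
    double time_matmul(int n) {
        std::vector<T> a(n * n, T(1.001)), b(n * n, T(0.999)), c(n * n, T(0));
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k) {          // i-k-j order keeps accesses contiguous
                T aik = a[i * n + k];
                for (int j = 0; j < n; ++j)
                    c[i * n + j] += aik * b[k * n + j];
            }
        auto t1 = std::chrono::steady_clock::now();
        std::printf("(checksum %g)\n", static_cast<double>(c[0]));  // keep the result live
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main() {
        for (int n : {100, 500, 1000}) {
            std::printf("n=%d  float: %.3f s  double: %.3f s\n",
                        n, time_matmul<float>(n), time_matmul<double>(n));
        }
    }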

Is it more efficient to compare float than double?

It is typically either the same or slightly faster to compare a float to a float than a double to a double, but either way the difference is minuscule. The bigger performance problem you might be looking at is the conversion from float to double, which can either be free or pretty expensive, depending on where and how it is being done.
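
One common, easy-to-miss source of that conversion is a double literal in the comparison itself; a hypothetical example:

    // 0.1 is a double literal, so x is converted to double and the comparison
    // is done in double precision.
    bool below_threshold_mixed(float x) {
        return x < 0.1;
    }

    // With the f suffix the comparison stays in single precision.
    bool below_threshold_float(float x) {
        return x < 0.1f;
    }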

Either way, it's really not worth worrying too much about unless you have profiled and found a bottleneck in some huge loop involving this function, in which case you can then test the performance difference yourself and react accordingly.


