How to Speed Up Floating-Point to Integer Number Conversion

How to speed up floating-point to integer number conversion?

Most of the other answers here just try to eliminate loop overhead.

Only deft_code's answer gets to the heart of what is likely the real problem -- that converting floating point to integers is shockingly expensive on an x86 processor. deft_code's solution is correct, though he gives no citation or explanation.

Here is the source of the trick, with some explanation and also versions specific to whether you want to round up, down, or toward zero: Know your FPU

Sorry to provide a link, but really anything written here, short of reproducing that excellent article, is not going to make things clear.

What is the fastest way to convert float to int on x86

It depends on whether you want a truncating conversion or a rounding one, and at what precision. By default, C performs a truncating conversion when you go from float to int. There are FPU instructions that do the conversion directly, but they don't implement the ANSI C semantics and there are significant caveats to using them (such as knowing the current FPU rounding state). Since the answer to your problem is quite complex and depends on some variables you haven't expressed, I recommend this article on the issue:

http://www.stereopsis.com/FPU.html
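For a concrete feel of the difference discussed above, here is a small sketch of mine (not from the linked article) contrasting C's default truncating cast with C99's lrint(), which rounds according to the current FPU rounding mode (link with -lm on most Unix-like systems):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 2.7;
    int truncated = (int)x;    /* C's default conversion: truncates toward zero -> 2 */
    long rounded  = lrint(x);  /* C99: rounds using the current rounding mode,
                                  round-to-nearest by default -> 3 */
    printf("truncated=%d rounded=%ld\n", truncated, rounded);
    return 0;
}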

Faster float to int conversion in Python

Did you try the obvious one?

def floatToInt(x):
    return int((x + 1.0) * (2**31))

floating point conversions and performance

I suspect the answer to this question is going to vary by target architecture, because the conversions can (but might not) occur in hardware. For example, consider the following code, which causes some interconversions between int and float:

int main (int argc, char** argv)
{
    int precoarced = 35;
    // precoarced gets forced to float
    float result = 0.5 + precoarced;

    // and now we force it back to int
    return (int)result;

    // I wonder what the disassembly looks like in different environments?
}

Here is what I got when I compiled this with g++ (I'm on Ubuntu, x86-64) with default settings and used gdb to disassemble main:

0x00000000004004b4 <+0>:  push   %rbp
0x00000000004004b5 <+1>:  mov    %rsp,%rbp
0x00000000004004b8 <+4>:  mov    %edi,-0x14(%rbp)
0x00000000004004bb <+7>:  mov    %rsi,-0x20(%rbp)
0x00000000004004bf <+11>: movl   $0x23,-0x8(%rbp)
0x00000000004004c6 <+18>: cvtsi2sdl -0x8(%rbp),%xmm0
0x00000000004004cb <+23>: movsd  0x10d(%rip),%xmm1   # 0x4005e0
0x00000000004004d3 <+31>: addsd  %xmm1,%xmm0
0x00000000004004d7 <+35>: unpcklpd %xmm0,%xmm0
0x00000000004004db <+39>: cvtpd2ps %xmm0,%xmm0
0x00000000004004df <+43>: movss  %xmm0,-0x4(%rbp)
0x00000000004004e4 <+48>: movss  -0x4(%rbp),%xmm0
0x00000000004004e9 <+53>: cvttss2si %xmm0,%eax
0x00000000004004ed <+57>: pop    %rbp
0x00000000004004ee <+58>: retq

Note the instructions with cvt-prefixed mnemonics; these are the conversion instructions. In this case the conversion happens in hardware, in a handful of instructions, so depending on how many cycles those instructions cost it could be reasonably fast. But again, a different architecture (or a different compiler) could change the story.

Edit: On a fun side note, there's an extra conversion in there because I accidentally wrote 0.5 (a double constant) instead of 0.5f. That's why the cvtpd2ps op is in there.
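For reference, the single-precision version I meant to write would look like this (a reconstruction, not the original code):

// A float literal keeps the arithmetic in single precision,
// so the extra cvtpd2ps step disappears:
float result = 0.5f + precoarced;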

Edit: x86 has had FP support for a long time (since the 80s), so C++ compilers targeting x86 will generally make use of the hardware (unless the compiler is seriously behind the times). Thanks Hot Licks for pointing this out.

Fastest way to cast a float to an int in javascript?

Great question! I actually had to deal with this the other day. Reaching for parseInt may seem like the obvious move, but wait! We can be fancier.

Bitwise operators are handy for quite a few things, and this is a great fit for them. Take the number from your question, 12.345: the bitwise operator '~' inverts all the bits of the number, and in the process coerces it to a 32-bit int. Gotta love JS.

Now we have the inverted bit representation of our number, and if we '~' it again we get ...drum roll... our number without the decimals! Unfortunately, it truncates toward zero rather than rounding.

var a = 12.345;
var b = ~~a; //boom!

We can use Math.round() for rounding. One caveat: the '~~' trick only works for values that fit in a signed 32-bit integer. You can try it on JSPerf to see the slight speed-up you get. Hope that helps!

A fast method to round a double to a 32-bit int explained

A value of the double floating-point type is represented like so:

[figure: IEEE 754 double layout, 1 sign bit, 11 exponent bits, 52 mantissa bits]

and it can be seen as two 32-bit integers; now, the int taken in all the versions of your code (supposing it's a 32-bit int) is the one on the right in the figure, so what you are doing in the end is just taking the lowest 32 bits of the mantissa.


Now, to the magic number; as you correctly stated, 6755399441055744 is 2^51 + 2^52; adding such a number forces the double to go into the "sweet range" between 2^52 and 2^53, which, as explained by Wikipedia, has an interesting property:

Between 2^52 = 4,503,599,627,370,496 and 2^53 = 9,007,199,254,740,992, the representable numbers are exactly the integers.

This follows from the fact that the mantissa is 52 bits wide.

The other interesting fact about adding 2^51 + 2^52 is that it affects the mantissa only in the two highest bits, which are discarded anyway, since we are taking only its lowest 32 bits.


Last but not least: the sign.

IEEE 754 floating point uses a sign-and-magnitude representation, while integers on "normal" machines use 2's complement arithmetic; how is this handled here?

We talked only about positive integers; now suppose we are dealing with a negative number in the range representable by a 32-bit int, so less (in absolute value) than 2^31 + 1; call it −a. Such a number is obviously made positive by adding the magic number, and the resulting value is 2^52 + 2^51 + (−a).

Now, what do we get if we interpret the mantissa in 2's complement representation? It must be the result of the 2's complement sum of (2^52 + 2^51) and (−a). Again, the first term affects only the upper two bits; what remains in bits 0–50 is the 2's complement representation of (−a) (again, minus the upper two bits).

Since reduction of a 2’s complement number to a smaller width is done just by cutting away the extra bits on the left, taking the lower 32 bits gives us correctly (−a) in 32-bit, 2’s complement arithmetic.
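Putting the pieces above together, here is a minimal sketch of the trick in C (my illustration, assuming a little-endian target, the default round-to-nearest mode, and a result that fits in a 32-bit int):

#include <stdint.h>
#include <string.h>

/* Fast double -> int32 rounding via the 2^52 + 2^51 magic number. */
static inline int32_t fast_round(double x)
{
    const double magic = 6755399441055744.0;  /* 2^52 + 2^51 */
    x += magic;                      /* rounds x to an integer held in the
                                        low bits of the mantissa */
    int64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the double's bit pattern */
    return (int32_t)bits;            /* keep only the lowest 32 mantissa bits */
}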

Safest way to convert float to integer in python?

All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.

That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.

To quote from Wikipedia,

Any integer with absolute value less than or equal to 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 2^53 can be exactly represented in the double precision format.
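To see the 2^53 boundary from that quote in action, here is a quick check (an illustration I added, written in C like most of this page):

#include <stdio.h>

/* Integers are exact in a double up to 2^53; above that, gaps appear. */
int main(void)
{
    double exact  = 9007199254740992.0;      /* 2^53: exactly representable */
    double bumped = 9007199254740993.0;      /* 2^53 + 1: rounds back to 2^53 */
    printf("%.0f\n%.0f\n", exact, bumped);   /* prints the same integer twice */
    return 0;
}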

How expensive is it to convert between int and double?

Here's what I could dig up myself, for x86-64 doing FP math with SSE2 (not legacy x87 where changing the rounding mode for C++'s truncation semantics was expensive):

  1. When I take a look at the generated assembly from clang and gcc, the cast from double to int boils down to one instruction: cvttsd2si.

    From int to double it's cvtsi2sd. (cvtsi2sdl is the AT&T syntax for cvtsi2sd with a 32-bit operand size.)

    With auto-vectorization, we get cvtdq2pd.

    So I suppose the question becomes: what is the cost of those?

  2. These instructions each cost approximately the same as an FP addsd plus a movq xmm, r64 (fp <- integer) or movq r64, xmm (integer <- fp), because they decode to 2 uops which run on the same ports, on mainstream (Sandybridge/Haswell/Skylake) Intel CPUs.

    The Intel® 64 and IA-32 Architectures Optimization Reference Manual says that the cvttsd2si instruction has a latency of 5 cycles (see Appendix C-16). cvtsi2sd, depending on your architecture, has latency varying from 1 on Silvermont to more like 7-16 on several other architectures.

    Agner Fog's instruction tables have more accurate/sensible numbers, like 5-cycle latency for cvtsi2sd on Silvermont (with 1 per 2 clock throughput), or 4c latency on Haswell, with one per clock throughput (if you avoid the dependency on the destination register from merging with the old upper half, like gcc usually does with pxor xmm0,xmm0).

    SIMD packed-float to packed-int is great; it's a single uop. But converting to double requires a shuffle to change the element size. SIMD float/double<->int64_t doesn't exist until AVX512, but can be done manually with limited range.

    Intel's manual defines latency as: "The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction." But a more useful definition is the number of clocks from an input being ready until the output becomes ready. Throughput is more important than latency if there's enough parallelism for out-of-order execution to do its job: What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?.

  3. The same Intel manual says that an integer add instruction has a latency of 1 cycle and an integer imul a latency of 3 (Appendix C-27). FP addsd and mulsd run at 2 per clock throughput, with 4 cycle latency, on Skylake. The same goes for the SIMD versions, and for FMA, with 128 or 256-bit vectors.

    On Haswell, addsd / addpd is only 1 per clock throughput, but 3 cycle latency thanks to a dedicated FP-add unit.

So, the answer boils down to:

1) It's hardware optimized, and the compiler leverages the hardware machinery.

2) It costs only a bit more than a multiply does in terms of the number of cycles in one direction, and a highly variable amount in the other (depending on your architecture). Its cost is neither free nor absurd, but probably warrants more attention given how easy it is to write code that incurs the cost in a non-obvious way.
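As an illustration of that last point, here is a hypothetical loop (my example, not from the answers above) where the conversion cost sneaks in: the int counter is converted to double on every iteration.

/* The mixed int/double arithmetic forces a cvtsi2sd on every iteration. */
double sum_first_n(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += i;        /* i is converted from int to double here, each time */
    return sum;
}

/* Using a double counter avoids the repeated conversion (n, being
   loop-invariant, only needs to be converted once for the comparison). */
double sum_first_n_no_convert(int n)
{
    double sum = 0.0;
    for (double d = 0.0; d < n; d += 1.0)
        sum += d;
    return sum;
}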


