Why Is == Faster Than equals()

Is the inequality operator faster than the equality operator?

Usually not: the microprocessor performs comparisons with parallel logic gates, not bit by bit in sequence. It checks all bits at once.

Why is === faster than == in PHP?

Because the equality operator == coerces (temporarily converts) the data type of one operand to see if it's equal to the other, whereas === (the identity operator) doesn't need to do any converting whatsoever. Less work is done, which makes it faster.

Is < faster than <=?

No, it will not be faster on most architectures. You didn't specify, but on x86, all of the integral comparisons will typically be implemented in two machine instructions:

  • A test or cmp instruction, which sets EFLAGS
  • And a Jcc (jump) instruction, depending on the comparison type (and code layout):
      • jne - Jump if not equal --> ZF = 0
      • jz - Jump if zero (equal) --> ZF = 1
      • jg - Jump if greater --> ZF = 0 and SF = OF
      • (etc...)

Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c:

    if (a < b) {
        // Do something 1
    }

Compiles to:

    mov     eax, DWORD PTR [esp+24]      ; a
    cmp     eax, DWORD PTR [esp+28]      ; b
    jge     .L2                          ; jump if a is >= b
    ; Do something 1
.L2:

And

    if (a <= b) {
        // Do something 2
    }

Compiles to:

    mov     eax, DWORD PTR [esp+24]      ; a
    cmp     eax, DWORD PTR [esp+28]      ; b
    jg      .L5                          ; jump if a is > b
    ; Do something 2
.L5:

So the only difference between the two is a jg versus a jge instruction. The two will take the same amount of time.


I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time. This one is a little tricky to answer, but here's what I can give: in the Intel Instruction Set Reference, they are all grouped together under one common instruction, Jcc (Jump if condition is met). They are grouped the same way in the Optimization Reference Manual, in Appendix C, "Latency and Throughput."

Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.

Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be significantly less than its latency.

The values for Jcc are:

            Latency   Throughput
    Jcc       N/A        0.5

with the following footnote on Jcc:


  1. Selection of conditional jump instructions should be based on the recommendation of Section 3.4.1, "Branch Prediction Optimization," to improve the predictability of branches. When branches are predicted successfully, the latency of jcc is effectively zero.

So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.

If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS to determine whether the conditions are met. There is, then, no reason that an instruction testing two bits should take any more or less time than one testing only one (ignoring gate propagation delay, which is much less than the clock period).


Edit: Floating Point

This holds true for x87 floating point as well (pretty much the same code as above, but with double instead of int):

    fld     QWORD PTR [esp+32]
    fld     QWORD PTR [esp+40]
    fucomip st, st(1)        ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
    fstp    st(0)
    seta    al               ; Set al if above (CF=0 and ZF=0).
    test    al, al
    je      .L2
    ; Do something 1
.L2:

    fld     QWORD PTR [esp+32]
    fld     QWORD PTR [esp+40]
    fucomip st, st(1)        ; (same thing as above)
    fstp    st(0)
    setae   al               ; Set al if above or equal (CF=0).
    test    al, al
    je      .L5
    ; Do something 2
.L5:
    leave
    ret

C - 'Greater than' vs 'Greater than or equal to' performance

As mentioned by others, most modern processors already have instructions that handle the more complex-looking cases you're mentioning.

I'm not sure how far back "modern" goes here, but I would say that if you are worried about performance, this is one place you shouldn't try to optimize for speed; optimize for clarity instead. An optimizing compiler will typically know a faster way of handling operations than you do.

Which operator is faster (> or >=), (< or <=)?

It varies. First, start by examining different instruction sets and how compilers use them. Take the OpenRISC 32, for example, which is clearly MIPS-inspired but does conditionals differently. For the or32 there are compare-and-set-flag instructions: compare these two registers and set the flag if less than or equal unsigned; compare these two registers and set the flag if equal; and so on. Then there are two conditional branch instructions: branch on flag set and branch on flag clear. The compiler has to follow one of these paths, but less than, less than or equal, greater than, etc. are all going to use the same number of instructions: the same execution time for a taken conditional branch, and the same execution time for a not-taken one.

Now, it is definitely going to be true for most architectures that taking the branch takes longer than not taking it, because of having to flush and re-fill the pipe. Some do branch prediction, etc., to help with that problem.

Now, on some architectures the size of the instruction may vary: compare gpr0 and gpr1 versus compare gpr0 and the immediate number 1234 may require a larger instruction. You will see this a lot with x86, for example. So although both cases may be a branch-if-less-than, how you encode the comparison, depending on which registers happen to hold which values, can make a performance difference (sure, x86 does a lot of pipelining, lots of caching, etc. to make up for these issues). Another similar example is MIPS and or32, where r0 is always zero. It is not really a general-purpose register: if you write to it, it doesn't change; it is hardwired to zero. So a compare-if-equal-to-zero MIGHT cost you less than a compare-if-equal-to-some-other-number, if an extra instruction or two is required to fill a gpr with that immediate so that the compare can happen. The worst case is having to evict a register to the stack or memory to free up a register to hold the immediate, so that the compare can happen.

Some architectures, like ARM, have conditional execution: with the full ARM (not Thumb) instruction set you can predicate execution on a per-instruction basis. So if you had this code

    if(i==7) j=5; else j=9;

the ARM pseudocode would be

    cmp   i,#7
    moveq j,#5
    movne j,#9

There is no actual branch, so there are no pipeline issues: you flywheel right on through, very fast.

Comparing one architecture to another: some, as mentioned (MIPS, or32), make you specifically perform some sort of compare instruction; on others, like x86, msp430, and the vast majority, each ALU operation changes the flags; ARM and the like change the flags only if you tell them to, as shown above. So with a

    while(--len)
    {
        //do something
    }

loop, the subtraction of 1 also sets the flags. If the stuff in the loop is simple enough, you can make the whole thing conditional, so you save on separate compare and branch instructions, and you save the pipeline penalty. MIPS solves this a little by making compare-and-branch a single instruction, and it executes one instruction after the branch (the delay slot) to save a little in the pipe.

The general answer is that you will not see a difference: the number of instructions, execution time, etc. are the same for the various conditionals. Special cases like small immediates versus big immediates may have an effect in corner cases, or the compiler may simply choose to do it all differently depending on which comparison you use. If you try to rewrite your algorithm to give the same answer but use a less-than instead of a greater-than-or-equal, you could be changing the code enough to get a different instruction stream. Likewise, if your performance test is too simple, the compiler can and will optimize out the comparison completely and just generate the results, which could vary depending on your test code, causing different execution. The key to all of this is to disassemble the things you want to compare and see how the instructions differ. That will tell you whether you should expect to see any execution differences.

Is == operator faster than equals()?

TL;DR Don't use == to compare strings.

Specifically with regard to strings, yes, == is slightly faster than equals, because the first thing the String.equals method does is...a == comparison to see if the string is being compared to itself. If it is, equals() is slower by the cost of a method call. If it isn't, equals is slower by that cost plus the cost of comparing the characters in the strings.

But remember that before you can use == to compare strings (which is a Bad Idea™), you have to know for sure that both strings are interned. The combination of intern() and == is not faster than equals(). intern() is a relatively expensive operation, as it involves looking for an equivalent string already in the intern pool, which may involve lots and lots of equals() calls (or their equivalent).

There may be some extremely rare edge cases where it's reasonable to incur that intern() cost and then use == on strings you know are interned. For instance, if you have a large static set of strings that you compare to one another really frequently. But that's an extremely unusual edge situation.

Bottom line: Don't compare strings with ==. Don't intern strings unnecessarily.


