How Does Switch Compile in Visual C++ and How Optimized and Fast Is It

Optimize switch(x), x is a constant number passed as a parameter

In your demo, the call f(0); in main is optimized out, as you can see from the assembly for main:

main:
mov r0, #0
bx lr

The code for f(int) already looks pretty much optimal to me; I think it would be less efficient to call a function instead of just issuing one assembly instruction.
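For reference, here is a hypothetical sketch (not the original demo) of the kind of function under discussion: a switch over a small integer parameter. When the argument is a literal such as f(0), constant propagation lets the compiler fold the call away completely, which is why main above reduces to returning 0.

// Hypothetical example only; the poster's actual f(int) is not shown here.
int f(int x)
{
    switch (x)
    {
        case 0:  return 0;
        case 1:  return 10;
        case 2:  return 20;
        default: return -1;
    }
}

int main()
{
    return f(0);   // folded at compile time: main simply returns 0
}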

Is 'switch' faster than 'if'?

There are several optimizations a compiler can make on a switch. I don't think the oft-mentioned "jump table" is a very useful one though, as it only works when the input can be bounded in some way.

C pseudocode for a "jump table" would be something like the sketch below. Note that in practice the compiler would need to insert some form of if test around the table to ensure that the input is a valid index into it. Note also that the technique only works in the specific case where the input is a run of consecutive numbers.
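A minimal sketch (valid in both C and C++); the handler functions and the dispatch wrapper are hypothetical names used only for illustration, and the table of code addresses the compiler builds is modeled here with function pointers.

#include <stdio.h>

typedef void (*handler)(void);

/* Hypothetical per-case handlers. */
static void handle0(void) { puts("case 0"); }
static void handle1(void) { puts("case 1"); }
static void handle2(void) { puts("case 2"); }
static void handle3(void) { puts("case 3"); }

/* One table entry per consecutive case value; the switch operand indexes it. */
static const handler jump_table[] = { handle0, handle1, handle2, handle3 };

void dispatch(unsigned x)
{
    if (x < 4)            /* the guard the compiler must insert */
        jump_table[x]();  /* one indirect jump, no per-case comparisons */
}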

If the number of branches in a switch is extremely large, a compiler can do things like using a binary search on the values of the switch, which (in my mind) would be a much more useful optimization: it significantly increases performance in some scenarios, it is as general as a switch is, and it does not result in larger generated code. But your test code would need a LOT more branches before you would see any difference.
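To make that concrete, here is an illustrative, hand-written sketch (not actual compiler output) of roughly how a large, sparse switch can be lowered into a binary search over the sorted case values; the case values and handler functions are made up:

#include <stdio.h>

// Hypothetical handlers for four widely spaced case values.
static void handle_10(void)  { puts("10");  }
static void handle_42(void)  { puts("42");  }
static void handle_97(void)  { puts("97");  }
static void handle_500(void) { puts("500"); }

void dispatch_sparse(int x)
{
    // Probe the median case value first, then only the half that can still
    // match -- O(log n) comparisons instead of a linear if/else chain.
    if (x < 97) {
        if (x == 10)       handle_10();
        else if (x == 42)  handle_42();
    } else {
        if (x == 97)       handle_97();
        else if (x == 500) handle_500();
    }
}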

To answer your specific questions:

  1. Clang generates a jump table that looks like this:

    test_switch(char):                       # @test_switch(char)
    movl %edi, %eax
    cmpl $19, %edi
    jbe .LBB0_1
    retq
    .LBB0_1:
    jmpq *.LJTI0_0(,%rax,8)
    jmp void call<0u>() # TAILCALL
    jmp void call<1u>() # TAILCALL
    jmp void call<2u>() # TAILCALL
    jmp void call<3u>() # TAILCALL
    jmp void call<4u>() # TAILCALL
    jmp void call<5u>() # TAILCALL
    jmp void call<6u>() # TAILCALL
    jmp void call<7u>() # TAILCALL
    jmp void call<8u>() # TAILCALL
    jmp void call<9u>() # TAILCALL
    jmp void call<10u>() # TAILCALL
    jmp void call<11u>() # TAILCALL
    jmp void call<12u>() # TAILCALL
    jmp void call<13u>() # TAILCALL
    jmp void call<14u>() # TAILCALL
    jmp void call<15u>() # TAILCALL
    jmp void call<16u>() # TAILCALL
    jmp void call<17u>() # TAILCALL
    jmp void call<18u>() # TAILCALL
    jmp void call<19u>() # TAILCALL
    .LJTI0_0:
    .quad .LBB0_2
    .quad .LBB0_3
    .quad .LBB0_4
    .quad .LBB0_5
    .quad .LBB0_6
    .quad .LBB0_7
    .quad .LBB0_8
    .quad .LBB0_9
    .quad .LBB0_10
    .quad .LBB0_11
    .quad .LBB0_12
    .quad .LBB0_13
    .quad .LBB0_14
    .quad .LBB0_15
    .quad .LBB0_16
    .quad .LBB0_17
    .quad .LBB0_18
    .quad .LBB0_19
    .quad .LBB0_20
    .quad .LBB0_21
  2. I can say that it is not using a jump table -- 4 comparison instructions are clearly visible:

    13FE81C51 cmp  qword ptr [rsp+30h],1 
    13FE81C57 je testSwitch+73h (13FE81C73h)
    13FE81C59 cmp qword ptr [rsp+30h],2
    13FE81C5F je testSwitch+87h (13FE81C87h)
    13FE81C61 cmp qword ptr [rsp+30h],3
    13FE81C67 je testSwitch+9Bh (13FE81C9Bh)
    13FE81C69 cmp qword ptr [rsp+30h],4
    13FE81C6F je testSwitch+0AFh (13FE81CAFh)

    A jump table based solution does not compare against each case value at all; at most there is a single bounds check, like the cmpl/jbe pair in the Clang output above.

  3. Either there are not enough branches to make the compiler generate a jump table, or your compiler simply doesn't generate them; I'm not sure which.

EDIT 2014: There has been some discussion elsewhere from people familiar with the LLVM optimizer saying that the jump table optimization can be important in many scenarios, e.g. when there is an enumeration with many values and a switch with many cases over those values. That said, I stand by what I said above in 2011 -- too often I see people thinking "if I make it a switch, it'll take the same time no matter how many cases I have" -- and that's completely false. Even with a jump table you still pay for the indirect jump and for a table entry per case, and memory bandwidth is a Big Deal on modern hardware.

Write code for readability. Any compiler worth its salt is going to see an if / else if ladder and transform it into an equivalent switch, or vice versa, if it would be faster to do so.

Advantage of switch over if-else statement

Use switch.

In the worst case the compiler will generate the same code as an if-else chain, so you don't lose anything. If in doubt, put the most common cases first in the switch statement.

In the best case the optimizer may find a better way to generate the code. Common things a compiler does are building a binary decision tree (which saves comparisons and jumps in the average case) or building a jump table (which works without comparisons at all).

How would you make this switch statement as fast as possible?

Assuming that every input will be a valid ActivCode, that you are free to change the enumeration values, and that you accept being tightly coupled to the GetHashCode implementation:

enum MarketDataExchange
{
    NONE,
    NBBO = 371857150,
    AMEX = 372029405,
    BSE = 372029408,
    BATS = -1850320644,
    NSE = 372029407,
    CHX = -284236702,
    NYSE = 372029412,
    ARCA = -734575383,
    NASDAQ = 372029421,
    NASDAQ_ADF = -1137859911,
    CBOE = 372029419,
    PHLX = 372029430,
    DIRECTEDGE = 372029429
}

public static MarketDataExchange GetMarketDataExchange(string ActivCode)
{
    if (ActivCode == null) return MarketDataExchange.NONE;

    return (MarketDataExchange)ActivCode.GetHashCode();
}

How does the compiler (Visual C++) optimize accessing vector elements by index?

You've been told in the comments that comparing the performance of debug builds is not really useful. That is correct, and the issue here is not that release builds do something magical to access your vectors, but that debug builds are slowed down by a lot of verification and instrumentation code that does not get into the release builds.

Consider:
Depending on the standard library implementation, T operator[](size_t index) const can contain range-checking code in debug builds that you simply step around when you use pointers directly (IIRC, that's how things work in the Visual Studio standard library implementation).

Naturally, your debug build using pointer accesses will be faster, but you have already found many of the pitfalls.
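As an illustration (the function names are made up), here is a sketch of the two access styles being compared. In a release build an optimizer will typically turn both loops into the same pointer arithmetic; in a debug build, v[i] may additionally execute the library's checking code, which is where the measured difference comes from.

#include <cstddef>
#include <vector>

long long sum_indexed(const std::vector<int>& v)
{
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];              // operator[]: may be checked in debug builds
    return s;
}

long long sum_pointer(const std::vector<int>& v)
{
    long long s = 0;
    const int* p = v.data();
    for (std::size_t i = 0, n = v.size(); i < n; ++i)
        s += *(p + i);          // raw pointer: no library-level checks
    return s;
}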

Summa summarum: using indices in your case is perfectly fine. They won't hurt performance. In scenarios like these it's typically not the indexing that hurts, but how coherent the memory accesses are, whether you are thrashing your CPU cache, and so on.

What's the compiler thinking about the switch statement?

None of them should compile. The C# specification requires that a switch section have at least one statement. The parser should disallow it.

Let's ignore the fact that the parser allows an empty statement list; that's not what's relevant. The specification says that the end of the switch section must not have a reachable end point; that's the relevant bit.

In your last example, the switch section has a reachable end point:

void M(int x) { switch(2) { case 2: ; } }

so it must be an error.

If you had:

void M(int x) { switch(x) { case 2: ; } }

then the compiler does not know if x will ever be 2. It assumes conservatively that it could, and says that the section has a reachable end point, because the switch case label is reachable.

If you had:

void M(int x) { switch(1) { case 2: ; } }

Then the compiler can reason that the endpoint is not reachable because the case label is not reachable. The compiler knows that the constant 1 is never equal to the constant 2.

If you had:

void M(int x) { switch(x = 1) { case 2: ; } }

or

void M(int x) { x = 1; switch(x) { case 2: ; } }

Then you know and I know that the end point is not reachable, but the compiler does not know that. The rule in the specification is that reachability is only determined by analyzing constant expressions. Any expression which contains a variable, even if you know its value by some other means, is not a constant expression.

In the past the C# compiler had bugs where this was not the case. You could say things like:

void M(int x) { switch(x * 0) { case 2: ; } }

and the compiler would reason that x * 0 had to be 0, therefore the case label is not reachable. That was a bug, which I fixed in C# 3.0. The specification says that only constants are used for that analysis, and x is a variable, not a constant.

Now, if the program is legal then the compiler can use advanced techniques like this to influence what code is generated. If you say something like:

void M(int x) { if (x * 0 == 0) Y(); }

Then the compiler can generate the code as though you'd written

void M(int x) { Y(); }

if it wants. But it cannot use the fact that x * 0 == 0 is true for the purposes of determining statement reachability.

Finally, if you have

void M(int x) { if (false) switch(x) { case 2: ; } }

then we know that the switch is not reachable, therefore the block does not have a reachable end point, so this is, surprisingly, legal. But given the discussion above, you now know that

void M(int x) { if (x * 0 != 0) switch(x) { case 2: ; } }

does not treat x * 0 != 0 as false, so the end point is considered reachable.

What does Optimize Code option really do in Visual Studio?

Without optimizations the compiler produces very dumb code: each statement is compiled in a very straightforward manner, so that it does the intended thing. Debug builds have optimizations disabled by default, because without optimizations the produced executable closely mirrors the source code, which makes it much easier to debug.

Variables kept in registers

Once you turn on optimizations, the compiler applies many different techniques to make the code run faster while still doing the same thing. The most obvious difference between optimized and unoptimized builds in Visual C++ is that variable values are kept in registers as long as possible in optimized builds, while without optimizations they are always stored to memory. This affects not only code speed, but also debugging: as a result of this optimization, the debugger cannot reliably obtain the value of a variable as you step through the code.
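As a rough illustration (the exact behavior depends on the compiler and settings), consider a small function like the one below. In a typical unoptimized build, i and total are loaded from and stored back to stack memory on every iteration; with /O2 they normally live in registers for the whole loop, which is also why the debugger may be unable to show their current values while you step through optimized code.

// Small example to show the effect described above; nothing here is
// specific to any one compiler.
int sum_to(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += i;        // debug: memory traffic each iteration
                           // /O2:   'i' and 'total' stay in registers
    return total;
}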

Other optimizations

There are multiple other optimizations applied by the compiler, as described in the /O Options (Optimize Code) MSDN documentation. For a general description of various optimization techniques, see the Wikipedia article on compiler optimization.


