GCC Optimization Flag -O3 Makes Code Slower Than -O2

With the -Ofast flag on gcc, does breaking down a math expression affect speed?

Intuitively, I'd expect these to compile to the same code. But let's see what actually happens! Using godbolt with your first version (the one-liner), we get this code:

    mov     eax, DWORD PTR [rsp+20]
    mov     esi, DWORD PTR [rsp+28]
    imul    esi, DWORD PTR [rsp+32]
    imul    eax, DWORD PTR [rsp+24]
    lea     eax, [rax+rsi]
    mov     esi, DWORD PTR [rsp+36]
    imul    esi, DWORD PTR [rsp+40]
    add     esi, eax
    add     esi, DWORD PTR [rsp+44]
    mov     DWORD PTR [rsp+44], esi

With the second version, we get this:

    mov     esi, DWORD PTR [rsp+28]
    imul    esi, DWORD PTR [rsp+32]
    mov     eax, DWORD PTR [rsp+20]
    imul    eax, DWORD PTR [rsp+24]
    add     eax, DWORD PTR [rsp+44]
    lea     eax, [rax+rsi]
    mov     esi, DWORD PTR [rsp+36]
    imul    esi, DWORD PTR [rsp+40]
    add     esi, eax
    mov     DWORD PTR [rsp+44], esi

These are, I believe, the same instructions in a slightly different order. I suspect the performance would be almost identical in the two cases, though the different ordering might make a slight difference to how well the instructions pipeline.

I suspect that your first version is perfectly fine here.
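The question's code isn't reproduced here, so as a rough sketch of the kind of thing being compared: judging by the three multiplies and the accumulate in the assembly above, the two versions might look something like this. The function names and signatures are made up for illustration.

```cpp
// Hypothetical reconstruction: the code from the question is not shown above,
// so these names and signatures are assumptions.

// Version 1: the whole expression on one line.
int accumulate_one_liner(int acc, int a, int b, int c, int d, int e, int f) {
    acc += a * b + c * d + e * f;
    return acc;
}

// Version 2: the same expression broken into separate statements.
int accumulate_split(int acc, int a, int b, int c, int d, int e, int f) {
    acc += a * b;
    acc += c * d;
    acc += e * f;
    return acc;
}
```

Both functions return the same value for the same inputs, which is why an optimizer is free to emit essentially the same instructions for either one.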

When can optimizations done by the compiler destroy my C++ code?

Compiler optimizations should not affect the observable behavior of your program, so in theory, you don't need to worry. In practice, if your program strays into undefined behavior, anything can happen, with or without optimization. So if your program breaks when you enable optimizations, you've merely exposed an existing bug; it wasn't the optimization that broke it.
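A classic illustration of this, as a minimal sketch with made-up function names: an overflow check that itself relies on signed overflow. Signed overflow is undefined behavior, so an optimizer is allowed to assume it never happens, and the check may be folded away at higher optimization levels.

```cpp
#include <limits>

// Relies on undefined behavior: if x is INT_MAX, x + 1 overflows.
// An optimizer may assume that never happens and reduce this to `return false`.
bool next_overflows_ub(int x) {
    return x + 1 < x;
}

// Well-defined alternative: compare against the limit instead of overflowing.
bool next_overflows(int x) {
    return x == std::numeric_limits<int>::max();
}
```

The first version may appear to work in a debug build and stop working at -O2; the second behaves the same at every optimization level.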

One common case is return value optimization (RVO) and named return value optimization (NRVO), which basically means objects returned by value from functions get constructed directly in the object that receives them, rather than being copied. This changes the order and number of constructor, copy constructor, and destructor calls, but as long as those functions are written correctly, there's usually still no observable difference in behavior.
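To see the effect for yourself, you could count constructor calls. This is only a sketch: the `Tracked` type and the counters are invented for illustration, and the exact counts you observe depend on the language standard and compiler flags in use.

```cpp
// Counters for illustration only; every name here is hypothetical.
int g_copies = 0;
int g_moves = 0;

struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copies; }      // counts copy constructions
    Tracked(Tracked&&) noexcept { ++g_moves; }   // counts move constructions
};

// Returns a temporary: a candidate for RVO (elision is mandatory since C++17).
Tracked make_temporary() {
    return Tracked{};
}

// Returns a named local: a candidate for NRVO, which a compiler may apply
// but is not required to. Without NRVO the move constructor is used.
Tracked make_named() {
    Tracked t;
    return t;
}
```

If you call both functions and then inspect `g_moves`, the number can differ between builds. With GCC, for example, compiling with `-fno-elide-constructors` should raise it. A program whose correctness depends on that count is the kind of code that "breaks" under optimization.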

GCC compiler optimization options

According to the documentation,

Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html (emphasis mine)


When it goes on to list the specific flags you can set, it notes:

You can use the following flags in the rare cases when “fine-tuning” of optimizations to be performed is desired.

Which makes it pretty clear that they don't expect you to make an entirely custom set of optimisation options, but rather choose the general level which best fits your scenario (which will vary depending on whether you are currently debugging, shipping, etc.), then fine-tune any flags which your code benefits from having on or off.
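If you want to check from inside the code whether an -O level is actually in effect, GCC's predefined macros can tell you. A small sketch (the function name is my own): `__OPTIMIZE__` is defined in all optimizing compilations, and `__OPTIMIZE_SIZE__` is defined when optimizing for size.

```cpp
// Reports the compilation mode using GCC's predefined macros.
// The function name is hypothetical; the macros are GCC's own.
const char* optimization_mode() {
#if defined(__OPTIMIZE_SIZE__)
    return "optimizing for size";
#elif defined(__OPTIMIZE__)
    return "optimizing";
#else
    return "not optimizing";
#endif
}
```

To see which individual optimizations a given level turns on, the GCC documentation suggests running the compiler with `-Q --help=optimizers` together with the level you're interested in.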

swiftc compile time is more slow when using -O than not using

The times you are showing are compilation times, not execution times.
Optimizations take time and the compiler has to work harder to complete them, so it's completely normal that compilation takes longer when optimizing the code.

This is in general the intended behaviour. One small disadvantage is that the resulting executable can be larger, but that's generally not an issue.

Will an operation done several times in sequence be simplified by compiler?

Which optimizations are done depends on the compiler, the compiler optimization flag(s) you specify, and the architecture.

Here are a few possible optimizations for your example:

  • Loop unrolling: this makes the binary larger and thus is a trade-off; for example, you may not want this on a tiny microprocessor with very little memory.
  • Common subexpression elimination (CSE): you can be pretty sure that your (i % 3) * 10 will only be executed once per loop iteration.
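To make the CSE point concrete, here is a sketch. The loop from the question isn't shown here, so the loop bodies below are invented; only the (i % 3) * 10 expression comes from the question. The first function repeats the subexpression for readability, and the second hoists it by hand, which is roughly what the optimizer is expected to do for you.

```cpp
// Written for clarity: the subexpression appears twice per iteration.
// (Hypothetical loop body; only (i % 3) * 10 comes from the question.)
int sum_repeated(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += (i % 3) * 10;
        total += (i % 3) * 10 + 1;
    }
    return total;
}

// Hand-hoisted: the shared value is computed once per iteration.
int sum_hoisted(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        const int offset = (i % 3) * 10;
        total += offset;
        total += offset + 1;
    }
    return total;
}
```

Both return the same result, so there's usually no reason to write the second form unless you find it clearer.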

About your concern about visual clarity vs. optimization: When dealing with a 'local situation' like yours, you should focus on code clarity.

Optimization gains are often to be made at a higher level; for example in the algorithm you use.

There's a lot to be said about optimization; the above are just a few opening remarks. It's great that you're interested in how things work, because this is important for a good (C/C++) programmer.

greenhills compiler turn off optimization for file or part of

From the manual:

#pragma ghs Ostring
Turns on optimizations. The optional string may contain any or all of the following letters:
L — Loop optimizations
M — Memory optimizations
S — Small (but Slow) optimizations

#pragma ghs ZO
Disables all optimizations, starting from the next function.
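Going only by the excerpt above, turning optimization off for part of a file might look like the sketch below. I haven't tested this on a Green Hills toolchain, and the function names are made up. Other compilers will typically ignore these pragmas, possibly with a warning.

```cpp
// Based only on the manual excerpt quoted above; untested on a Green Hills
// compiler. Function names are hypothetical.

#pragma ghs ZO  // per the excerpt: no optimizations, starting from the next function
int checksum_unoptimized(const unsigned char* data, int length) {
    int sum = 0;
    for (int i = 0; i < length; ++i) {
        sum += data[i];
    }
    return sum;
}

#pragma ghs O   // per the excerpt: turns optimizations on (optional letters omitted)
int checksum_optimized(const unsigned char* data, int length) {
    int sum = 0;
    for (int i = 0; i < length; ++i) {
        sum += data[i];
    }
    return sum;
}
```

To cover a whole file, the same excerpt suggests placing `#pragma ghs ZO` before the first function.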

