How Much Overhead Is There in Calling a Function in C++

Why is there overhead when calling functions?

It depends on your compiler settings and the way it optimizes code: some functions are inlined and others are not, often depending on whether you're optimizing for size or for speed.

Generally, calling a function causes a delay for two reasons:

  • The program needs to jump to some other location in memory where your function's code starts. To do this, it has to save the current instruction pointer (the return address) on the stack so it knows where to return. This takes more than one CPU cycle.

  • Depending on your CPU architecture, there may be a pipeline, which fetches the next few instructions from memory in parallel with the execution of the current instruction, in order to speed things up. When you call a function, the instruction pointer jumps to a completely different address and the prefetched instructions are flushed from the pipeline. This causes further delays.
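If you want to get a feel for this cost, a rough (and very machine-dependent) way is to time a loop that calls a deliberately non-inlined function against one that does the same work directly. The function name below is my own and the noinline attribute is GCC/Clang-specific syntax; treat this as a sketch, not a rigorous benchmark.

#include <chrono>
#include <cstdio>

// noinline forces a real call/return even in optimized builds (GCC/Clang syntax).
__attribute__((noinline)) double addCall(double a, double b) { return a + b; }

int main() {
    const long N = 100000000L;
    double sum = 0.0;

    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) sum = addCall(sum, 1.0);   // real call each iteration
    auto t1 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) sum = sum + 1.0;           // same work, no call
    auto t2 = std::chrono::steady_clock::now();

    auto ns = [](std::chrono::steady_clock::duration d) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
    };
    std::printf("with call: %lld ns, without: %lld ns (sum=%f)\n",
                (long long)ns(t1 - t0), (long long)ns(t2 - t1), sum);
    return 0;
}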

How to measure the call overhead of functions?

Looking at the disassembly of your code compiled with -O3 -march=native -std=c++11 showed that the compiler was doing "too much" optimization: it detected the repeated re-assignment to the same unused variable. As suggested in the comments, I used += instead of =. I also initialized result = 0 and had main return result instead of 0, to make sure the compiler actually computes its value. This modified code gives (a minimal sketch of the change follows the list):

  • noFunction, staticPolicy and memberPolicy are lowered to mulsd, addsd, addsd, i.e. scalar SSE instructions. Clang also doesn't vectorize (with vanilla options), but Intel's icc does (it generates vector and non-vector versions and branches depending on alignment and iteration count).
  • virtualMemberFunction and virtualMemberFunctionRaw result in a dynamic function call (no de-virtualization or inlining).
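For reference, here is a minimal sketch of that transformation. The function body and the arithmetic are my own placeholders, not the original question's code; the point is only the += accumulation into result and returning a value derived from it so the compiler cannot discard the loop.

double result = 0;

// Hypothetical stand-in for the question's noFunction/staticPolicy variants.
double noFunction(double x) { return x * 1.5 + 2.0; }

int main() {
    for (int i = 0; i < 1000000; ++i)
        result += noFunction(i * 1e-3);     // += instead of =, so the work is not dead code
    return static_cast<int>(result) % 256;  // result stays live because main returns it
}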

You can see for yourself by pasting your code into an online compiler explorer (e.g. Godbolt) and looking at the generated assembly.

To answer your Q1, "pointer vs unique_ptr in debug build": at -O0 calls are not inlined automatically; in particular, unique_ptr::operator-> is called explicitly with no inlining, so that's 2 function calls per iteration instead of 1 for regular pointers. This difference disappears in optimized builds.
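As a rough sketch of that difference (hypothetical type and function names, not the question's code):

#include <memory>

struct Widget { double f() const { return 1.0; } };

// At -O0 this is one call per use: Widget::f.
double viaRawPointer(Widget* p) { return p->f(); }

// At -O0 this is two calls per use: unique_ptr::operator-> and then Widget::f.
double viaUniquePtr(const std::unique_ptr<Widget>& p) { return p->f(); }

With -O1 or higher, both functions typically compile to the same code.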

To answer your Q2, "is it possible to inline virtual calls": in this example, gcc and clang don't inline the call, probably because they don't do enough static analysis. But you can help them. For instance, with clang 3.3 (but not 3.2, and not gcc), declaring the method as const and __attribute__((pure)) does the job. In gcc (4.8, pre-4.9) I tried marking the method as final and compiling with -fwhole-program, but this didn't remove the call. So yes, in this specific case it is possible to de-virtualize, but not reliably. In general, JIT compilers (C#, Java) are better at de-virtualizing because they can make better assumptions from runtime information.
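A sketch of the kind of annotations mentioned above, with hypothetical class and method names (whether they actually enable de-virtualization depends on the compiler and version, as noted):

struct Transform {
    virtual ~Transform() {}
    // const + pure: no side effects, result depends only on the argument and *this.
    __attribute__((pure)) virtual double apply(double x) const = 0;
};

struct Scale final : Transform {   // 'final' can also help the static analysis
    double apply(double x) const override { return x * 1.5; }
};

double sum(const Transform& t) {
    double s = 0;
    for (int i = 0; i < 1000; ++i)
        s += t.apply(i);   // candidate for de-virtualization if the compiler can prove the dynamic type
    return s;
}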

C++ call member method vs normal function overhead

Don't do premature optimization. Correctness is much more important than performance. The fastest code is worth nothing if it does not compile or produces wrong output.

Write code to be readable. Readable code is easier to test, debug and refactor. Once you have working, correct code, you can measure to see where the bottlenecks are.

Anyhow, compilers are smart enough to recognize the pattern in loops like this one:

int sum = 0;
for (int i = 0; i < 100; ++i) sum += i;
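For that loop, a decent optimizer typically emits the constant result directly, roughly as if you had written:

int sum = 99 * 100 / 2;   // = 4950, the sum of 0..99, computed at compile time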

Knowing this, I changed your calculations to this (without this transformation I didn't get the desired output):

int main() {
    // finalSum: in the original code it's just a class member
    int finalSum = {0};

    for (int i = 0; i < 100; i += 2)
        finalSum += 2;
    for (int i = 1; i < 100; i += 2)
        finalSum += 1;
    return finalSum;
}

Turning on optimizations (-O3), I get as a result:

main:
mov eax, 150
ret

There is no need to write a loop and no need to call a function. Even though I wrote two loops in the code, the compiler was smart enough to see that the final result can be determined at compile time. This is the kind of optimization that makes C++ code run fast. Human-driven optimizations must always be based on profiling and measurements, and it is really hard to beat a good compiler. Though in this case you could have noticed that the final result is just 150 and that there is no need for the loop.

Conclusion: Write code to be simple and readable. The code you posted effectively assigns 150 to finalSum and the most simple way to express this is finalSum = 150;.

You may argue that I missed the point of the question. But my point is that details matter. Different code will have different opportunities to be made simpler and more expressive. It is difficult, if not impossible, to make a general statement about whether a function call introduces too much overhead. Anyhow, the compiler is much better at making this decision: it inlines the call when it sees that it is worth it.

What is the cost of a function call?

relative timings (shouldn't be off by more than a factor of 100 ;-)

  • memory-access in cache = 1
  • function call/return in cache = 2
  • memory-access out of cache = 10 .. 300
  • disk access = 1000 .. 1e8 (amortized; depends upon the number of bytes transferred)

    • depending mostly upon seek times
    • the transfer itself can be pretty fast
    • involves at least a few thousand ops, since the user/kernel boundary must be crossed at least twice; an I/O request must be scheduled, the result must be written back; possibly buffers are allocated...
  • network calls = 1000 .. 1e9 (amortized; depends upon the number of bytes transferred)

    • same argument as with disk i/o
    • the raw transfer speed can be quite high, but some process on the other computer must do the actual work

Do function calls have a perceptible overhead in C++?

You really have to measure whether it causes performance problems before worrying about it.

If there is a problem, try to solve it with templates. Write the two variants of the operation, then pass them as functors to a function template that does the iteration. Both versions get instantiated, and you call the appropriate one. The compiler should inline the calls (but verify this).
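A sketch of that approach, with illustrative names: the loop is written once as a function template, and each variant of the per-element operation is passed as a functor, so the compiler can instantiate and usually inline both versions.

#include <cstddef>

// Hypothetical per-element operations.
struct Smooth  { float operator()(float v) const { return v * 0.5f; } };
struct Sharpen { float operator()(float v) const { return v * 2.0f - 1.0f; } };

// The iteration is written once; Op is known at compile time,
// so the call below is normally inlined in each instantiation.
template <class Op>
void apply(float* data, std::size_t n, Op op) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] = op(data[i]);
}

// Usage: both instantiations exist, and the appropriate one is called.
void process(float* img, std::size_t n, bool smooth) {
    if (smooth) apply(img, n, Smooth());
    else        apply(img, n, Sharpen());
}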

I used this on medical image manipulation and it worked like a charm.


