Profiling the C++ Compilation Process

GCC has debugging options that report how much time is spent in each phase of C++ compilation:

-Q
Makes the compiler print out each function name as it is compiled, and print some statistics about each pass when it finishes.

-ftime-report
Makes the compiler print some statistics about the time consumed by each pass when it finishes.

Passes are described in GCCINT 9: Passes and Files of the Compiler.

You can post the output of a g++ compilation of a single source file with -v -ftime-report here to discuss it. The GCC mailing list may also be able to help.

For compilers other than GCC (or GCC older than 3.3.6), see the other options in this thread.

Profiling template metaprogram compilation time

I've been working since 2008 on a library that uses template metaprogramming heavily. There is a real need for better tools or approaches for understanding what consumes the most compile time.

The only technique I know of is a divide-and-conquer approach: separate the code into different files, comment out the bodies of template definitions, or wrap your template instantiations in #define macros and temporarily redefine those macros to do nothing. Then you can recompile the project with and without various instantiations and narrow down which ones are expensive.
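The macro variant of this approach could look something like the sketch below. Accumulator, INSTANTIATE_ACCUMULATOR and DISABLE_HEAVY_INSTANTIATIONS are made-up names for illustration only, and the tiny sum() body merely stands in for whatever expensive template code you are trying to isolate.

```cpp
// Sketch: wrap explicit template instantiations in a macro so that each one
// can be switched off while bisecting compile time.
// All names in this file are hypothetical.
#include <vector>

template <typename T>
struct Accumulator {
    // Sums the elements; a stand-in for an expensive template body.
    static T sum(const std::vector<T>& values) {
        T total{};
        for (const T& v : values) total += v;
        return total;
    }
};

// Build with -DDISABLE_HEAVY_INSTANTIATIONS to make the macro expand to nothing.
#ifdef DISABLE_HEAVY_INSTANTIATIONS
#define INSTANTIATE_ACCUMULATOR(T)
#else
#define INSTANTIATE_ACCUMULATOR(T) template struct Accumulator<T>;
#endif

// Each line can be toggled independently by giving it its own macro.
INSTANTIATE_ACCUMULATOR(int)
INSTANTIATE_ACCUMULATOR(double)
```

Compiling the file once as is and once with -DDISABLE_HEAVY_INSTANTIATIONS gives two timings to compare. Keep in mind that any remaining code that uses Accumulator<int> still instantiates it implicitly, so those uses have to be disabled as well for the difference to show up.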

Incidentally, just separating the same code into more, smaller files may make it compile faster. I'm not just talking about the opportunity for parallel compilation: even compiled serially, it was still faster. I've observed this effect in gcc both when compiling my library and when compiling Boost Spirit parsers. My theory is that some of the symbol resolution, overload resolution, SFINAE, or type inference code in gcc has O(n log n) or even O(n^2) complexity with respect to the number of type definitions or symbols in play in the translation unit.

Ultimately what you need to do is carefully examine your templates and separate what really depends on the type information from what does not, and use type erasure and virtual functions wherever possible on the portion of the code that does not actually require the template types. You need to get stuff out of the headers and into cpp files if that part of the code can be moved. In a perfect world the compiler should be able to figure this out for itself - you shouldn't have to manually move this code to babysit it - but this is the state of the art with the compilers we have today.
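One possible shape for that separation is a non-template core plus a thin template wrapper, roughly as sketched here. RawBuffer and PodList are hypothetical names, and this variant erases the type down to raw bytes rather than using virtual functions; it is only meant to show where the dividing line can sit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Type-independent part (hypothetical). In a real project only the class
// declaration would stay in the header and the member definitions would move
// into a .cpp file, so they are compiled once rather than per instantiation.
class RawBuffer {
public:
    explicit RawBuffer(std::size_t element_size) : element_size_(element_size) {}

    // Appends element_size_ bytes copied from src.
    void push_bytes(const void* src) {
        const unsigned char* p = static_cast<const unsigned char*>(src);
        bytes_.insert(bytes_.end(), p, p + element_size_);
    }

    // Returns a pointer to the first byte of the element at index.
    const void* at(std::size_t index) const {
        assert(index < size());
        return bytes_.data() + index * element_size_;
    }

    std::size_t size() const { return bytes_.size() / element_size_; }

private:
    std::size_t element_size_;
    std::vector<unsigned char> bytes_;
};

// Type-dependent part (hypothetical): a thin wrapper that only converts
// between T and raw bytes, so each instantiation stays cheap to compile.
template <typename T>
class PodList {
    static_assert(std::is_trivial<T>::value, "this sketch only handles trivial types");

public:
    PodList() : raw_(sizeof(T)) {}

    void push_back(const T& value) { raw_.push_bytes(&value); }

    T get(std::size_t index) const {
        T out;
        std::memcpy(&out, raw_.at(index), sizeof(T));
        return out;
    }

    std::size_t size() const { return raw_.size(); }

private:
    RawBuffer raw_;
};
```

All the bookkeeping lives in RawBuffer, so adding PodList<float> or PodList<long> later only instantiates a few one-line forwarding functions.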

How do I find out where the compiler spends its time?

TL;DR: Use Microsoft's vcperf to analyse your build.

Note: This question is from 2013 and the tooling from 2019, so, yep, there was some wait here. :-)

Microsoft has released vcperf, which is part of their C++ Build Insights, a toolchain that lets you profile the compilation process.

https://devblogs.microsoft.com/cppblog/introducing-c-build-insights/

C++ Build Insights makes use of vcperf, a tool that allows you to capture a trace of your build and to view it in the Windows Performance Analyzer (WPA).

Try these command-line options with g++:

-v -ftime-report

That should give you more information on the compiling process. The culprit is usually templates though.
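If you want something small to try those options on before pointing them at a real project, a deliberately template-driven translation unit like the sketch below may do. Fib, DEPTH and fib_at_depth are hypothetical names chosen for this example. Each Fib<N> is instantiated only once, so the number of instantiations grows linearly with DEPTH.

```cpp
// Sketch of a template-driven test file for experimenting with
//   g++ -v -ftime-report -c thisfile.cpp
// All names are hypothetical. The 64-bit result wraps around for DEPTH
// above 93, and depths beyond the compiler's template recursion limit
// need -ftemplate-depth=N.
#include <cstdint>

#ifndef DEPTH
#define DEPTH 64
#endif

// Compile-time Fibonacci: every distinct N forces one class instantiation.
template <unsigned N>
struct Fib {
    static const std::uint64_t value = Fib<N - 1>::value + Fib<N - 2>::value;
};

template <>
struct Fib<0> {
    static const std::uint64_t value = 0;
};

template <>
struct Fib<1> {
    static const std::uint64_t value = 1;
};

// Using the value forces the whole chain Fib<DEPTH> ... Fib<0> to be instantiated.
std::uint64_t fib_at_depth() { return Fib<DEPTH>::value; }
```

Rebuilding with different -DDEPTH=... values and comparing the reports shows which lines of the report react to template work. The exact layout of the report differs between GCC versions.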

How does gcc's -pg flag work?

Compiling with -pg instruments your code so that gprof can report detailed information. See gprof's manual, 9.1 Implementation of Profiling:

Profiling works by changing how every function in your program is compiled so that when it is called, it will stash away some information about where it was called from. From this, the profiler can figure out what function called it, and can count how many times it was called. This change is made by the compiler when your program is compiled with the -pg option, which causes every function to call mcount (or _mcount, or __mcount, depending on the OS and compiler) as one of its first operations.
The mcount routine, included in the profiling library, is responsible for recording in an in-memory call graph table both its parent routine (the child) and its parent's parent. This is typically done by examining the stack frame to find both the address of the child, and the return address in the original parent. Since this is a very machine-dependent operation, mcount itself is typically a short assembly-language stub routine that extracts the required information, and then calls __mcount_internal (a normal C function) with two arguments—frompc and selfpc. __mcount_internal is responsible for maintaining the in-memory call graph, which records frompc, selfpc, and the number of times each of these call arcs was traversed.
...
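To make the mechanism in that excerpt more concrete, here is a hand-written imitation of what the instrumentation does. It is not gprof's code: record_arc, call_graph, leaf and parent are hypothetical names, the calls the compiler would insert under -pg are written out by hand, and strings stand in for the frompc and selfpc addresses so that the sketch stays portable.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the in-memory call graph:
// (frompc, selfpc) -> number of times that call arc was traversed.
typedef std::pair<std::string, std::string> Arc;

inline std::map<Arc, unsigned long>& call_graph() {
    static std::map<Arc, unsigned long> graph;
    return graph;
}

// Hypothetical analogue of __mcount_internal(frompc, selfpc).
inline void record_arc(const std::string& frompc, const std::string& selfpc) {
    ++call_graph()[Arc(frompc, selfpc)];
}

// With -pg the compiler emits the bookkeeping call at function entry and finds
// the caller from the stack frame; here the caller is passed in explicitly.
int leaf(const char* caller) {
    record_arc(caller, "leaf");
    return 1;
}

int parent(const char* caller) {
    record_arc(caller, "parent");
    return leaf("parent") + leaf("parent");
}
```

After one call to parent("main"), the table holds one traversal of the arc main -> parent and two traversals of parent -> leaf, which is the kind of data gprof later turns into its call graph report.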

Please note that with such an instrumenting profiler, you are not profiling exactly the same code you would compile in release without profiling instrumentation. The instrumentation code itself adds overhead, and it may also alter instruction and data cache usage.

Unlike an instrumenting profiler, a sampling profiler like Intel VTune works on non-instrumented code by looking at the target program's program counter at regular intervals using operating system interrupts. It can also query special CPU registers to give you even more insight into what's going on.

See also Profilers Instrumenting Vs Sampling
