Is there a reason why not to use link-time optimization (LTO)?
I assume that by "production software" you mean software that you ship to the customers / goes into production. The answers at Why not always use compiler optimization? (kindly pointed out by Mankarse) mostly apply to situations in which you want to debug your code (so the software is still in the development phase -- not in production).
6 years have passed since I wrote this answer, and an update is necessary. Back in 2014, the issues were:
- Link time optimization occasionally introduced subtle bugs, see for example Link-time optimization for the kernel. I assume this is less of an issue as of 2020. Safeguard against these kinds of compiler and linker bugs: Have appropriate tests to check the correctness of your software that you are about to ship.
- Increased compile time. There are claims that the situation has significantly improved since 2014, for example thanks to slim objects.
- Large memory usage. This post claims that the situation has drastically improved in recent years, thanks to partitioning.
As of 2020, I would try to use LTO by default on any of my projects.
Link-time optimization (lto) for mixed C++/C programs
This should not be a problem at all. In both GCC and Clang, link-time optimization operates on an intermediate representation of the code. That is, by using -flto you create object files with additional LTO information (GCC) or LLVM bitcode (Clang), at which point the source language stops mattering.
Some go as far as to mix even less related C++ and D yet still use LTO: http://johanengelen.github.io/ldc/2016/11/10/Link-Time-Optimization-LDC.html
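As a rough illustration of the C half of such a mixed program, here is a minimal sketch. The file name c_side.c and the function add_from_c are made up for this example; they do not come from the original question.

```c
/* c_side.c: the C half of a hypothetical mixed C/C++ program. */

/* This guard normally lives in a shared header (it is inlined here for
   brevity). It gives the function C linkage when a C++ translation unit
   includes the declaration, so both languages agree on the symbol name. */
#ifdef __cplusplus
extern "C" {
#endif

int add_from_c(int a, int b);

#ifdef __cplusplus
}
#endif

/* Small enough that inlining into a C++ caller is plausible once both
   object files are built with -flto. */
int add_from_c(int a, int b)
{
    return a + b;
}
```

Assuming a hypothetical main.cpp that calls add_from_c, the objects could be built with gcc -O2 -flto -c c_side.c and g++ -O2 -flto -c main.cpp, then linked with g++ -O2 -flto c_side.o main.o. Whether the call is actually inlined depends on the compiler version and options.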
Link-time optimization versus project inlining; limitations of each approach
The technical name, approaching minor buzzword status, for that approach is unity build.
See for example:
The benefits / disadvantages of unity builds?
The downside is best described here:
http://leewinder.co.uk/blog/?p=394
The short version is that it is more or less a choice of languages: you write either regular C++ or unity-build C++. The 'correct' way of writing virtually any code will differ between the two.
Using GCC's link-time optimization with static linked libraries
Here is an MCVE CMake project that reproduces the problem:
$ ls -R hellow
hellow:
CMakeLists.txt hello.c libhello.c
$ cat hellow/CMakeLists.txt
cmake_minimum_required (VERSION 2.6)
project (hellow)
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -flto")
SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -flto")
#SET(CMAKE_AR "gcc-ar")
#SET(CMAKE_C_ARCHIVE_CREATE "<CMAKE_AR> qcs <TARGET> <LINK_FLAGS> <OBJECTS>")
#SET(CMAKE_C_ARCHIVE_FINISH true)
add_library(hello STATIC libhello.c)
add_executable(hellow hello.c)
target_link_libraries(hellow hello)
add_dependencies(hellow hello)
$ cat hellow/hello.c
extern void hello(void);
int main(void)
{
    hello();
    return 0;
}
$ cat hellow/libhello.c
#include <stdio.h>
void hello(void)
{
    puts("Hello");
}
Configuration is good:
$ mkdir build_hellow
$ cd build_hellow/
$ cmake ../hellow
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/imk/dev/so/build_hellow
Build fails as per problem:
$ make
Scanning dependencies of target hello
[ 25%] Building C object CMakeFiles/hello.dir/libhello.c.o
[ 50%] Linking C static library libhello.a
/usr/bin/ar: CMakeFiles/hello.dir/libhello.c.o: plugin needed to handle lto object
/usr/bin/ranlib: libhello.c.o: plugin needed to handle lto object
[ 50%] Built target hello
Scanning dependencies of target hellow
[ 75%] Building C object CMakeFiles/hellow.dir/hello.c.o
[100%] Linking C executable hellow
/tmp/ccV0lG36.ltrans0.ltrans.o: In function `main':
<artificial>:(.text+0x5): undefined reference to `hello'
collect2: error: ld returned 1 exit status
CMakeFiles/hellow.dir/build.make:95: recipe for target 'hellow' failed
make[2]: *** [hellow] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/hellow.dir/all' failed
make[1]: *** [CMakeFiles/hellow.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
There is more than one solution. One is to uncomment the 3 commented lines in CMakeLists.txt above. Then:
$ cmake ../hellow/
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/imk/dev/so/build_hellow
$ make
Scanning dependencies of target hello
[ 25%] Building C object CMakeFiles/hello.dir/libhello.c.o
[ 50%] Linking C static library libhello.a
[ 50%] Built target hello
Scanning dependencies of target hellow
[ 75%] Building C object CMakeFiles/hellow.dir/hello.c.o
[100%] Linking C executable hellow
[100%] Built target hellow
$ ./hellow
Hello
This fix makes use of the following facts.
The build-breaking problem:
/usr/bin/ar: CMakeFiles/hello.dir/libhello.c.o: plugin needed to handle lto object
...
/usr/bin/ranlib: libhello.c.o: plugin needed to handle lto object
can be solved by giving ar and ranlib the option:
--plugin=$(gcc --print-file-name=liblto_plugin.so)
However, GNU ranlib is merely a synonym for ar -s, and gcc-ar is a wrapper for ar that supplies that plugin.
CMake's build template for a C static library is:
CMAKE_C_ARCHIVE_CREATE ( = <CMAKE_AR> qc <TARGET> <LINK_FLAGS> <OBJECTS>)
CMAKE_C_ARCHIVE_FINISH ( = <CMAKE_RANLIB> <TARGET>)
which for GNU ar is equivalent to:
CMAKE_C_ARCHIVE_CREATE ( = <CMAKE_AR> qcs <TARGET> <LINK_FLAGS> <OBJECTS>)
CMAKE_C_ARCHIVE_FINISH ( = true) # Or any other no-op command
So with these settings plus:
SET(CMAKE_AR "gcc-ar")
we're good.
For a C++ project, of course, set CMAKE_CXX_ARCHIVE_CREATE and CMAKE_CXX_ARCHIVE_FINISH.
Does LTO work when compiling with GCC but linking with LLVM LLD?
I did some research and finally concluded for myself that no LTO is done if we use lld when compiling with gcc. What I did:
Based on this somewhat vague presentation: https://www.slideshare.net/chimerawang/gcc-lto, I found that the linker does not do the optimization directly. Rather, after reading all the symbols from all the object files, it passes the information to the lto-wrapper, which then does the optimization through some other processes. So I made a test using a hello-world cpp file, compiling it with the -v flag, and indeed I saw the succession of calls mentioned earlier (collect2 (linker) -> lto-wrapper -> lto1). But this happened only with the default linker or the gold linker. When I used the -fuse-ld=lld flag, only the collect2 process was called. This was the first thing that made me believe that LTO was not done at all.
But hey, maybe the lld linker internalized the LTO process so that it is done without calling any other process. So I made another test to see if LTO is done (based on this article). Basically, from one cpp file I call, 100,000,000 times, a function that is defined in another cpp file and does nothing. Using basic -O2 optimization, the resulting binary runs in ~200 ms, as the compiler is not able to optimize out the useless function calls. When also using the -flto flag and either the ld or the gold linker, the resulting binary runs in ~2 ms. But when using the lld linker, the resulting binary also runs in ~200 ms. So the binary linked by lld with LTO runs as slowly as the one linked by lld without LTO. No sign of optimization whatsoever.
It should be mentioned here that, with the lld linker, the link command fails unless the objects are compiled with -ffat-lto-objects. This flag makes the object files larger because the compiler emits not only the LTO code, but also the code that can be linked without LTO.
So, considering the run time of the binary linked with lld and also the fact that the objects need to be compiled with -ffat-lto-objects, I concluded that when the lld linker is used, LTO is not achieved at all. Instead, lld uses the non-LTO code generated by the compiler in order to link the binary.
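The timing experiment can be sketched roughly as follows, in C rather than the C++ used in the test above. The names noop and time_noop_calls are made up for illustration. The no-op function is kept in the same file only so that the sketch builds on its own; in the real experiment it has to live in a separate source file, otherwise the compiler can remove the calls without any help from LTO.

```c
#include <stdio.h>
#include <time.h>

/* In the experiment this definition belongs in a second source file
   (say noop.c, a hypothetical name), leaving only a declaration here,
   so that nothing short of LTO can see through the call. */
void noop(void) {}

/* Calls noop() n times and returns the elapsed CPU time in milliseconds. */
double time_noop_calls(unsigned long n)
{
    clock_t start = clock();
    for (unsigned long i = 0; i < n; ++i)
        noop();
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

A driver would call time_noop_calls(100000000UL) and print the result. With the definition moved to its own file, one could then compare builds made with gcc -O2, with gcc -O2 -flto, and with gcc -O2 -flto -fuse-ld=lld. The absolute numbers will depend on the machine and toolchain.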
Inline functions and link time optimizations
From C11 6.7.4p7 emphasis mine:
Any function with internal linkage can be an inline function. For a
function with external linkage, the following restrictions apply: If a
function is declared with an inline function specifier, then it shall
also be defined in the same translation unit. If all of the file scope
declarations for a function in a translation unit include the inline
function specifier without extern, then the definition in that
translation unit is an inline definition. An inline definition does
not provide an external definition for the function, and does not
forbid an external definition in another translation unit. An inline
definition provides an alternative to an external definition, which a
translator may use to implement any call to the function in the same
translation unit. It is unspecified whether a call to the function
uses the inline definition or the external definition
Your code does not provide a definition of foo with external linkage, and the compiler is right: there is no foo. The foo in test1.c is an inline definition, so it does not provide an external definition of the function.
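One way to supply the missing external definition is to add an extern declaration in exactly one translation unit. The following is a minimal sketch; the signature of foo is assumed here (taking and returning an int), and call_foo is a hypothetical caller.

```c
/* Inline definition: on its own it does not emit an external symbol
   named foo, so a call that is not inlined would fail to link. */
inline int foo(int x)
{
    return 2 * x;
}

/* A file-scope declaration with extern turns the definition above into
   an external definition. Put this in exactly one translation unit. */
extern inline int foo(int x);

/* Hypothetical caller: links whether or not the compiler inlines foo. */
int call_foo(int x)
{
    return foo(x);
}
```

In a multi-file project the inline definition would typically sit in a header, with the extern inline declaration in a single .c file that includes it.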
Did I compile it correctly?
Well, yes.
Is gcc failing to inline?
The compilation failed, so yes.
The keyword inline may serve as a hint for the compiler to maybe inline the function. There is no requirement that the compiler will do that. inline is a misleading keyword: it serves primarily to modify the linkage of functions, letting the compiler choose between inline and non-inline versions of the same function available within the same translation unit. It does not mean that the function will be inlined.
If you are using LTO, just drop the inline; there is no point in hinting the compiler. Trust gcc: it will do a better job of optimizing than you will, and with LTO it "sees" all the functions in a single translation unit anyway. Also read the gcc docs on inline and remember the rules of optimization.