Is There a Reason Why Not to Use Link-Time Optimization (LTO)

Is there a reason why not to use link-time optimization (LTO)?

I assume that by "production software" you mean software that you ship to customers or deploy to production. The answers at "Why not always use compiler optimization?" (kindly pointed out by Mankarse) mostly apply to situations in which you want to debug your code, that is, while the software is still in development rather than in production.

6 years have passed since I wrote this answer, and an update is necessary. Back in 2014, the issues were:

  • Link-time optimization occasionally introduced subtle bugs; see, for example, "Link-time optimization for the kernel". I assume this is less of an issue as of 2020. To safeguard against compiler and linker bugs of this kind, have appropriate tests that check the correctness of the software you are about to ship.
  • Increased compile time. There are claims that the situation has significantly improved since 2014, for example thanks to slim objects.
  • Large memory usage. This post claims that the situation has drastically improved in recent years, thanks to partitioning.

As of 2020, I would try to use LTO by default on any of my projects.

Link-time optimization (LTO) for mixed C++/C programs

This should not be a problem at all. In both GCC and Clang, link-time optimization operates on an intermediate representation of the code. That is, by using -flto you create object files that contain additional LTO information (GCC) or LLVM bitcode (Clang), at which point the source language stops mattering.

Some go as far as mixing the even less related C++ and D while still using LTO: http://johanengelen.github.io/ldc/2016/11/10/Link-Time-Optimization-LDC.html
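As a minimal sketch of what such a mixed build might look like (the file names mathutil.h, mathutil.c and main.cpp and the function clamp_add are hypothetical, not taken from the question), the C side only needs to expose its API with C linkage. Once both sides are compiled with -flto, the optimizer works on the intermediate representation and can, in principle, inline the C function into its C++ caller.

```c
#include <assert.h>

/* mathutil.h (hypothetical): a C API that C++ translation units can include too. */
#ifdef __cplusplus
extern "C" {
#endif

int clamp_add(int a, int b, int lo, int hi);

#ifdef __cplusplus
}
#endif

/* mathutil.c (hypothetical): compiled as C.
 *
 * Assumed build steps for a mixed project:
 *   gcc -O2 -flto -c mathutil.c
 *   g++ -O2 -flto -c main.cpp          (main.cpp calls clamp_add)
 *   g++ -O2 -flto mathutil.o main.o -o app
 */
int clamp_add(int a, int b, int lo, int hi)
{
    assert(lo <= hi);                    /* precondition of this sketch */
    long long sum = (long long)a + b;    /* widen so the addition cannot overflow int */
    if (sum < lo) return lo;
    if (sum > hi) return hi;
    return (int)sum;
}
```

Whether the call is actually inlined across the language boundary depends on the compiler version and its heuristics, so inspect the generated code if it matters.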

Link-time optimization vs. project inlining; limitations of each approach

The technical name, approaching minor buzzword status, for that approach is unity build.

See for example:

The benefits / disadvantages of unity builds?

The downside is best described here:

http://leewinder.co.uk/blog/?p=394

The short version is that it is more or less a choice of language: you write either regular C++ or unity-build C++. The 'correct' way of writing virtually any code will differ between the two.
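One concrete way the two styles diverge, shown here as a small C sketch with hypothetical names (a.c, b.c, a_scale, b_scale): in a regular build, every source file can have its own file-local static helper called scale, because each file is a separate translation unit. In a unity build, all files are pasted into one translation unit, so two static functions with the same name collide and have to be renamed.

```c
/* Sketch of the single translation unit a unity build would produce
 * from two hypothetical files, a.c and b.c. */

/* ---- contents of a.c ---- */
/* In a regular build this helper could simply be named scale(). */
static int a_scale(int x) { return x * 2; }

int a_process(int x) { return a_scale(x) + 1; }

/* ---- contents of b.c ---- */
/* b.c also wants a private scale(); in a unity build it must not
 * reuse the name, or the compiler reports a redefinition. */
static int b_scale(int x) { return x * 3; }

int b_process(int x) { return b_scale(x) - 1; }
```

The same applies to file-local macros, typedefs and static variables. With LTO the files stay separate translation units, so this renaming discipline is not needed.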

Using GCC's link-time optimization with static linked libraries

Here is an MCVE CMake project that reproduces the problem:

$ ls -R hellow
hellow:
CMakeLists.txt hello.c libhello.c

$ cat hellow/CMakeLists.txt
cmake_minimum_required (VERSION 2.6)
project (hellow)
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -flto")
SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -flto")
#SET(CMAKE_AR "gcc-ar")
#SET(CMAKE_C_ARCHIVE_CREATE "<CMAKE_AR> qcs <TARGET> <LINK_FLAGS> <OBJECTS>")
#SET(CMAKE_C_ARCHIVE_FINISH true)
add_library(hello STATIC libhello.c)
add_executable(hellow hello.c)
target_link_libraries(hellow hello)
add_dependencies(hellow hello)

$ cat hellow/hello.c
extern void hello(void);

int main(void)
{
    hello();
    return 0;
}

$ cat hellow/libhello.c
#include <stdio.h>

void hello(void)
{
    puts("Hello");
}

Configuration is good:

$ mkdir build_hellow
$ cd build_hellow/
$ cmake ../hellow
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/imk/dev/so/build_hellow

Build fails as per problem:

$ make
Scanning dependencies of target hello
[ 25%] Building C object CMakeFiles/hello.dir/libhello.c.o
[ 50%] Linking C static library libhello.a
/usr/bin/ar: CMakeFiles/hello.dir/libhello.c.o: plugin needed to handle lto object
/usr/bin/ranlib: libhello.c.o: plugin needed to handle lto object
[ 50%] Built target hello
Scanning dependencies of target hellow
[ 75%] Building C object CMakeFiles/hellow.dir/hello.c.o
[100%] Linking C executable hellow
/tmp/ccV0lG36.ltrans0.ltrans.o: In function `main':
<artificial>:(.text+0x5): undefined reference to `hello'
collect2: error: ld returned 1 exit status
CMakeFiles/hellow.dir/build.make:95: recipe for target 'hellow' failed
make[2]: *** [hellow] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/hellow.dir/all' failed
make[1]: *** [CMakeFiles/hellow.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

There is more than one solution. One is to uncomment the 3 commented lines
in CMakeLists.txt above. Then:

$ cmake ../hellow/
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/imk/dev/so/build_hellow

$ make
Scanning dependencies of target hello
[ 25%] Building C object CMakeFiles/hello.dir/libhello.c.o
[ 50%] Linking C static library libhello.a
[ 50%] Built target hello
Scanning dependencies of target hellow
[ 75%] Building C object CMakeFiles/hellow.dir/hello.c.o
[100%] Linking C executable hellow
[100%] Built target hellow

$ ./hellow
Hello

This fix makes use of the following facts.

The build-breaking problem:

/usr/bin/ar: CMakeFiles/hello.dir/libhello.c.o: plugin needed to handle lto object
...
/usr/bin/ranlib: libhello.c.o: plugin needed to handle lto object

can be solved by giving ar and ranlib the option:

--plugin=$(gcc --print-file-name=liblto_plugin.so)

However, GNU ranlib is merely a synonym for ar -s, and gcc-ar is a
wrapper for ar that supplies that plugin.

CMake's build template for a C static library is:

CMAKE_C_ARCHIVE_CREATE ( = <CMAKE_AR> qc <TARGET> <LINK_FLAGS> <OBJECTS>)
CMAKE_C_ARCHIVE_FINISH ( = <CMAKE_RANLIB> <TARGET>)

which for GNU ar is equivalent to:

CMAKE_C_ARCHIVE_CREATE ( = <CMAKE_AR> qcs <TARGET> <LINK_FLAGS> <OBJECTS>)
CMAKE_C_ARCHIVE_FINISH ( = true) # Or any other no-op command

So with these settings plus:

SET(CMAKE_AR  "gcc-ar")

we're good.

For a C++ project, of course, set CMAKE_CXX_ARCHIVE_CREATE and CMAKE_CXX_ARCHIVE_FINISH instead.

Does LTO work when compiling with GCC but linking with LLVM LLD?

I did some research and finally concluded for myself that no LTO is done if we use lld when compiling with gcc. What I did:

Based on this somewhat vague presentation: https://www.slideshare.net/chimerawang/gcc-lto, I found that the linker does not perform the optimization itself. Instead, after reading all the symbols from all the object files, it passes the information to lto-wrapper, which then performs the optimization through other processes. So I ran a test with a hello-world cpp file, compiling it with the -v flag, and indeed I saw the chain of calls described above (collect2 (linker) -> lto-wrapper -> lto1). But that was with the default linker or the gold linker. When I used the -fuse-ld=lld flag, only the collect2 process was called. This was the first thing that made me believe LTO was not done at all.

But hey, maybe the lld linker has internalized the LTO process, so that it is done without calling any other process. So I ran another test to see whether LTO is done (based on this article). Basically, from one cpp file I call a function 100,000,000 times; the function is defined in another cpp file and does nothing. With plain -O2 optimization, the resulting binary runs in ~200 ms, as the compiler cannot optimize out the useless function calls. When also using the -flto flag with either the ld or the gold linker, the resulting binary runs in ~2 ms. But with the lld linker, the resulting binary also runs in ~200 ms. So lld with LTO runs as slowly as lld without LTO. No sign of optimization whatsoever.
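The test described above can be sketched roughly as follows. This is a C approximation under stated assumptions, not the original cpp files: the names noop, time_calls, noop.c and bench.c are made up for illustration, and the two parts are shown in one listing only for brevity. For the experiment to mean anything they must be compiled as separate files, otherwise plain -O2 can already inline the call.

```c
#include <time.h>

/* ---- would live in noop.c ---- */
/* A function that does nothing; only LTO can see this from bench.c. */
void noop(void) {}

/* ---- would live in bench.c, which sees only this declaration ---- */
void noop(void);

/* Calls noop() n times and returns the elapsed CPU time in seconds. */
double time_calls(unsigned long n)
{
    clock_t start = clock();
    for (unsigned long i = 0; i < n; ++i)
        noop();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Assumed build variants to compare, with a main() that calls
 * time_calls(100000000UL) and prints the result:
 *   gcc -O2 -c noop.c bench.c
 *   gcc -O2 -flto -c noop.c bench.c
 *   gcc -O2 -flto -ffat-lto-objects -c noop.c bench.c
 * followed by a link step with the default linker, or with
 * -fuse-ld=gold / -fuse-ld=lld added to the link command.
 */
```

If LTO is effective, the loop should collapse to almost nothing and the reported time should drop sharply. The absolute numbers will differ from the ~200 ms and ~2 ms quoted above, depending on the machine and compiler version.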

Note that with the lld linker, the link command fails unless the objects are compiled with -ffat-lto-objects. This flag makes the object files larger because the compiler emits not only the LTO code but also regular code that can be linked without LTO.

So, considering the run time of the binary linked with lld and the fact that the objects need to be compiled with -ffat-lto-objects, I concluded that when the lld linker is used, no LTO happens at all; lld instead links the binary from the non-LTO code generated by the compiler.

Inline functions and link time optimizations

From C11 6.7.4p7 emphasis mine:

Any function with internal linkage can be an inline function. For a
function with external linkage, the following restrictions apply: If a
function is declared with an inline function specifier, then it shall
also be defined in the same translation unit. If all of the file scope
declarations for a function in a translation unit include the inline
function specifier without extern, then the definition in that
translation unit is an inline definition. An inline definition does
not provide an external definition for the function, and does not
forbid an external definition in another translation unit. An inline
definition provides an alternative to an external definition, which a
translator may use to implement any call to the function in the same
translation unit. It is unspecified whether a call to the function
uses the inline definition or the external definition.

Your code does not provide a definition of foo with external linkage, and the compiler is right: there is no foo. The foo in test1.c is an inline definition, so it does not provide an external definition of the function.

Did I compile it correctly?

Well, yes.

Is gcc failing to inline?

The compilation failed, so yes.

The keyword inline may serve as a hint for the compiler to maybe inline the function. There is no requirement that the compiler do so. inline is a misleading keyword: it serves primarily to modify the linkage of functions, letting the compiler choose between the inline and non-inline versions of the same function available within the same translation unit. It does not mean that the function will be inlined.

If you are using LTO, just drop the inline. There is no point in hinting the compiler: trust gcc to do a better job of optimizing than you will, since with LTO it "sees" all the functions as if they were in a single translation unit anyway. Also read the gcc docs on inline and remember the rules of optimization.
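If you would rather keep inline, the rule quoted from C11 6.7.4p7 also shows how to obtain the missing external definition. The sketch below assumes C99/C11 inline semantics (not -std=gnu89 or -fgnu89-inline), and the function body and the caller call_foo are hypothetical, since the question's code is not reproduced here. A file-scope declaration with extern in exactly one translation unit turns that unit's definition of foo into an external definition.

```c
/* Typically placed in a header shared by all callers:
 * on its own, this is only an inline definition. */
inline int foo(int x) { return x + 1; }

/* Placed in exactly one .c file: because this declaration uses extern,
 * not all declarations of foo in this translation unit are
 * "inline without extern", so the definition above becomes an
 * external definition that the linker can resolve. */
extern inline int foo(int x);

/* Hypothetical caller: links whether or not the compiler inlines foo. */
int call_foo(int x) { return foo(x); }
```

Declaring the function static inline is the other common fix. It gives every translation unit its own internal copy instead of one shared external definition.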


