Linux Static Linking Is Dead

Linux static linking is dead?

Given that, is there any reasonable way to create a fully functioning static build on Linux, or is static linking completely dead on Linux?

I do not know where to find the historical references, but yes, static linking is dead on GNU systems. (I believe it died during the transition from libc4/libc5 to libc6/glibc 2.x.)

The feature was deemed useless in light of:

  • Security vulnerabilities. A statically linked application cannot pick up a libc upgrade. If the application was linked on a system whose libraries contained a vulnerability, that vulnerability is perpetuated inside the statically linked executable.

  • Code bloat. If many statically linked applications are run on the same system, the standard libraries cannot be shared, since every application carries its own copy of everything. (Try du -sh /usr/lib to understand the extent of the problem.)

Try digging through the LKML and glibc mailing list archives from 10-15 years ago. I'm pretty sure I saw something related on LKML long ago.

Why is statically linking glibc discouraged?

The reasons given in other answers are correct, but they are not the most important reason.

The most important reason why glibc should not be statically linked, is that it makes extensive internal use of dlopen, to load NSS (Name Service Switch) modules and iconv conversions. The modules themselves refer to C library functions. If the main program is dynamically linked with the C library, that's no problem. But if the main program is statically linked with the C library, dlopen has to go load a second copy of the C library to satisfy the modules' load requirements.

This means your "statically linked" program still needs a copy of libc.so.6 to be present on the file system, plus the NSS or iconv or whatever modules themselves, plus other dynamic libraries that the modules might need, like ld-linux.so.2, libresolv.so.2, etc. This is not what people usually want when they statically link programs.

It also means the statically linked program has two copies of the C library in its address space, and they might fight over whose stdout buffer is to be used, who gets to call sbrk with a nonzero argument, that sort of thing. There is a bunch of defensive logic inside glibc to try to make this work, but it's never been guaranteed to work.

You might think your program doesn't need to worry about this because it never calls getaddrinfo or iconv, but locale support uses iconv internally, which means any stdio.h function might trigger a call to dlopen. And you don't control this; the user's environment variable settings do.

And if your program does call iconv, for example, then things get even worse, especially when a “statically linked” executable is built on one distro, and then copied to another. The iconv modules are sometimes located in different places on different distros, so an executable that was built, say, on a Red Hat distro may fail to run properly on a Debian one, which is exactly the opposite of what people want from statically linked executables.

Static linking of Libc

The answer is pretty simple, and was given in a comment by the user Faust: just add -static as a gcc option.

Static and dynamic linking w.r.t. portability, in the context of Go

On GNU/Linux, almost all Go executables fall into these categories:

  1. Those that include the application, the Go run-time, and a statically linked copy of (parts of) glibc.
  2. Those that include just the application and the Go run-time, statically linked, and none of glibc.
  3. Those that include just the application and the Go run-time, statically linked, and link to glibc dynamically.

Go-related tooling often conflates these linking modes, unfortunately. The main reason for the glibc dependency is that the application uses host name and user lookups (functions like getaddrinfo and getpwuid_r). CGO_ENABLED=0 switches from implementations like src/os/user/cgo_lookup_unix.go (which uses glibc) to src/os/user/lookup_unix.go (which does not). The non-glibc implementation does not use NSS and thus offers somewhat limited functionality (which generally does not affect users who do not store user information in LDAP/Active Directory).

In your case, setting CGO_ENABLED=0 moves your application from the third category to the second. (There is other Go-related tooling that could build an application of the first kind.) The non-NSS lookup code is not very large, so the increase in binary size is not significant. Since the Go run-time was already statically linked, it's even possible that the reduced overhead from static linking results in a net reduction of executable size.

The most important issue to consider here is that NSS, threads, and static linking do not mix all that well in glibc. All Go programs are multi-threaded, and the reason to (statically) link glibc into Go programs is precisely access to the NSS functions. Therefore, statically linking Go programs against glibc is always the wrong thing to do; it is basically always buggy. Even if Go programs were not multi-threaded, a statically linked program which uses NSS functions needs the exact same version of glibc at run time that was used at build time, so static linking of such an application reduces portability.

All these are reasons why Go applications of the first kind are such a bad idea. Producing a statically linked application using CGO_ENABLED=0 does not have these problems because those applications (of the second kind) do not include any glibc code (at the cost of reduced functionality of the user/host lookup functions).

If you want to create a portable binary which needs glibc, you need to link your application dynamically (the third kind), on a system with the oldest glibc you want to support. The application will then run on that glibc version and all later versions (for now, Go does not link libc correctly, so there is no strong compatibility guarantee even for glibc). Distributions are generally ABI-compatible, but they have different versions of glibc. glibc goes to great lengths to make sure that applications dynamically linked against older versions of glibc will keep running on new versions of glibc, but the converse is not true: Once you link an application on a certain version of glibc, it may pick up features (symbols) that are just not available on older versions, so the application will not work with those older versions.

Is it sensible to build an application with static linking on Linux?

The advantages are, as you expect, a single binary that works without having to install the other dependencies and which you can easily move around.

The disadvantages are the size and the need to recompile the entire application if there's an update (e.g. a security fix) to the linked library and perhaps licensing issues (as you've noted).

Tradeoffs. If it solves your problem, go for it.

Remove dead code when linking static library into dynamic library

You can use a version script to mark the entry points in combination with -ffunction-sections and --gc-sections.

For example, consider this C file (example.c):

int
foo (void)
{
  return 17;
}

int
bar (void)
{
  return 251;
}

And this version script, called version.script:

{
  global: foo;
  local: *;
};

Compile and link the sources like this:

gcc -Wl,--gc-sections -shared -fPIC -ffunction-sections -Wl,--version-script=version.script example.c

If you look at the output of objdump -d --reloc a.out, you will notice that only foo is included in the shared object, but not bar.

When removing functions in this way, the linker will take indirect dependencies into account. For example, if you turn foo into this:

void *
foo (void)
{
  extern int bar (void);
  return bar;
}

the linker will put both foo and bar into the shared object because both are needed, even though only foo is exported.

(Obviously, this will not work on all platforms, but ELF supports this.)
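The whole example can be scripted and checked end to end (a sketch assuming gcc and binutils; -fPIC is included so the object links cleanly into a shared object):

```shell
cat > example.c <<'EOF'
int foo (void) { return 17; }
int bar (void) { return 251; }
EOF

cat > version.script <<'EOF'
{
  global: foo;
  local: *;
};
EOF

gcc -fPIC -ffunction-sections -Wl,--gc-sections \
    -Wl,--version-script=version.script -shared example.c -o libexample.so

# foo is in the dynamic symbol table; bar's section was
# garbage-collected and is gone from the object entirely.
nm -D libexample.so | grep ' foo'
objdump -d libexample.so | grep '<bar>:' || echo 'bar was garbage-collected'
```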

Shared library symbol conflicts and static linking (on Linux)

Is this a designed behavior?

Yes.

At the time of introduction of shared libraries on UNIX, the goal was to pretend that they work just as if the code was in a regular (archive) library.

Suppose you have foo() defined in both libfoo and libbar, and bar() in libbar calls foo().

The design goal was that cc main.c -lfoo -lbar works the same regardless of whether libfoo and libbar are archive or shared libraries. The only way to achieve this is to have libbar.so use dynamic linking to resolve the call from bar() to foo(), despite having a local version of foo().

This design makes it impossible to create a self-contained libbar.so -- its behavior (which functions it ends up calling) depends on what other functions are linked into the process. This is also the opposite of how Windows DLLs work.

Creating self-contained DSOs was not a consideration at the time, since UNIX was effectively open-source.

You can change the rules with special linker flags, such as -Bsymbolic. But the rules get complicated very quickly, and (since that isn't the default) you may encounter bugs in the linker or the runtime loader.

Linking a Static library into a shared library

Do what the compiler suggests: Recompile with -fPIC

Explanation: shared objects have the requirement that the addresses in their code must not depend on where the image is loaded in the address space. Statically linked code is not bound by this: all addresses in virtual address space are known at link time, and hence the code does not need to cope with locations that are undetermined at compile time.

The -fPIC compiler flag enables compilation of Position-Independent Code (PIC). The static libraries you're trying to link were not compiled as PIC; that's why the linker complains. The solution is to recompile the static library with PIC enabled.


On a side note: PIC is also fundamental to Address Space Layout Randomization (ASLR), a security measure that aims to make the exploitation of vulnerable programs harder.


