Why Is Statically Linking Glibc Discouraged

Why is statically linking glibc discouraged?

The reasons given in other answers are correct, but they are not the most important reason.

The most important reason glibc should not be statically linked is that it makes extensive internal use of dlopen to load NSS (Name Service Switch) modules and iconv conversions. The modules themselves refer to C library functions. If the main program is dynamically linked with the C library, that's no problem. But if the main program is statically linked with the C library, dlopen has to load a second copy of the C library to satisfy the modules' load requirements.

This means your "statically linked" program still needs a copy of libc.so.6 to be present on the file system, plus the NSS or iconv or whatever modules themselves, plus other dynamic libraries that the modules might need, like ld-linux.so.2, libresolv.so.2, etc. This is not what people usually want when they statically link programs.

It also means the statically linked program has two copies of the C library in its address space, and they might fight over whose stdout buffer is to be used, who gets to call sbrk with a nonzero argument, that sort of thing. There is a bunch of defensive logic inside glibc to try to make this work, but it's never been guaranteed to work.

You might think your program doesn't need to worry about this because it never calls getaddrinfo or iconv, but locale support uses iconv internally, which means any stdio.h function might trigger a call to dlopen. You don't control this; the user's environment variable settings do.

And if your program does call iconv, for example, then things get even worse, especially when a “statically linked” executable is built on one distro, and then copied to another. The iconv modules are sometimes located in different places on different distros, so an executable that was built, say, on a Red Hat distro may fail to run properly on a Debian one, which is exactly the opposite of what people want from statically linked executables.

Why can't you statically link dynamic libraries?

Why is this the case?

Most linkers (the AIX linker is a notable exception) discard information in the process of linking.

For example, suppose you have foo.o with foo in it, and bar.o with bar in it. Suppose foo calls bar.

After you link foo.o and bar.o together into a shared library, the linker merges code and data sections, and resolves references. The call from foo to bar becomes CALL $relative_offset. After this operation, you can no longer tell where the boundary between code that came from foo.o and code that came from bar.o was, nor the name that CALL $relative_offset used in foo.o -- the relocation entry has been discarded.

Suppose now you want to link foobar.so with your main.o statically, and suppose main.o already defines its own bar.

If you had libfoobar.a, that would be trivial: the linker would pull foo.o from the archive, would not use bar.o from the archive, and resolve the call from foo.o to bar from main.o.

But it should be clear that none of the above is possible with foobar.so -- the call has already been resolved to the other bar, and you can't discard code that came from bar.o because you don't know where that code is.

On AIX it's possible (or at least it used to be possible 10 years ago) to "unlink" a shared library and turn it back into an archive, which could then be linked statically into a different shared library or a main executable.

If foo.o and bar.o are linked into a foobar.so, wouldn't it make sense that the call from foo to bar is always resolved to the one in bar.o?

This is one place where UNIX shared libraries work very differently from Windows DLLs. On UNIX (under common conditions), the call from foo to bar will resolve to the bar in the main executable.

This allows one to e.g. implement malloc and free in the main a.out, and have all calls to malloc use that one heap implementation consistently. On Windows you would have to always keep track of "which heap implementation did this memory come from".

The UNIX model is not without disadvantages though, as the shared library is not a self-contained mostly hermetic unit (unlike a Windows DLL).

Why would you want to resolve it to another bar from main.o?

If you don't resolve the call to main.o, you end up with a totally different program, compared to linking against libfoobar.a.

Why are stat & fstat linked statically?

I would love to understand: why is this the case?

This answer explains why that is the case.

You'll need to wrap __xstat instead.

Finding haskell executable if statically linked via glibc or musl

One way to find out, although it is limited to Haskell-based executables, is to use the RTS --info option:

Example:

$ ./tldr +RTS --info -RTS
 [("GHC RTS", "YES")
 ,("GHC version", "8.6.5")
 ,("RTS way", "rts_thr")
 ,("Build platform", "x86_64-alpine-linux")
 ,("Build architecture", "x86_64")
 ,("Build OS", "linux")
 ,("Build vendor", "alpine")
 ,("Host platform", "x86_64-alpine-linux")
 ,("Host architecture", "x86_64")
 ,("Host OS", "linux")
 ,("Host vendor", "alpine")
 ,("Target platform", "x86_64-alpine-linux")
 ,("Target architecture", "x86_64")
 ,("Target OS", "linux")
 ,("Target vendor", "alpine")
 ,("Word size", "64")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ]

From the x86_64-alpine-linux platform, I can confirm that the build was made on Alpine Linux, which is based on musl. You can then confirm explicitly via ldd that it is indeed statically linked:

$ ldd ./tldr
        not a dynamic executable
