Library Path Order for Alternate Glibc Dynamic Linker (ld.so)

I got it, the issue was with the OS ABI version. That's the number indicated by file, such as:

$ file /lib/x86_64-linux-gnu/libc-2.15.so | grep -o "for GNU/Linux [0-9.]*"
for GNU/Linux 2.6.24

When glibc is configured with nothing other than --prefix, it builds by default with an ABI version that is, surprisingly, smaller (in my case 2.6.16) than the system default (2.6.24). So the freshly built libc-2.18 ends up with a smaller ABI version than the system's libc-2.15.

When ldconfig finds two versions of libc.so.6 with different ABI numbers, it places them in ld.so.cache in order of descending ABI number, not in order of appearance. This can be verified by swapping their locations, rebuilding the cache (with ldconfig), and listing the cache contents (with ldconfig -p). Only when two libc.so.6 files have the same ABI version are they placed in the cache in order of appearance.

Configuring glibc with --enable-kernel=2.6.24 causes it to use the same ABI version as the system, which in turn fixes the resolution issues in the question statement, without the need for an explicit --rpath or LD_LIBRARY_PATH.

Multiple glibc libraries on a single host

It is very possible to have multiple versions of glibc on the same system (we do that every day).

However, you need to know that glibc consists of many pieces (200+ shared libraries) which all must match. One of the pieces is ld-linux.so.2, and it must match libc.so.6, or you'll see the errors you are seeing.

The absolute path to ld-linux.so.2 is hard-coded into the executable at link time and cannot easily be changed after the link is done. (Update: it can be done with patchelf; see this answer below.)

To build an executable that will work with the new glibc, do this:

g++ main.o -o myapp ... \
-Wl,--rpath=/path/to/newglibc \
-Wl,--dynamic-linker=/path/to/newglibc/ld-linux.so.2

The -rpath linker option makes the runtime loader search for libraries in /path/to/newglibc (so you won't have to set LD_LIBRARY_PATH before running the program), and the -dynamic-linker option "bakes" the path to the correct ld-linux.so.2 into the application.

If you can't relink the myapp application (e.g. because it is a third-party binary), not all is lost, but it gets trickier. One solution is to set up a proper chroot environment for it. Another possibility is to use rtldi and a binary editor. (Update: or you can use patchelf.)

Modify glibc dynamic linker to check if a shared library has been loaded in another process

Building the dynamic linker, however, gives me "multiple definitions of x symbols" errors

This is because the dynamic linker is very special, and you are very restricted in what you can do in the dynamic linker.

It is special because it must be a stand-alone program -- it can't use any other library (including libc.so.6) -- it is responsible for loading all other libraries, so naturally it can't use anything that it has yet to load.

I just want to compute them once when the library is being physically loaded the first time.

This is still an XY Problem. What are you going to do with the result of this computation?

One possible answer is: store them in a file or a database.

If that is your answer, then the solution becomes obvious: check if the file or a database entry exists. If it does, you don't need to do the computation again.

Update:

The main problem with both the lsof-based and the file/database-based solutions is: when I add a new .c file and include <stdio.h> in it to do file operations (such as FILE* fp = fopen()), the glibc build gives me errors

This is the exact same problem: you are trying to use parts of libc.so which can't be used in a dynamic linker.

If you want to store the result of your computation in a file, you need to use low-level parts which are usable. Use open() and write() instead of fopen() and fprintf().

Alternatively, do it from within your library or program -- since you will no longer care about how many processes have loaded the library, there is no reason to try to perform this computation in the loader. (There might be a reason, but you are not explaining it; so we are back to XY problem.)

How to use alternate glibc with existing libstdc++?

Do I absolutely need the new 2.18 dynamic linker to run a program with glibc-2.18?

Yes (well, almost. See footnote).

This would avoid me having to set up and continuously update the paths of the 2.18 dynamic linker.

A common technique is to create a g++ shell wrapper, e.g. g++glibc2.18, and encapsulate adding the necessary link arguments there. Then a simple make CXX=g++glibc2.18 would do the right thing.

Can't the standard linker do it?

No. See this answer for explanation.

If I compile with the 2.18 dynamic linker but without --rpath, the program doesn't work. Why?

See the same answer.

Should I be using -L/path/to/glibc-2.18/lib in the compilation command (in addition to --rpath and --dynamic-linker)?

Yes, if you want to use symbols that are present in glibc-2.18 but not present in your system library. Otherwise, no.

Footnote:

As an alternative, you could build your program without the special flags, then use "explicit loader invocation" to actually run it: /path/to/glibc-2.18/lib/ld-2.18.so /path/to/a.out.

Beware: this doesn't always work; it fails, for example, if the program re-execs itself (and under a few other rare conditions). You may also have trouble debugging the program when it is invoked that way.

Wrapping a glibc function using the dynamic linker

I [place] my library where the executable's RPATH is pointing with the name libc.so.6.

And therefore the process loads your library instead of GLIBC's libc.so.6. That is surely not what you want unless you are providing an independent implementation of at least the entire C standard library: your library must either provide everything in libc.so.6 itself, or else dynamically load the real libc. I see that you attempt to attain completeness by statically linking libgcc (I guess you mean with -static-libgcc), but

  1. that's the wrong library. It provides functions to support GCC-compiled binaries, which may include wrappers or alternatives for some C-library functions, but it does not provide the C library itself.

  2. even if you linked the C library statically (e.g. -lc_nonshared), the library you get that way will not include a dynamically loadable symbol by which you can access the wrapped function. That's a direct consequence of the static linking.

  3. your library is anyway not independent because it tries to wrap GLIBC's implementation of fstat(), which is unavailable to it.

Why is there a relocation error on __libc_start_main and not on system?

There is a relocation error on __libc_start_main because that function is provided by the glibc's libc.so.6, but not by yours, nor by your binary itself or any other library dynamically linked to your binary. (See (1) above)

If there is no relocation error for the system or printf functions, then it follows that either those are not external dynamic symbols in your binary, or they are satisfied by some other library dynamically linked to your binary. (Update: the linker debug information shows the former to be the case: those are not external symbols, by which I meant symbols for which no definition is provided; they are provided by your library, presumably as a result of linking in libgcc.) The details don't really matter, as the point is that your strategy is totally unworkable. Consider using LD_PRELOAD instead.

Update: an alternative might be to give your shared library a constructor function that dlopen()s the real libc.so.6 with flag RTLD_GLOBAL enabled. This way, your library does not have to provide a complete C library itself, but it does need to know how to find the real library. That will certainly solve the problem of the wrapped function being unavailable in the wrapper.

How does dynamic linker know which library to search for a symbol?

dlopen can't (nor can anything else) change the definition of (global) symbols already present at the time of the call. It can only make available new ones that did not exist before.

The (sloppy) formalization of this is in the specification for dlopen:

Symbols introduced into the process image through calls to dlopen() may be used in relocation activities. Symbols so introduced may duplicate symbols already defined by the program or previous dlopen() operations. To resolve the ambiguities such a situation might present, the resolution of a symbol reference to symbol definition is based on a symbol resolution order. Two such resolution orders are defined: load order and dependency order. Load order establishes an ordering among symbol definitions, such that the first definition loaded (including definitions from the process image file and any dependent executable object files loaded with it) has priority over executable object files added later (by dlopen()). Load ordering is used in relocation processing. Dependency ordering uses a breadth-first order starting with a given executable object file, then all of its dependencies, then any dependents of those, iterating until all dependencies are satisfied. With the exception of the global symbol table handle obtained via a dlopen() operation with a null pointer as the file argument, dependency ordering is used by the dlsym() function. Load ordering is used in dlsym() operations upon the global symbol table handle.

Note that LD_PRELOAD is nonstandard functionality and thus not described here, but on implementations that offer it, LD_PRELOAD acts with load order after the main program but before any shared libraries loaded as dependencies.

What is the difference between LD_PRELOAD_PATH and LD_LIBRARY_PATH?

LD_PRELOAD (not LD_PRELOAD_PATH) is a list of specific libraries (files) to be loaded before any other libraries, whether the program wants it or not. LD_LIBRARY_PATH is a list of directories to search when loading libraries that would have been loaded anyway. On Linux you can read man ld.so for more information about these and other environment variables that affect the dynamic linker.

Custom ld-linux.so for subprocesses

I think the problem is that the subprocess is reverting to the system /lib64/ld-linux-x86-64.so.2.

That is what one should expect to happen if the execve argument is /path/to/subprocess or subprocess.

If you want the subprocess to use explicit loader invocation /path/to/my/ld-linux-x86-64.so.2 --library-path /path/to/my/libs /path/to/subprocess, then you must arrange for execve arguments to be exactly that.

This is why using patchelf or other solutions from this answer is generally a better approach.

How to build a C program using a custom version of glibc and static linking?

Following a couple of suggestions from the glibc help mailing list (libc-help@sourceware.org), I have a solution. It turns out that this task is a bit tricky because you have to tell the linker to omit everything it would normally include automatically (and silently), and then include back everything that it needs, including a bunch of start and end files. Some of the start and end files come from libc and some come from gcc, so the make rule is a bit complicated. Below is a general sample makefile to illustrate the approach. I will assume that you are building a program called prog from a source file called prog.c and that you have installed your custom glibc in directory /home/my_acct/glibc_install.

TARGET = prog
OBJ = $(TARGET).o
SRC = $(TARGET).c
CC = gcc
CFLAGS = -g
LDFLAGS = -nostdlib -nostartfiles -static
GLIBCDIR = /home/my_acct/glibc_install/lib
STARTFILES = $(GLIBCDIR)/crt1.o $(GLIBCDIR)/crti.o `gcc --print-file-name=crtbegin.o`
ENDFILES = `gcc --print-file-name=crtend.o` $(GLIBCDIR)/crtn.o
LIBGROUP = -Wl,--start-group $(GLIBCDIR)/libc.a -lgcc -lgcc_eh -Wl,--end-group

$(TARGET): $(OBJ)
	$(CC) $(LDFLAGS) -o $@ $(STARTFILES) $^ $(LIBGROUP) $(ENDFILES)

$(OBJ): $(SRC)
	$(CC) $(CFLAGS) -c $^

clean:
	rm -f *.o *~ $(TARGET)

How can I verify what dynamic linker is used when a program is run?

There is no need to actually run the executable to determine the ELF interpreter that it will use.

We can use static tools and still be guaranteed to get the full path.

We can use a combination of readelf and ldd.

If we use readelf -a, we can parse the output.


One part of the readelf output:

Section Headers:
  [Nr] Name      Type       Address           Offset
       Size              EntSize           Flags  Link  Info  Align
  [ 0]           NULL       0000000000000000  00000000
       0000000000000000  0000000000000000          0     0     0
  [ 1] .interp   PROGBITS   00000000000002e0  000002e0
       000000000000001c  0000000000000000   A      0     0     1

Note the address of the .interp section. It is 0x2e0.


If we open the executable and do a seek to that offset, we can read the ELF interpreter string. For example, here is [what I'll call] fileBad:

000002e0: 2F6C6962 36342F7A 642D6C69 6E75782D  /lib64/zd-linux-
000002f0: 7838362D 36342E73 6F2E3200 00000000  x86-64.so.2.....

Note that the string seems a little odd ... More on that later ...


Under the "Program Headers:" section, we have:

Program Headers:
  Type    Offset             VirtAddr           PhysAddr
          FileSiz            MemSiz              Flags  Align
  PHDR    0x0000000000000040 0x0000000000000040 0x0000000000000040
          0x00000000000002a0 0x00000000000002a0  R      0x8
  INTERP  0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
          0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/zd-linux-x86-64.so.2]

Again, note the 0x2e0 file offset. This may be an easier way to get the path to the ELF interpreter.

Now we have the full path to the ELF interpreter.


We can now do ldd /path/to/executable and we'll get a list of the shared libraries it is/will be using. We'll do this for fileGood. Normally, this looks like [redacted]:

linux-vdso.so.1 (0x00007ffc96d43000)
libpython3.7m.so.1.0 => /lib64/libpython3.7m.so.1.0 (0x00007f36d1ee2000)
...
libc.so.6 => /lib64/libc.so.6 (0x00007f36d1ac7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f36d23ff000)
...

That's for a normal executable. Here's the ldd output for fileBad:

linux-vdso.so.1 (0x00007ffc96d43000)
libpython3.7m.so.1.0 => /lib64/libpython3.7m.so.1.0 (0x00007f36d1ee2000)
...
libc.so.6 => /lib64/libc.so.6 (0x00007f36d1ac7000)
/lib64/zd-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f3f4f821000)
...

Okay, to explain ...

fileGood is a standard executable [/bin/vi on my system]. However, fileBad is a copy that I made where I patched the interpreter path to a non-existent file.

From the readelf data, we know the interpreter path. We can check for existence of that file. If it doesn't exist things are [obviously] bad.

With the interpreter path we got from readelf, we can find the output line from ldd for the interpreter.

For the good file, ldd gave us the simple interpreter resolution:

/lib64/ld-linux-x86-64.so.2 (0x00007f36d23ff000)

For the bad file, ldd gave us:

/lib64/zd-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f3f4f821000)

So, either ldd or the kernel detected the missing interpreter and substituted the default one.

If we try to exec fileBad from the shell we get:

fileBad: Command not found

If we try to exec fileBad from a C program we get an ENOENT error:

No such file or directory

From this we know that the kernel did not try to use a "default" interpreter when we did an exec* syscall.

So, we now know that the static analysis we did to determine the ELF interpreter path is valid.

We can be assured that the path we came up with is [will be] the path to the ELF interpreter that the kernel will map into the process address space.


For further assurance, if you need to, download the kernel source code. Look in the file: fs/binfmt_elf.c


I think that's sufficient, but to answer the question in your top comment:

with that solution would I not have to race to read /proc/<pid>/maps before the program terminates?

There's no need to race.

We can control the fork process. We can set up the child to run under [the syscall] ptrace, so we can control its execution (Note that ptrace is what gdb and strace use).

After we fork, but before we exec, the child can request that the target of the exec sleep until a process attaches to it via ptrace.

So, the parent can examine /proc/pid/maps [or whatever else] before the target executable has executed a single instruction. It can control execution via ptrace [and, eventually, detach to allow the target to run normally].

Is there a way to predict what PID will be generated next and then wait on its creation in /proc?

Given the answer to the first part of your question, this is a bit of a moot point.

There is no way to [accurately] predict the pid of a process we fork. If we could determine the pid that the system would use next, there is no guarantee that we will win the race against another process doing a fork [before us] and "getting" the pid we "thought" would be ours.


