Why Does The Same Executable Use Different Runpaths for Different Library Lookups

My question is: how are these variants on RUNPATH concocted?

Unlike the older RPATH, RUNPATH applies only when searching for the direct dependencies of the binary that carries it.

That is, if a.out has RUNPATH of /foo, and NEEDED of libfoo.so (located in /foo), then libfoo.so will be found. But if libfoo.so itself depends on libbar.so (also located in /foo), and if libfoo.so does not have RUNPATH, then libbar.so will not be found.

This behavior promotes "every ELF binary should be self-sufficient". In the case above, libfoo.so is not self-sufficient (needs libbar.so but doesn't say where to find it).

If you use RPATH instead, the path there would apply to every search, and libbar.so will be found. You can achieve this with -Wl,--disable-new-dtags when linking a.out.

Link only needed symbols when compiling an executable with a Shared Library

Details of dynamic linking and the kinds of objects involved vary across environments and toolchains. On Linux, where you say you are, and on Solaris, and several other UNIX-y platforms, you are looking at ELF objects and semantics.

So far I tried using -Wl,--as-needed, -Wl,--unresolved-symbols=ignore-in-shared-libs,

These both have their full effect at (static) link time. The first tells the linker that the libraries following it on the command line should be linked in only if they resolve at least one as-yet undefined symbol. The latter tells the linker to not worry about resolving symbols in shared libraries included in the link. That has nothing to do with the behavior of the dynamic linker when you run the program.

and opening the shared object with dlopen to get the function I want with dlsym.

dlopen instructs the dynamic linker to link in, at runtime, a shared object that was not specified in the binary as a required shared library. Its behavior at that point can be modulated by the flags passed to dlopen, but those flags offer nothing beyond what can be specified at link time. There is little reason to use dlopen when you actually know at link time which libraries you need.

Are you forced to resolve every undefined symbol of a dynamic library when linking it against an executable?

Focusing on ELF and the GNU toolchain, no. -Wl,--unresolved-symbols=ignore-in-shared-libs serves precisely the purpose of avoiding that. But as you've discovered, that comes with caveats.

In the first place, in every shared object, every symbol referring to data must be resolved by the dynamic linker when the object is loaded, no matter how you linked the various shared objects, including the main program. This is primarily an operational consideration -- the dynamic linker has no way to defer resolving symbols referring to objects because it has no good way to trap attempts to access them.

On the other hand, it is possible to defer resolution of symbols referring to functions until their first use. In fact, this is the GNU linker's default, but you can reaffirm this by passing -Wl,-z,lazy to gcc when linking. Note well, however, that this sets a property of the object being linked, so you should ensure that every shared object is built with that link option (but ordinarily they are because, again, that's the default).

Additionally, you should be aware that the dynamic linker's behavior can be influenced by environment variables. In particular, lazy binding will be disabled if the dynamic linker finds LD_BIND_NOW set to a nonempty string in the runtime environment.

A simple workaround would be to add all the dependencies of my libraries when compiling the executable. But they're so full of dependencies that this sometimes means adding 10+ libraries to the command line, and this would be for something like a hundred executables.

And what's the big deal with that, really? Surely you have a well-factored Makefile (or several) to help you, so it shouldn't be a big deal to ensure that all the libraries are linked. Right?

But you should also consider refactoring your libraries, especially if "interdependent" means there are loops in the dependency graph. Dynamic linking is different from static linking, as you've discovered, and the differences are sometimes more subtle than those you're presently struggling with. Although it is not a hard rule, I urge you to avoid creating situations where the shared objects used by one process contain among them multiple definitions of the same external symbol, especially if that symbol is actually used.

Update

The above discussion focuses on linking shared libraries to an executable, but there is another important consideration: how the libraries themselves are linked. Each ELF object, whether executable or shared library, carries its own list of needed shared libraries. The dynamic linker will recursively include all of these in the list of shared libraries to be loaded (immediately) at program startup, notwithstanding its behavior with respect to lazy binding of symbols referring to functions.

Therefore, if you want an executable not to require a given shared library X, then not only that executable itself but also every shared library it does rely upon must avoid expressing a dependency on X. If some of the shared libs require X when used in conjunction with other programs, then that puts the onus on you to link in all the needed libraries when building those programs (otherwise, you can arrange to link only direct dependencies). You can tell the GNU linker to build shared libraries this way by passing it the --allow-shlib-undefined flag.

Here is a complete proof of concept:

main.c

int mul(int, int);

int main(void) {
    return mul(2, 3);
}

mul.c

int add(int, int);

int mul(int x, int y) {
    return x * y;
}

int mul2(int x, int y) {
    return add(x, y) * add(x, -y);
}

Makefile

CC = gcc
LD = gcc
CFLAGS = -g -O2 -fPIC -DPIC
LDFLAGS = -Wl,--unresolved-symbols=ignore-in-shared-libs
SHLIB_LDFLAGS = -shared -Wl,--allow-shlib-undefined

all: main

main: main.o libmul.so
	$(LD) $(CFLAGS) $(LDFLAGS) -o $@ $^

libmul.so: mul.o
	$(LD) $(CFLAGS) $(SHLIB_LDFLAGS) -o $@ $^

clean:
	rm -f main main.o libmul.so mul.o

Demo

$ make
gcc -g -O2 -fPIC -DPIC   -c -o main.o main.c
gcc -g -O2 -fPIC -DPIC   -c -o mul.o mul.c
gcc -g -O2 -fPIC -DPIC -shared -Wl,--allow-shlib-undefined -o libmul.so mul.o
gcc -g -O2 -fPIC -DPIC -Wl,--unresolved-symbols=ignore-in-shared-libs -o main main.o libmul.so
$ LD_LIBRARY_PATH=$(pwd) ./main
$ echo $?
6
$

Note that the -zlazy linker option discussed in comments is omitted, as it's the default.

What's the difference between -rpath and -L?

You must be reading some outdated copies of the manpages (emphasis added):

-rpath=dir
      Add a directory to the runtime library search path. This is used
      when linking an ELF executable with shared objects. All -rpath
      arguments are concatenated and passed to the runtime linker, which
      uses them to locate shared objects at runtime.

vs.

-L searchdir
--library-path=searchdir
      Add path searchdir to the list of paths that ld will search for
      archive libraries and ld control scripts.

So, -L tells ld where to look for libraries to link against when linking. You use this (for example) when you're building against libraries in your build tree, which will be put in the normal system library paths by make install. -rpath, on the other hand, stores that path inside the executable, so that the runtime dynamic linker can find the libraries. You use this when your libraries are outside the system library search path.

Difference between relative path and using $ORIGIN as RPATH

If you use $ORIGIN, the lookup is relative to the directory that contains the executable. If you specify a relative directory, it's relative to the current working directory, which is hardly ever what you want.

How can I set an executable's rpath and check its value after building it?

Unless I missed something, you are not linking any libraries in your build command.

Let's say you want to link the shared library libusb.so, which is located in the libusb subfolder of your current folder, where main.cpp is.
I will not go into details here about the soname, the link name of the library and so on; the aim is just to clarify rpath.

rpath gives the runtime linker a path to the library. It is not used at link time, because even a shared library needs to be present (accessible) at compile/link time. To let your application's loader look for a needed library at startup, relative to your application's folder, there is the $ORIGIN variable. You can see it with readelf, but only if you link some library with $ORIGIN in the rpath.
Here is an example based on your question:

g++ main.cpp -o main -L./libusb -Wl,-rpath,'$ORIGIN/libusb' -lusb

As you can see, you need to provide the -L directory for the compile/link-time search and the rpath for the runtime linker. You can now examine the libraries your application needs, and where they are searched for, with readelf (for example, readelf -d main).
