Linux shared library depends on symbols in another shared library opened by dlopen with RTLD_LOCAL

Is it possible to link a shared library (from another shared library), without making its symbols globally visible?

If you are OK with renaming the symbols in libC.so.2, you can use Implib.so's renaming functionality. For example, to give all libC.so.2 symbols a MYPREFIX_ prefix:

$ cat mycallback.c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef __cplusplus
extern "C"
#endif
void *mycallback() {
  void *h = dlmopen(LM_ID_NEWLM, "libxyz.so", RTLD_LAZY | RTLD_DEEPBIND);
  if (h)
    return h;
  fprintf(stderr, "dlmopen failed: %s\n", dlerror());
  exit(1);
}
$ implib-gen.py --dlopen-callback=mycallback --symbol_prefix=MYPREFIX_ libC.so.2
$ ... # Link your app with libC.so.2.tramp.S, libC.so.2.init.c and mycallback.c, keep libC.so.1 unchanged

Function names in libC.so.2's header will need to be updated as well (often that's a simple s/// in vim).

Implib.so works by generating a wrapper for each symbol in the problematic library (in this case libC.so.2) and forwarding calls to the actual implementation internally (via dlsym).
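To make the forwarding idea concrete, here is a hand-written sketch of one lazy wrapper. It is not what implib-gen.py emits (the generated files named above are an assembly trampoline file and a C init file). The library libm.so.6, the function cos and the wrapper name MYPREFIX_cos are stand-ins I picked so that the sketch runs on a typical glibc system.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: libm.so.6 plays the role of the wrapped
   library and cos the role of one of its exported functions. */
static void *lib_handle;
static double (*real_cos)(double);

/* Resolve the real implementation on first use, then forward to it.
   Not thread-safe; a real wrapper would guard the first call. */
double MYPREFIX_cos(double x) {
  if (!real_cos) {
    lib_handle = dlopen("libm.so.6", RTLD_LAZY | RTLD_LOCAL);
    if (!lib_handle) {
      fprintf(stderr, "dlopen failed: %s\n", dlerror());
      exit(1);
    }
    /* Store dlsym's void* result in a function pointer without a
       direct object-to-function pointer cast. */
    *(void **)(&real_cos) = dlsym(lib_handle, "cos");
    if (!real_cos) {
      fprintf(stderr, "dlsym failed: %s\n", dlerror());
      exit(1);
    }
  }
  return real_cos(x);
}
```

Callers only ever see MYPREFIX_cos; the unprefixed name stays private to the handle, which is why the headers need the same renaming.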

Cannot dlopen a shared library from a shared library, only from executables

The most likely reason for dlopen from the main executable to succeed and for the exact same dlopen from libcore.so to fail is that the main executable has correct RUNPATH to find all the libraries, but libcore.so does not.

You can verify this with:

readelf -d main-exe | grep 'R.*PATH'
readelf -d libcore.so | grep 'R.*PATH'

If (as I suspect) main-exe has RUNPATH, and libcore.so doesn't, the right fix is to add -rpath=.... to the link line for libcore.so (spelled -Wl,-rpath=.... when linking through the gcc driver).

You can also gain a lot of insight into dynamic loader operation by using the LD_DEBUG environment variable:

LD_DEBUG=libs ./main-exe

will tell you which directories the loader is searching for which libraries, and why.

I cannot find out why it's not working

Yes, you can. You haven't spent nearly enough effort trying.

Your very first step should be to print the value of dlerror() when dlopen fails. The next step is to use LD_DEBUG. And if all that fails, you can actually debug the runtime loader itself -- it's open-source.
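For that first step, a minimal sketch could look like the following. The helper name open_or_report is mine, not from the question.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to load a library; on failure, print the loader's own explanation. */
void *open_or_report(const char *path) {
  void *h = dlopen(path, RTLD_NOW);
  if (!h) {
    /* dlerror() describes the most recent dl* failure, for example a
       dependency that could not be found or an undefined symbol. */
    fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
  }
  return h;
}
```

The message usually names the exact file or symbol the loader could not resolve, which tells you what to look for in the LD_DEBUG output.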

dlopen and implicit library loading: two copies of the same library

I suspect that OPENGL is loading PLUGIN with the RTLD_LOCAL flag. This
is normally what you want when loading a plugin, so that multiple
plugins don't conflict.

We've had similar problems with loading code under Java: we'd load a
dozen or so different modules, and they couldn't communicate with one
another. It's possible that our solution would work for you: we wrote a
wrapper for the plugin, and told Java that the wrapper was the plugin.
That plugin then loaded each of the other shared objects, using dlopen
with RTLD_GLOBAL. This worked between plugins. I'm not sure that it
will allow the plugins to get back to the main, however (but I think it
should). And IIRC, you'll need special options when linking main for
its symbols to be available. I think Linux treats the symbols in main
as if main had been loaded with RTLD_LOCAL otherwise. (Maybe
--export-dynamic? It's been a while since I've had to do this, and I
can't remember exactly.)

Why would dlopen reuse the address of a previously loaded symbol?

The key to answering this question is whether the main executable exports the same symbol in its dynamic symbol table. That is, what is the output from:

nm -D a.out | grep ' mangled_name_of_the_symbol'

If the output is empty, the two libraries should indeed use separate (their own) copies of the symbol. But if the output is not empty, then both libraries should reuse the symbol defined in the main binary (this happens because UNIX dynamic linking attempts to emulate what would have happened if everything was statically linked into the main binary -- UNIX support for shared libraries happened long after UNIX itself became popular, and in that context this design decision made sense).

Demonstration:

// main.c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

int foo = 12;

int main()
{
  printf("main: &foo = %p, foo = %d\n", &foo, foo);
  void *h = dlopen("./foo.so", RTLD_NOW);
  assert (h != NULL);
  void (*fn)(void) = (void (*)()) dlsym(h, "fn");
  fn();

  return 0;
}

// foo.c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

int foo = 42;

void fn()
{
  printf("foo:  &foo = %p, foo = %d\n", &foo, foo);
  void *h = dlopen("./bar.so", RTLD_NOW);
  assert (h != NULL);

  void (*fn)(void) = (void (*)()) dlsym(h, "fn");
  fn();
}

// bar.c
#include <stdio.h>

int foo = 24;

void fn()
{
  printf("bar:  &foo = %p, foo = %d\n", &foo, foo);
}

Build this with:

gcc -fPIC -shared -o foo.so foo.c && gcc -fPIC -shared -o bar.so bar.c &&
gcc main.c -ldl && ./a.out

Output:

main: &foo = 0x5618f1d61048, foo = 12
foo:  &foo = 0x7faad6955040, foo = 42
bar:  &foo = 0x7faad6950028, foo = 24

Now rebuild just the main binary with -rdynamic (which causes foo to be exported from it): gcc main.c -ldl -rdynamic. The output changes to:

main: &foo = 0x55ced88f1048, foo = 12
foo:  &foo = 0x55ced88f1048, foo = 12
bar:  &foo = 0x55ced88f1048, foo = 12

P.S.
You can gain much insight into the behavior of the dynamic linker by running with:

LD_DEBUG=symbols,bindings ./a.out

Update:

It turns out I asked a wrong question ... Added source example.

If you look at LD_DEBUG output, you'll see:

    165089: symbol=object;  lookup in file=./main [0]
    165089: symbol=object;  lookup in file=./liba.so [0]
    165089: binding file ./liba.so [0] to ./liba.so [0]: normal symbol `object'
    165089: symbol=object;  lookup in file=./main [0]
    165089: symbol=object;  lookup in file=./liba.so [0]
    165089: binding file ./libb.so [0] to ./liba.so [0]: normal symbol `object'

What this means: liba.so is in the global search list (by virtue of having been directly linked to by main). This is approximately equivalent to having done dlopen("./liba.so", RTLD_GLOBAL).

It should not be a surprise then that the symbols in it are available for subsequently loaded shared libraries to bind to, which is exactly what the dynamic loader does.
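You can observe the global search list from C as well. In the sketch below (function name and the libm.so.6 stand-in are my assumptions), a library is loaded with RTLD_GLOBAL and the symbol is then looked up through the handle returned by dlopen(NULL, ...), which searches the main program, the libraries loaded at startup, and everything loaded with RTLD_GLOBAL.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Returns 1 if `symbol` is found in the global search scope after
   loading `lib` with RTLD_GLOBAL, 0 otherwise. */
int visible_after_global_load(const char *lib, const char *symbol) {
  void *h = dlopen(lib, RTLD_NOW | RTLD_GLOBAL);
  if (!h) {
    fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 0;
  }
  /* A NULL filename yields a handle for the main program; dlsym on it
     also searches startup libraries and RTLD_GLOBAL-loaded libraries. */
  void *global_scope = dlopen(NULL, RTLD_NOW);
  if (!global_scope)
    return 0;
  return dlsym(global_scope, symbol) != NULL;
}
```

For example, visible_after_global_load("libm.so.6", "cos") should report the symbol as visible on a glibc system.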

Force mapping between symbols and shared libraries

Is there a way to force foo.so to use fooHelper.so's implementation and bar.so to use barHelper.so's?

Yes: that's what RTLD_LOCAL is for (when dlopening foo.so and bar.so).

RTLD_LOCAL
  This is the converse of RTLD_GLOBAL, and the default if neither flag
  is specified. Symbols defined in this library are not made available
  to resolve references in subsequently loaded libraries.
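
Symbols of a library loaded with RTLD_LOCAL are still reachable through its own handle. A minimal sketch, assuming a helper name of my choosing; in the question's setup you would pass ./foo.so or ./bar.so, and libm.so.6 is only a stand-in for trying it out:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load `lib` privately and look up `symbol` through the returned handle.
   The handle is intentionally kept open so the address stays valid. */
void *lookup_in_local_lib(const char *lib, const char *symbol) {
  void *h = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
  if (!h) {
    fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return NULL;
  }
  void *addr = dlsym(h, symbol);
  if (!addr)
    fprintf(stderr, "dlsym failed: %s\n", dlerror());
  return addr;
}
```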

How to handle a changed Library opened with dlopen()

Your question is unclear.

If you have some /tmp/plugin.so and you do

void* dl = dlopen("/tmp/plugin.so", RTLD_NOW);

and later (in the same process) some

rename("/tmp/plugin.so", "/tmp/oldplugin.so")

(or even unlink("/tmp/plugin.so"); ...) you should be able to dlclose(dl);

However, if your build process is making a new one, e.g. you have some make /tmp/plugin.so target, then you really should do a

 mv /tmp/plugin.so /tmp/plugin.so~

or even

 rm /tmp/plugin.so

before linking the shared library, e.g. before

gcc -shared -Wall -O /tmp/plugin*.pic.o -o /tmp/plugin.so

In other words, be sure that your build procedure is not overwriting bytes in the same inode (of the original /tmp/plugin.so)

So if your build process replaces the old /tmp/plugin.so with some mv /tmp/newplugin.so /tmp/plugin.so command, you had better do a mv /tmp/plugin.so /tmp/plugin.so~ or a rm /tmp/plugin.so just before.

Notice that mmap(2) (internally invoked by dlopen(3)) is actually working on opened inodes. See path_resolution(7). So you could unlink(2) your shared library while still having it dlopen-ed.
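
The same behavior can be shown with plain mmap(2), without any shared library. This is a small sketch under my own assumptions (the function name and the scratch file under /tmp are made up): it maps a file, removes its only directory entry, and still reads through the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a scratch file, unlink it, then read through the mapping.
   Returns the first mapped byte, or -1 on any failure. */
int first_byte_after_unlink(void) {
  char path[] = "/tmp/unlink-demo-XXXXXX"; /* hypothetical scratch name */
  int fd = mkstemp(path);
  if (fd < 0)
    return -1;
  if (write(fd, "hello", 5) != 5) {
    close(fd);
    unlink(path);
    return -1;
  }
  char *p = mmap(NULL, 5, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);    /* the mapping itself keeps the inode alive */
  unlink(path); /* remove the last directory entry */
  if (p == MAP_FAILED)
    return -1;
  int c = p[0]; /* still readable: the inode lives until munmap */
  munmap(p, 5);
  return c;
}
```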

So never overwrite bytes in an existing shared library inode; do whatever is necessary to be sure to create a fresh shared library inode in your plugin build procedure.
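
If the installation step is done from C rather than from a Makefile, one way to get a fresh inode is to write a temporary file in the same directory and rename(2) it over the target. A sketch, with a helper name and a 0755 mode that are my assumptions:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `len` bytes to a new temporary file next to `dst`, then rename
   it over `dst`. Returns 0 on success, -1 on failure. */
int replace_file_atomically(const char *dst, const void *data, size_t len) {
  char tmp[4096];
  int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", dst);
  if (n < 0 || (size_t)n >= sizeof tmp)
    return -1;

  int fd = mkstemp(tmp); /* always creates a brand-new inode */
  if (fd < 0)
    return -1;

  const char *p = data;
  size_t left = len;
  while (left > 0) {
    ssize_t w = write(fd, p, left);
    if (w <= 0) {
      close(fd);
      unlink(tmp);
      return -1;
    }
    p += w;
    left -= (size_t)w;
  }
  if (fchmod(fd, 0755) != 0) {
    close(fd);
    unlink(tmp);
    return -1;
  }
  if (close(fd) != 0 || rename(tmp, dst) != 0) {
    unlink(tmp);
    return -1;
  }
  return 0;
}
```

A process that already has the old file dlopen-ed keeps its old inode; only later dlopen calls see the new one.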

Read Advanced Linux Programming & Drepper's How to Write a Shared Library

BTW, the real issue is not related to dlopen but to the nature of file descriptors (that is, of opened inodes) on POSIX systems (on which several processes can read and write the same file; the user or sysadmin -or tool developer- is supposed to avoid wreaking havoc).

Also use pmap(1) (as pmap 1234) and/or cat /proc/1234/maps to understand the memory mapping of the process with pid 1234 (i.e. its virtual address space).
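
A process can inspect its own mappings the same way through /proc/self/maps. A small sketch (the function name is mine) that counts the mapping lines containing a given substring, e.g. the plugin's file name:

```c
#include <stdio.h>
#include <string.h>

/* Count the lines of /proc/self/maps that contain `needle`.
   Returns -1 if the file cannot be opened (e.g. on a non-Linux system).
   Lines longer than the buffer are read in pieces, which is good enough
   for a quick diagnostic. */
int count_own_mappings(const char *needle) {
  FILE *f = fopen("/proc/self/maps", "r");
  if (!f)
    return -1;
  char line[4096];
  int count = 0;
  while (fgets(line, sizeof line, f)) {
    if (strstr(line, needle))
      count++;
  }
  fclose(f);
  return count;
}
```

Calling count_own_mappings("plugin.so") after dlopen shows whether the library is mapped; a replaced file typically shows up with a "(deleted)" suffix on its path.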

In practice, the user or sysadmin installing a plugin should ensure that a pristine inode is created for it, or that no process is using that plugin (before installation). That is their responsibility (and a whole-system issue). So you really need to educate your user or sysadmin, and document the issue, e.g. by suggesting the use of install(1) and/or locking utilities like package managers when installing plugins.

PS. Making a private copy of the shared object before dlopen might improve the situation, but does not solve the issue (what if the shared object source gets updated during the copy?). The real bug is in the build process, which overwrites a shared object instead of creating a pristine new inode and writing to that.