How to Self Dlopen an Executable Binary

You need code along these lines:

  // file ds.c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

void hello (void)
{
    printf ("hello world\n");
}

int main (int argc, char **argv)
{
    const char *buf = "hello";
    /* A NULL filename gives a handle for the main program itself. */
    void *hndl = dlopen (NULL, RTLD_LAZY);
    if (!hndl) {
        fprintf (stderr, "dlopen failed: %s\n", dlerror ());
        exit (EXIT_FAILURE);
    }
    void (*fptr) (void) = dlsym (hndl, buf);
    if (fptr != NULL)
        fptr ();
    else
        fprintf (stderr, "dlsym %s failed: %s\n", buf, dlerror ());
    dlclose (hndl);
    return 0;
}

Read dlopen(3) carefully, always check the results of dlopen and dlsym, and call dlerror on failure.

Compile the above ds.c file with

  gcc -std=c99 -Wall -rdynamic ds.c -o ds -ldl

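Don't forget -Wall to get all warnings, and the -rdynamic flag, which puts your own symbols into the dynamic symbol table so that dlsym can find them. (On glibc 2.34 and later, -ldl is harmless but no longer required, since libdl was merged into libc.)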

On my Debian/Sid/x86-64 system (with gcc version 4.8.2, and libc6 version 2.17-93 providing the -ldl, kernel 3.11.6 compiled by me, binutils package 2.23.90 providing ld), the execution of ./ds gives the expected output:

  % ./ds 
hello world

and even:

  % ltrace ./ds
__libc_start_main(0x4009b3, 1, 0x7fff1d0088b8, 0x400a50, 0x400ae0 <unfinished ...>
dlopen(NULL, 1) = 0x7f1e06c9e1e8
dlsym(0x7f1e06c9e1e8, "hello") = 0x004009a0
puts("hello world"hello world
) = 12
dlclose(0x7f1e06c9e1e8) = 0
+++ exited (status 0) +++

Using dlopen() on an executable

You can't open executables as libraries. The entry point of an executable will attempt to re-initialize the C library, and take over the brk pointer. This will corrupt your malloc heap. Additionally, the executable is likely to be mapped at a fixed address with no relocations, and if this address overlaps with anything already loaded, it's not possible to map it for that reason as well.

You need to refactor the other program into a library, or add an RPC interface to the other program.

Note that this does not necessarily apply to PIE executables. However, unless the executable is specifically designed to be dlopen()ed, this is unsafe: main() will not be run, so any initialization done in main() will not occur.

How to use dlopen() to get the executable's path

Am I wrong in my assumption that the handle for the executable can be used in the dlinfo functions the same way a .so handle can be used?

Yes, you are.

The dynamic linker has no idea which file the main executable was loaded from. That's because the kernel performs all mmaps for the main executable, and only passes a file descriptor to the dynamic loader (whose job it is to load other required libraries and start the executable running).

I'm trying to replicate some of the functionality of GetModuleFileName() on linux

There is no reliable way to do that. In fact the executable may no longer exist anywhere on disk at all -- it's perfectly fine to run the executable and remove the executable file while the program is still running.

Also hard links mean that there could be multiple correct answers -- if a.out and b.out are hard linked, there isn't an easy way to tell whether a.out or b.out was used to start the program running.

Your best options are probably reading the /proc/self/exe symlink (with readlink), or parsing /proc/self/cmdline and/or /proc/self/maps.

Shared object can't find symbols in main binary, C++

Try:

g++ -fPIC -rdynamic -o testexe testexe.cpp -ldl

Without the -rdynamic (or something equivalent, like -Wl,--export-dynamic), symbols from the application itself will not be available for dynamic linking.

Finding number of dlopen calls of an ELF binary in C

You could simply use ltrace:

Example:

#include <dlfcn.h>
#include <stdio.h>

int main(int C, char **V)
{
    char **a = V + 1;
    /* Try to dlopen every command-line argument in turn. */
    while (*a) {
        void *h;
        if (0 == (h = dlopen(*a++, RTLD_LAZY)))
            fprintf(stderr, "%s\n", dlerror());
    }
    return 0;
}

Compile it:

$ gcc example.c -fpic -pie -ldl

Invoke it on self and count dlopen calls:

$ ltrace -o /dev/fd/3 \
./a.out ./a.out ./a.out ./a.out 3>&1 >/dev/null | \
grep -c '^dlopen('
3

How to compile ELF binary so that it can be loaded as dynamic library?

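Based on the links provided in comments and other answers, here is how it can be done without linking these programs together at compile time. Note that, as far as I know, glibc 2.30 and later refuse to dlopen() a PIE executable (the error is "cannot dynamically load position-independent executable"), so this approach only works with older glibc versions.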

test1.c:

#include <stdio.h>

int a(int b)
{
    return b+1;
}

int c(int d)
{
    return a(d)+1;
}

int main()
{
    int b = a(3);
    printf("Calling a(3) gave %d \n", b);
    int d = c(3);
    printf("Calling c(3) gave %d \n", d);
    return 0;
}

test2.c:

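#include <dlfcn.h>
#include <stdio.h>

int (*a_ptr)(int b);
int (*c_ptr)(int d);

int main()
{
    void *lib = dlopen("./test1", RTLD_LAZY);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    a_ptr = dlsym(lib, "a");
    c_ptr = dlsym(lib, "c");
    if (!a_ptr || !c_ptr) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }
    int d = c_ptr(6);
    int b = a_ptr(5);
    printf("b is %d d is %d\n", b, d);
    dlclose(lib);
    return 0;
}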

Compilation:

$ gcc -fPIC -pie -o test1 test1.c -Wl,-E
$ gcc -o test2 test2.c -ldl

Execution results:

$ ./test1
Calling a(3) gave 4
Calling c(3) gave 5
$ ./test2
b is 6 d is 8

References:

  • building a .so that is also an executable
  • Compile C program using dlopen and dlsym with -fPIC

PS: To avoid symbol clashes, give the function pointers names that differ from the imported symbols they are assigned (a_ptr rather than a, for example).

Calling aarch64 shared library from amd64 executable, maybe using binary translation/QEMU

The solution that I implemented for this is shared-memory IPC. This solution is particularly nice because it integrates well with fixed-length C structs: you simply use the same struct on both ends.

Let's say you have a function with a signature uint32_t so_lib_function_a(uint32_t c[2])

You can write a wrapper function in an amd64 library: uint32_t wrapped_so_lib_function_a(uint32_t c[2]).

Then, you create a shared memory structure:

typedef struct {
uint32_t c[2];
uint32_t ret;
volatile int turn; // turn = 0 means amd64 library, turn = 1 means arm library (volatile so busy-wait loops re-read it)
} ipc_call_struct;

Initialise a struct like this, then call shmget(SOME_SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777), pass the ID it returns to shmat to get a pointer to the shared memory, and copy the initialised struct into that memory.

You then call shmget(2) and shmat(2) on the ARM binary side, getting a pointer to the same shared memory. The ARM binary runs an infinite loop, waiting for its "turn." Once the amd64 binary sets turn to 1, it busy-waits until turn is 0 again. Meanwhile the ARM binary executes the function, using the arguments in the shared struct and storing the return value back into it. Then the ARM binary sets turn to 0 and waits until turn is 1 again, which lets the amd64 binary carry on until it is ready to call the ARM function again.

Here is an example (it might not compile yet, but it gives you a general idea):

Our "unknown" library : shared.h

#include <stdint.h>

#define MAGIC_NUMBER 0x44E

uint32_t so_lib_function_a(uint32_t c[2]) {
    // Adds the args and multiplies the sum by MAGIC_NUMBER
    uint32_t ret = 0; // must start at zero before accumulating
    for (int i = 0; i < 2; i++) {
        ret += c[i];
    }

    ret *= MAGIC_NUMBER;
    return ret;
}

Hooking into the "unknown" library: shared_executor.c

#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/shm.h>

#define SHM_KEY 22828 // Some random SHM ID

uint32_t (*so_lib_function_a)(uint32_t c[2]);

typedef struct {
    uint32_t c[2];
    uint32_t ret;
    volatile int turn; // turn = 0 means amd64 library, turn = 1 means arm library
} ipc_call_struct;

int main() {
    ipc_call_struct *handle;

    void *lib_dlopen = dlopen("./shared.so", RTLD_LAZY);
    if (!lib_dlopen) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    so_lib_function_a = dlsym(lib_dlopen, "so_lib_function_a");
    if (!so_lib_function_a) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }

    // setup shm
    int shm_id = shmget(SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777);
    if (shm_id == -1) {
        perror("shmget");
        return 1;
    }
    handle = shmat(shm_id, NULL, 0);
    if (handle == (void *)-1) {
        perror("shmat");
        return 1;
    }

    // We expect the handle to already be initialised by the time we get here, so we don't have to do anything

    for (;;) {
        if (handle->turn == 1) { // our turn
            handle->ret = so_lib_function_a(handle->c);
            handle->turn = 0; // hand off for later
        }
    }
}

On the amd64 side: shm_shared.h

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/shm.h>

typedef struct {
    uint32_t c[2];
    uint32_t ret;
    volatile int turn; // turn = 0 means amd64 library, turn = 1 means arm library
} ipc_call_struct;

#define SHM_KEY 22828 // Some random SHM ID

static ipc_call_struct* handle;

void wrapper_init() {
    // setup shm here
    int shm_id = shmget(SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777);
    if (shm_id == -1) {
        perror("shmget");
        return;
    }
    handle = shmat(shm_id, NULL, 0);
    if (handle == (void *)-1) {
        perror("shmat");
        handle = NULL;
        return;
    }

    // Initialise the handle
    // Currently, we don't want to call the ARM library, so the turn is still zero
    ipc_call_struct temp_handle = { .c={0}, .ret=0, .turn=0 };
    *handle = temp_handle;

    // you should be able to fork the ARM binary using "qemu-arm-static" here
    // (and add code for that if you'd like)
}

uint32_t wrapped_so_lib_function_a(uint32_t c[2]) {
    // Arrays cannot be assigned in C, so copy the arguments element-wise
    memcpy(handle->c, c, sizeof handle->c);
    handle->turn = 1; // hand off execution to the ARM library
    while (handle->turn != 0) {} // wait
    return handle->ret;
}

Again, there's no guarantee this code even compiles (yet); it is only meant to give the general idea.


