Dynamic Source Code in C++

C++ is a compiled language, so there is no built-in equivalent of "eval()".

For the specific example you mention, you could build a function registry that maps inputs (strings) to outputs (function pointers) and then call the resulting function.
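Here is a minimal plain-C sketch of such a registry (kept in C to match the example further down this page; in C++ you would more likely reach for std::map or std::unordered_map). The command names and handler functions are made up for illustration:

#include <stdio.h>
#include <string.h>

/* Hypothetical handlers -- stand-ins for whatever operations you want
   to dispatch on by name. */
static void say_hello(void) { printf("hello\n"); }
static void say_bye(void)   { printf("bye\n");   }

/* The registry: a table mapping command names to function pointers. */
struct command {
    const char *name;
    void (*fn)(void);
};

static const struct command registry[] = {
    { "hello", say_hello },
    { "bye",   say_bye   },
};

/* Look the name up and call the matching function, if any. */
static int dispatch(const char *name)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; ++i) {
        if (strcmp(registry[i].name, name) == 0) {
            registry[i].fn();
            return 0;
        }
    }
    return -1; /* unknown command */
}

int main(void)
{
    dispatch("hello");
    if (dispatch("eval") != 0)
        printf("no such command\n");
    return 0;
}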

There are several C++ interpreter libraries available; although their performance will be very poor, they may accomplish what you want. Do a Google search for "C++ interpreter". I've seen results for "Ch", "CINT" and "clipp".

Can you dynamically compile and link/load C code into a C program?

On POSIX systems (Linux, macOS, UNIX) you have the dlopen and dlsym functions to work with. These functions can be used to load a shared library at run time and execute a function from it.

As far as creating the library goes, the simplest thing to do is to write the relevant source code to a file, run gcc in a separate process to compile it into a shared object, and then use dlopen/dlsym to run the functions it contains.

For example:

/* Link the host program with -ldl (needed with older glibc versions). */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

/* Source code for the library, written out to mylib.c at run time. */
const char *libsrc =
    "#include <stdio.h>\n"
    "\n"
    "void f1()\n"
    "{\n"
    "    printf(\"in f1\\n\");\n"
    "}\n"
    "\n"
    "int add(int a, int b)\n"
    "{\n"
    "    return a+b;\n"
    "}\n";

int main()
{
    /* Write the library source to disk. */
    FILE *libfile = fopen("mylib.c", "w");
    if (!libfile) {
        perror("fopen mylib.c");
        return 1;
    }
    fputs(libsrc, libfile);
    fclose(libfile);

    /* Compile it into a shared object in a separate process. */
    if (system("gcc -fPIC -shared -g -Wall -Wextra -o libmylib.so mylib.c") != 0) {
        printf("compilation failed\n");
        return 1;
    }

    /* The "./" makes dlopen look in the current directory instead of
       the standard library search path. */
    void *lib = dlopen("./libmylib.so", RTLD_NOW);
    if (!lib) {
        printf("dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up and call f1(). The cast avoids the strict-C warning about
       converting an object pointer to a function pointer; POSIX requires
       this conversion to work for dlsym results. */
    void (*f)(void) = (void (*)(void))dlsym(lib, "f1");
    if (f) {
        f();
    } else {
        printf("dlsym for f1 failed: %s\n", dlerror());
    }

    /* Look up and call add(). */
    int (*a)(int, int) = (int (*)(int, int))dlsym(lib, "add");
    if (a) {
        int x = a(2, 3);
        printf("x=%d\n", x);
    } else {
        printf("dlsym for add failed: %s\n", dlerror());
    }

    dlclose(lib);
    return 0;
}

Dynamic slicing in C/C++

A little information in addition to Rob's answer:

  • the Wisconsin Program-Slicing Tool has evolved into a tool called CodeSurfer. Good news: it's commercially available and supported, and it works great for what it does. Bad news (perhaps): it does not actually produce a reduced program that computes the same value that you selected, but it's very convenient for navigating source code that you have not written.

  • Frama-C handles only C (no C++ for the foreseeable future). It is nice, not great, for navigating source code, but it can produce an equivalent smaller program for the criterion that you specify, if the original program is of the kind that it can analyze automatically (no recursion, no dynamic allocation). Frama-C is Open Source and has a mailing list in which your questions will be welcome if you are interested in the techniques it uses.

The reason that CodeSurfer does not risk producing an equivalent program, and that Frama-C can only do it for code with embedded-like restrictions, is, in short, that doing so requires knowing the values of pointers, which can be arbitrarily difficult to compute with precision.

Why should dynamic library source code be compiled as position-independent code?

-fPIC is by no means the only solution to the shared-library problem. Prior to ELF, Linux used the a.out executable format. Under a.out, every shared library occupied a unique address in the global address space, so it was always loaded at the same fixed address by all processes. This proved extremely hard to manage: all distro packages had to agree among themselves on which address range was reserved for which library, and to constantly revise this agreement as libraries evolved over time.

-fPIC got us out of this mess.

With your suggestion (global dynamic reservation of address ranges across all processes), once some process mapped a library into some memory area, no other process would be able to reuse that area, even if it never actually loads the library. For 32-bit systems with 4G of address space (or even 2G, if the upper 2G is reserved for the kernel), that might quickly exhaust the virtual memory. Another problem comes from the fact that the size of the main executable differs across processes, so there is no global start address from which libraries can safely be loaded.
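As an aside, a quick way to see position independence in action is to ask the dynamic linker where a shared library ended up in the current process. The sketch below (not part of the original answer; libm.so.6 is just a convenient library to inspect, and dladdr is a glibc/BSD extension) prints the base address at which the library happened to be mapped:

#define _GNU_SOURCE            /* for dladdr() on glibc */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Load the math library and look up one of its functions.
       "libm.so.6" is the usual soname on Linux/glibc. */
    void *lib = dlopen("libm.so.6", RTLD_NOW);
    if (!lib) {
        printf("dlopen failed: %s\n", dlerror());
        return 1;
    }

    void *sym = dlsym(lib, "cos");
    Dl_info info;
    if (sym && dladdr(sym, &info)) {
        /* dli_fbase is where the library happens to be mapped in
           this particular process. */
        printf("%s is mapped at base address %p\n",
               info.dli_fname, info.dli_fbase);
    }

    dlclose(lib);
    return 0;
}

Run it a few times: with address-space layout randomization the base address usually changes from run to run, and it certainly differs between processes. The library can tolerate being dropped at an arbitrary address precisely because it was compiled as position-independent code.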

How to dynamically load often re-generated c code quickly?

What you want to do is reasonable, and I am doing exactly that in MELT (a high-level domain-specific language to extend GCC; MELT is compiled to C, through a translator itself written in MELT).

First, when generating C code (or many other source languages), good advice is to keep some sort of abstract syntax tree (AST) in memory. So first build the entire AST of the generated C code, then emit it as C syntax. Don't design your code-generation framework without an explicit AST (in other words, generating C code with a bunch of printf calls is a maintenance nightmare; you want some intermediate representation).
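To make the idea concrete, here is a deliberately tiny sketch (not MELT's actual representation, just an illustration) of an expression AST that is built in memory first and only then printed as C syntax:

#include <stdio.h>
#include <stdlib.h>

/* A toy AST for arithmetic expressions: either an integer literal
   or a binary operation on two sub-expressions. */
typedef struct expr {
    enum { LIT, BINOP } kind;
    int value;                 /* used when kind == LIT   */
    char op;                   /* used when kind == BINOP */
    struct expr *lhs, *rhs;    /* used when kind == BINOP */
} expr;

static expr *lit(int v)
{
    expr *e = malloc(sizeof *e);
    e->kind = LIT;
    e->value = v;
    return e;
}

static expr *binop(char op, expr *l, expr *r)
{
    expr *e = malloc(sizeof *e);
    e->kind = BINOP;
    e->op = op;
    e->lhs = l;
    e->rhs = r;
    return e;
}

/* The emission pass: walk the tree and print C syntax. */
static void emit(FILE *out, const expr *e)
{
    if (e->kind == LIT) {
        fprintf(out, "%d", e->value);
    } else {
        fprintf(out, "(");
        emit(out, e->lhs);
        fprintf(out, " %c ", e->op);
        emit(out, e->rhs);
        fprintf(out, ")");
    }
}

int main(void)
{
    /* Build the AST for (1 + 2) * 3, then emit it as a C function. */
    expr *e = binop('*', binop('+', lit(1), lit(2)), lit(3));
    fprintf(stdout, "int generated_value(void) { return ");
    emit(stdout, e);
    fprintf(stdout, "; }\n");
    return 0;
}

The point is that transformations operate on the tree, and the printing of C text is confined to a single emission pass.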

Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about the performance of the generated code (TCC, for instance, compiles C very quickly into very naive and slow machine code), you could use other approaches, e.g. JIT libraries like GNU lightning (very quick generation of slow machine code), GNU libjit or AsmJit (the generated machine code is a bit better), or LLVM or GCCJIT (good machine code, but generation time comparable to a compiler's).

So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you probably would fork a gcc -O -fPIC -shared command to make some shared object foo.so out of your generated foo.c). In my experience, generating C code takes much less time than compiling it (with gcc -O). In MELT, generating the C code is more than 10x (and usually 30x) faster than compiling it with GCC. But the optimizations done by a C compiler are worth it.
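A minimal sketch of that compile step, forking gcc through system() and checking the wait status before trying to dlopen the result (the file names generated1.c and generated1.so are made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Compile a generated C file into a shared object; return 0 on success.
   The command line mirrors the one discussed above. */
static int compile_module(const char *cfile, const char *sofile)
{
    char cmd[512];
    snprintf(cmd, sizeof cmd,
             "gcc -O -fPIC -shared -o %s %s", sofile, cfile);

    int status = system(cmd);
    if (status == -1)
        return -1;                       /* could not run the shell */
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;                        /* gcc succeeded */
    return -1;                           /* gcc reported errors */
}

int main(void)
{
    /* "generated1.c" stands for whatever file your generator wrote. */
    if (compile_module("generated1.c", "generated1.so") != 0) {
        fprintf(stderr, "compilation of generated1.c failed\n");
        return 1;
    }
    printf("generated1.so is ready to be dlopen-ed\n");
    return 0;
}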

Once you have emitted your C code and forked its compilation into a .so shared object, you can dlopen it. Don't be shy: my manydl.c example demonstrates that on Linux you can dlopen a very large number of shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to dlclose on Linux (unless you are writing a server program that needs to run for months); an unused shared module can stay dlopen-ed, and you are mostly leaking process address space (which is a cheap resource), since most of that unused .so would be swapped out. dlopen itself is quick; what takes time is the compilation of the C source, because you really want the optimization to be done by the C compiler.

You could use many other approaches, e.g. have a bytecode interpreter and generate that bytecode, or use Common Lisp (e.g. SBCL on Linux, which compiles dynamically to machine code), LuaJIT, Java, MetaOCaml, etc.

As others suggested, you don't care much about the time needed to write the C file, and it will stay in the filesystem cache in practice. Writing it is much faster than compiling it, so keeping it in memory is not worth the trouble. Use some tmpfs if you are concerned about I/O times.

Addenda

You asked

Can a library .so file on Linux be re-compiled and re-loaded at runtime?

Of course yes: you should fork a command to build the library from the generated C code (e.g. gcc -O -fPIC -shared generated.c -o generated.so, but you could do it indirectly, e.g. by running make -j, especially if generated.so is big enough to make it worth splitting generated.c into several generated C files!), then dynamically load your library with dlopen (giving it a full path like /some/file/path/to/generated.so, and probably the RTLD_NOW flag), and then use dlsym to find the relevant symbols inside it. Don't think of re-loading (a second time) the same generated.so; it is better to emit a unique generated1.c (then generated2.c, etc.) C file, compile it to a unique generated1.so (the second time generated2.so, etc.), then dlopen that (and this can be done many hundreds of thousands of times). You may want the emitted generated*.c files to contain some constructor functions which would be executed at dlopen time of the generated*.so.

Your base application program should define a convention about the set of dlsym-ed names (usually functions) and how they are called. It should only call functions in your generated*.so through dlsym-ed function pointers. In practice you would decide, for example, that each generated*.c defines a function void dynfoo(int) and a function int dynbar(int,int), use dlsym with "dynfoo" and "dynbar", and call them through the function pointers returned by dlsym. You should also define conventions for how and when dynfoo and dynbar are called. You had better link your base application with -rdynamic so that your generated*.c files can call your application's functions.
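Here is a minimal sketch of the loading side of such a convention; the names dynfoo, dynbar and the generated*.so naming scheme come straight from the text above, everything else (the path, the calls made) is illustrative:

#include <stdio.h>
#include <dlfcn.h>

/* The agreed-upon signatures each generated module must provide. */
typedef void (*dynfoo_fn)(int);
typedef int  (*dynbar_fn)(int, int);

int main(void)
{
    /* A path containing a slash, and RTLD_NOW, as discussed above. */
    void *mod = dlopen("./generated1.so", RTLD_NOW);
    if (!mod) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up the conventional entry points. */
    dynfoo_fn dynfoo = (dynfoo_fn)dlsym(mod, "dynfoo");
    dynbar_fn dynbar = (dynbar_fn)dlsym(mod, "dynbar");
    if (!dynfoo || !dynbar) {
        fprintf(stderr, "module does not follow the convention: %s\n",
                dlerror());
        return 1;
    }

    /* Call into the generated code only through these pointers. */
    dynfoo(42);
    printf("dynbar(2,3) = %d\n", dynbar(2, 3));

    /* No dlclose here: as noted above, keeping the module loaded for
       the lifetime of the process is usually fine. */
    return 0;
}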

You don't want your generated*.so to re-define existing names. For instance, you don't want to redefine malloc in your generated*.c and expect all heap allocation functions to magically use your new variant (that probably won't work, and even if it did, it would be dangerous).

You probably won't bother to dlclose a dynamically loaded shared object, except at application clean-up and exit time (but I don't bother to dlclose at all). If you do dlclose some dynamically loaded generated*.so file, be sure that nothing in it is still in use: no pointers into it, not even return addresses in call frames, may remain.

P.S. the MELT translator is currently 57KLOC of MELT code translated to nearly 1770KLOC of C code.


