How Can a C++ Binary Replace Itself

How can a C++ binary replace itself?

  1. Move/Rename your running app.exe to app_old.exe
  2. Move/Rename your downloaded update.exe to app.exe
  3. The next time your application starts, the update will be used

Renaming a running (i.e. locked) DLL or EXE is not a problem under Windows.

How to write a self-replacing/self-updating binary?

Rename the currently running binary to something else, write the new binary, run it, then delete the renamed binary later.
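The rename-based swap described above can be sketched in C for POSIX systems. The function name and paths are hypothetical, and the sketch makes two assumptions. First, the update has already been downloaded to new_path. Second, new_path is on the same filesystem as the running binary, because rename() cannot move files across filesystems. On Windows you would use something like MoveFileEx instead of rename().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: swap the downloaded update into place.
   self_path   - path of the currently running executable
   new_path    - freshly downloaded binary (same filesystem as self_path)
   backup_path - where the old binary is kept until it can be deleted
   Returns 0 on success, -1 on failure. A running process keeps executing
   from the old file even after it has been renamed. */
int swap_in_update(const char *self_path, const char *new_path,
                   const char *backup_path) {
    /* Step 1: move the running binary out of the way. */
    if (rename(self_path, backup_path) != 0) {
        perror("rename self -> backup");
        return -1;
    }
    /* Step 2: move the update into the original name. */
    if (rename(new_path, self_path) != 0) {
        perror("rename update -> self");
        rename(backup_path, self_path); /* best-effort rollback */
        return -1;
    }
    /* Step 3: make sure the new file is executable (rwxr-xr-x). */
    if (chmod(self_path, 0755) != 0) {
        perror("chmod");
        return -1;
    }
    return 0;
}
```

After a successful swap, the program could execv(self_path, argv) to start the new version right away and unlink(backup_path) on its next run. Alternatively, as in the Windows answer, it can simply wait for the next start.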

Changing a number defined in a C++(C) program without compiling the source again

In general, you cannot change anything without re-compiling.

In practice, and in very limited cases, you might patch your binary. This is mostly processor specific (and executable format specific and ABI specific) and depends less on your particular operating system version (e.g. if it works on one Windows version, it could work on a later one).

(However, I have never used Windows; I only use Linux, so you should adapt my answer to your operating system.)

So in some cases you might reverse-engineer your binary executable. If you do have the C source code, you could ask your compiler to emit the assembler code (e.g. by compiling with gcc -O -fverbose-asm -S). Then you might disassemble your executable (e.g. with objdump -d on Linux) and, with a binary or hexadecimal editor, change the machine code containing that constant.

This won't always work, because the machine instruction (and its size) could depend on the magnitude (bit size) of your constant.

To take a simple example in C, with GCC 7 on Linux/x86-64, consider the following C file e.c:

    /// A, B, C are preprocessor symbols defined as integers
    int f(int x) {
        if (x > 0)
            return A*x + B;
        return C;
    }

If I compile that with gcc -fverbose-asm -S -O -DA=12751 -DB=32 -DC=11 e.c, I get:

        .type   f, @function
    f:
    .LFB0:
        .cfi_startproc
    # e.c:3: if (x > 0)
        testl   %edi, %edi      # x
        jle     .L3     #,
    # e.c:4: return A * x + B;
        imull   $12751, %edi, %edi      #, x, tmp90
        leal    32(%rdi), %eax  #, <retval>
        ret
    .L3:
    # e.c:5: return C;
        movl    $11, %eax       #, <retval>
    # e.c:6: }
        ret
        .cfi_endproc
    .LFE0:
        .size   f, .-f

But with gcc -S -O -fverbose-asm -DA=12753 -DB=32 -DC=10 e.c, I get:

        .type   f, @function
    f:
    .LFB0:
        .cfi_startproc
    # e.c:3: if (x > 0)
        testl   %edi, %edi      # x
        jle     .L3     #,
    # e.c:4: return A * x + B;
        imull   $12753, %edi, %edi      #, x, tmp90
        leal    32(%rdi), %eax  #, <retval>
        ret
    .L3:
    # e.c:5: return C;
        movl    $10, %eax       #, <retval>
    # e.c:6: }
        ret

So indeed, in the above case I could patch the binary (I would need to find the 12751 and 11 constants in machine code; it is doable but tedious in that case).
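Finding such a constant can be sketched in C as a search for its 4-byte little-endian encoding in a buffer holding (part of) the binary. The function name patch_imm32 is hypothetical, and this is only a sketch. The imull above uses a 32-bit immediate because 12751 does not fit in a signed byte, and movl $11, %eax always carries a 32-bit immediate. A small value like 11 encodes as 0B 00 00 00, and those bytes will almost certainly occur many times in a real executable. For that reason the sketch refuses to patch unless the match is unique.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Look for the little-endian imm32 encoding of old_val in buf and, only if
   it occurs exactly once, overwrite it with new_val in place.
   Returns the patched offset, or -1 if there were zero or several matches.
   Note: this only works when old and new values use the same encoding width
   (e.g. 12751 -> 12753); values that fit in a signed byte may be encoded as
   imm8 by the compiler and would need a different instruction. */
long patch_imm32(unsigned char *buf, size_t len, int32_t old_val,
                 int32_t new_val) {
    unsigned char pat[4], rep[4];
    uint32_t o = (uint32_t)old_val, n = (uint32_t)new_val;
    for (int i = 0; i < 4; i++) {          /* x86 is little-endian */
        pat[i] = (unsigned char)(o >> (8 * i));
        rep[i] = (unsigned char)(n >> (8 * i));
    }
    long found = -1;
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, pat, 4) == 0) {
            if (found != -1)
                return -1;                 /* ambiguous: do not patch */
            found = (long)i;
        }
    }
    if (found != -1)
        memcpy(buf + found, rep, 4);
    return found;
}
```

For example, the bytes 69 FF CF 31 00 00 encode imull $12751, %edi, %edi. Patching 12751 to 12753 rewrites the last four bytes to D1 31 00 00.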


Now, let's try with A being a small power of two, like 16, and C being 0, i.e. gcc -S -O -fverbose-asm -DA=16 -DB=32 -DC=0 e.c:

    f:
    .LFB0:
        .cfi_startproc
    # e.c:4: return A * x + B;
        leal    2(%rdi), %eax   #, tmp90
        sall    $4, %eax        #, tmp93
        testl   %edi, %edi      # x
        movl    $0, %edx        #, tmp92
        cmovle  %edx, %eax      # tmp93,, tmp92, <retval>
    # e.c:6: }
        ret

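Because of compiler optimizations, the generated code changed significantly: the multiplication became an addition followed by a shift, and the branch became a conditional move (cmovle). It is not easy to patch.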
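To see why, it helps to transliterate that optimized assembly back into C. The two function names below are hypothetical and only compare the source-level f (with A=16, B=32, C=0) against what the listing computes. GCC rewrote 16*x + 32 as (x + 2) << 4. Neither 16 nor 32 appears anywhere in the machine code, so there is no single constant to patch.

```c
/* f from e.c built with -DA=16 -DB=32 -DC=0, at source level. */
int f_source(int x) {
    if (x > 0)
        return 16 * x + 32;
    return 0;
}

/* C transliteration of the optimized listing:
     leal 2(%rdi), %eax   ->  t = x + 2
     sall $4, %eax        ->  t <<= 4          (so t == 16*x + 32)
     testl / movl $0 / cmovle  ->  result = (x <= 0) ? 0 : t, no branch
   Unsigned arithmetic mimics the CPU's wrap-around without signed-overflow UB;
   results match f_source whenever 16*x + 32 fits in an int. */
int f_as_compiled(int x) {
    unsigned t = ((unsigned)x + 2u) << 4;
    return (x <= 0) ? 0 : (int)t;
}
```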

Important notice

With enough effort, money and time (think of NSA-like abilities) a lot of things are possible.

If your goal is to obfuscate some data in your binary (e.g. some password), you might encrypt it to make hackers' lives harder (but don't be naive, the NSA will be able to get it). Remember the motto: there is No Silver Bullet. It looks like that is your goal, but don't be too naive (BTW, the legal protections around your software, e.g. the license, matter even more, so you need a lawyer to write a good EULA).
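As a very light illustration, here is a sketch that keeps a secret out of the source as a plain string literal by XOR-masking it at compile time and decoding it at runtime. This is a swapped-in technique and it is obfuscation, not encryption: the key (MASK, a hypothetical value) ships in the same binary. Anyone with a debugger, or simply a look at the decoding loop, recovers the secret. The secret "s3cret" and all names are made up.

```c
#include <stddef.h>

#define MASK 0x5A  /* hypothetical key; note it is stored in the binary too */

/* "s3cret", XOR-masked at compile time, so the source never contains the
   plain string literal. */
static const unsigned char masked_secret[] = {
    's' ^ MASK, '3' ^ MASK, 'c' ^ MASK, 'r' ^ MASK, 'e' ^ MASK, 't' ^ MASK
};

/* Decode the secret into out, which must hold at least
   sizeof masked_secret + 1 bytes. Callers should wipe it after use. */
void reveal_secret(char *out) {
    for (size_t i = 0; i < sizeof masked_secret; i++)
        out[i] = (char)(masked_secret[i] ^ MASK);
    out[sizeof masked_secret] = '\0';
}
```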

If, on the contrary, your goal is to adapt some performance-critical code, you could use metaprogramming and partial evaluation techniques. A practice I like is to generate some temporary C (or C++) code at runtime (better suited to your particular situation and data), compile that temporary C or C++ code as a plugin, then dynamically load that temporary plugin (using dlopen and dlsym on Linux; on Windows you'll need LoadLibrary, but I leave the details and consequences to you). Instead of generating C or C++ code at runtime, you could use a JIT compilation library like libgccjit. If you are fond of such techniques, consider using better programming languages instead (like Common Lisp with SBCL) if your management allows them.
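The generate-compile-load cycle can be sketched for Linux as follows, under stated assumptions. A C compiler is reachable as cc on PATH at runtime. The /tmp file names and the generated symbol gen_f are hypothetical. On glibc older than 2.34, the program itself must be linked with -ldl.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef int (*specialized_fn)(int);

/* Generate C source with a and b baked in as constants, compile it into a
   shared object, dlopen it, and return the generated function.
   On success *handle receives the dlopen handle (release with dlclose). */
specialized_fn build_specialized(int a, int b, void **handle) {
    char src[64], lib[64], cmd[192];
    snprintf(src, sizeof src, "/tmp/gen_f_%ld.c", (long)getpid());
    snprintf(lib, sizeof lib, "/tmp/gen_f_%ld.so", (long)getpid());

    FILE *fp = fopen(src, "w");
    if (!fp) return NULL;
    /* The constants are literals in the generated code, so the compiler
       can optimize for these exact values. */
    fprintf(fp, "int gen_f(int x) { return %d * x + %d; }\n", a, b);
    if (fclose(fp) != 0) return NULL;

    /* -shared -fPIC: build a loadable plugin rather than an executable. */
    snprintf(cmd, sizeof cmd, "cc -O2 -shared -fPIC -o %s %s", lib, src);
    int rc = system(cmd);
    remove(src);
    if (rc != 0) return NULL;

    void *h = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (!h) { fprintf(stderr, "dlopen: %s\n", dlerror()); return NULL; }

    specialized_fn fn;
    /* void* -> function pointer, as in the dlopen(3) man page example. */
    *(void **)(&fn) = dlsym(h, "gen_f");
    if (!fn) { dlclose(h); return NULL; }
    *handle = h;
    return fn;
}
```

For example, build_specialized(12751, 32, &h) yields a function returning 12751*x + 32. Because dlopen caches libraries by path, loading a second specialization in the same process would need a different file name.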

But I don't want to compile my program 1000 times for 1000 customers

That surprises me a lot. Compiling a simple (short) C file containing just constants is quick, and linking time is also quick. I would instead consider recompilation for each customer.
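A minimal sketch of per-customer recompilation, under my own assumptions: all customer-specific constants live in one tiny generated C file. The rest of the program is compiled once and refers to those constants through extern declarations such as extern const int customer_max_users;. The struct customer type, the function and the generated names are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical per-customer settings. */
struct customer {
    const char *name;   /* assumed to contain no '"' or '\\' characters */
    int max_users;
};

/* Write a tiny C file defining the constants for one customer.
   Returns the number of bytes written, or a negative value on error. */
int write_customer_constants(FILE *out, const struct customer *c) {
    return fprintf(out,
                   "/* generated for %s, do not edit */\n"
                   "const char customer_name[] = \"%s\";\n"
                   "const int customer_max_users = %d;\n",
                   c->name, c->name, c->max_users);
}
```

Per customer, you would then compile only that file (e.g. cc -c customer.c) and link customer.o with the unchanged object files, which takes very little time.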

BTW, I feel you are incredibly naive. The most important protection is not technical in your binary, it is a legal protection (and you need a good contract, so find and pay a good lawyer).

Have you considered, on the contrary, making your product free software? Many companies are doing that (and making money on something other than licenses, e.g. support).


NB. There are lots of existing license managers. Have you considered buying and using one? Notice also that corporations have large incentives to avoid cheating, and those willing to steal your software will be able to do that anyway. You'll sell more products by working on software quality, not by spending effort on vain "protection" measures that annoy your customers, increase your logistics, distribution and maintenance costs, and make customer-found bugs harder to debug.

Can a C program modify its executable file?

On Windows, when a program is run the entire *.exe file is mapped into memory using the memory-mapped-file functions in Windows. This means that the file isn't necessarily all loaded at once, but instead the pages of the file are loaded on-demand as they are accessed.

When the file is mapped in this way, another application (including itself) can't write to the same file to change it while it's running. (On Windows a running executable can't be deleted or overwritten either, although it can be renamed, which is what the rename-based update trick relies on. On Linux and other Unix systems with inode-based filesystems, it can even be deleted or replaced while it runs.)

It is possible to change the bits mapped into memory, but if you do this the OS does it using "copy-on-write" semantics, which means that the underlying file isn't changed on disk, but a copy of the page(s) in memory is made with your modifications. Before being allowed to do this though, you usually have to fiddle with protection bits on the memory in question (e.g. VirtualProtect).
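The copy-on-write behaviour can be observed on POSIX systems with mmap and MAP_PRIVATE. This is a swapped-in analogue for illustration, not the Windows loader's code path or API. The sketch maps a file privately, writes to the mapping, and re-reads the file to show the disk contents are untouched. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map path privately, overwrite the first byte in memory, then re-read the
   first byte from the file itself. Returns that on-disk byte, or -1. */
int write_private_mapping(const char *path, char new_first_byte) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }

    /* PROT_WRITE is allowed with MAP_PRIVATE even on a read-only fd:
       writes go to private copies of the pages, never back to the file. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return -1; }
    p[0] = new_first_byte;             /* triggers the page copy */
    munmap(p, (size_t)st.st_size);

    char c;
    ssize_t n = pread(fd, &c, 1, 0);   /* what is actually on disk */
    close(fd);
    return n == 1 ? (unsigned char)c : -1;
}
```

Writing 'X' into a private mapping of a file containing "abcd" still leaves 'a' on disk.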

At one time, it used to be common for low-level assembly programs that were in very constrained memory environments to use self-modifying code. However, nobody does this anymore because we're not running in the same constrained environments, and modern processors have long pipelines that get very upset if you start changing code from underneath them.

Path to binary in C

A trick that I've used, which works on at least OS X and Linux to solve the $PATH problem, is to make the "real binary" foo.exe instead of foo: the file foo, which is what the user actually calls, is a stub shell script that runs foo.exe with its original arguments.

    #!/bin/sh

    # $0 is the path of this script; quote it and exec the real binary next to it
    exec "$0.exe" "$@"

The redirection through a shell script means that the real program gets an argv[0] that is an actual path to the file instead of a bare name that has to be looked up in $PATH. I wrote a blog post about this from the perspective of Standard ML programming before it occurred to me that this was probably a language-independent problem.
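Inside foo.exe, the directory can then be derived from argv[0]. The following is a POSIX sketch, assuming argv[0] is a real path as arranged by the stub script. It uses realpath() to make the path absolute and resolve symlinks and "..", then strips the file name. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Resolve argv0 and store the directory containing it in dir, which must
   hold PATH_MAX bytes. Returns 0 on success, -1 on failure. */
int program_dir_from_argv0(const char *argv0, char *dir) {
    if (!realpath(argv0, dir))         /* absolute, symlinks resolved */
        return -1;
    char *slash = strrchr(dir, '/');
    if (!slash) return -1;
    if (slash == dir) slash[1] = '\0'; /* the file lives directly in "/" */
    else *slash = '\0';
    return 0;
}
```

In main this would be called as program_dir_from_argv0(argv[0], dir). Without the stub script, argv[0] may be a bare name like foo, and realpath would then resolve it relative to the current directory, which is wrong.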

Annotating a C++ binary with bit of information

You can use libelf, ELFsh, or other ELF tools to create your own "section" in the binary and put whatever you want in it. This question has some more links. If all you want to do is add a blob of data to a binary, it might be easier to just use objcopy --add-section as answered here.

How do I get the directory that a program is running from?

Here's code to get the full path to the executing app:

Variable declarations:

    char pBuf[256];
    size_t len = sizeof(pBuf);

Windows:

    // pBuf is a char buffer, so call the ANSI variant explicitly (<windows.h>)
    DWORD bytes = GetModuleFileNameA(NULL, pBuf, (DWORD)len);
    return bytes ? (int)bytes : -1;

Linux:

    // readlink() (<unistd.h>) does not NUL-terminate, so leave room for '\0'
    ssize_t bytes = readlink("/proc/self/exe", pBuf, len - 1);
    if (bytes >= 0)
        pBuf[bytes] = '\0';
    return bytes;
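The snippets above return the full path of the executable. The question asks for its directory, so here is a Linux-only sketch building on the same /proc/self/exe idea. It treats a completely filled buffer as possible truncation and then strips the last path component. The function name exe_dir is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill dir with the directory containing the running executable.
   Returns the length of the result, or -1 on error or truncation. */
ssize_t exe_dir(char *dir, size_t size) {
    if (size < 2) return -1;
    ssize_t n = readlink("/proc/self/exe", dir, size - 1);
    if (n < 0 || (size_t)n == size - 1)   /* error, or possibly truncated */
        return -1;
    dir[n] = '\0';                        /* readlink does not terminate */
    char *slash = strrchr(dir, '/');
    if (!slash) return -1;
    if (slash == dir) slash[1] = '\0';    /* executable lives in "/" */
    else *slash = '\0';
    return (ssize_t)strlen(dir);
}
```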

