Replacing a Running Executable in Linux

How to detach executable file from process in linux for live update

Normally an executable file cannot be overwritten while a process started from it is still running. Is there a way to circumvent or break this lock?

Yes. Do not overwrite the executable; replace it.

That is, you save the new executable under a temporary name in the same directory (or anywhere in the same file system; it must be on the same mount!), then rename() the temporary file over the executable. (link() alone will not do, because it fails with EEXIST when the target name already exists.)

In a shell script, you can use mv -f newbinary oldbinary, if both newbinary and oldbinary are in the same file system and mount. In a Bash script, you might use something like

#!/bin/bash
BINDIR=/usr/bin

# Autoremoved work directory
Work="$(mktemp -d)" || exit 1
trap "cd / ; rm -rf '$Work'" EXIT

# ... Check if new binaries available ...
#     Otherwise: exit 0

# ... Download new binaries under "$Work/" ...

# Copy 'executable' to $BINDIR, under a temporary name
tempbin="executable.$$-$RANDOM$RANDOM$RANDOM"
if ! mv -f "$Work/executable" "$BINDIR/$tempbin" ; then
    # Failed
    exit 1
elif ! mv -f "$BINDIR/$tempbin" "$BINDIR/executable" ; then
    # Failed
    exit 1
fi

# Successfully replaced.
exit 0

This works on all POSIXy systems, because a file name is completely separate from the inode that specifies the file's contents, access mode, ownership, timestamps, and so on.

In practice, the kernel will retain the old inode for as long as any process is executing it or has it open. However, the file name will immediately point to the new inode, with the new executable contents. So, essentially, the rename simply changes which inode the file name refers to. That is also why the temporary file must reside on the same filesystem (same mount): rename() cannot move a name from one file system to another.
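
The effect can be observed with a small experiment. This is only a sketch: the file name prog is a stand-in, file descriptor 3 plays the role of a process that still has the old executable in use, and stat -c assumes GNU (or BusyBox) stat.

```shell
#!/bin/sh
# Illustration only: "prog" is a stand-in name, and descriptor 3 plays
# the role of a process that still has the old file in use.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

echo "old version" > "$dir/prog"
exec 3< "$dir/prog"                   # keep the old inode in use
old_inode=$(stat -c %i "$dir/prog")   # stat -c is an assumption (GNU/BusyBox)

echo "new version" > "$dir/prog.tmp"  # same directory, hence same file system
mv -f "$dir/prog.tmp" "$dir/prog"     # rename() over the old name
new_inode=$(stat -c %i "$dir/prog")

via_fd=$(cat <&3)                     # read through the descriptor opened earlier
via_name=$(cat "$dir/prog")           # read through the file name
exec 3<&-                             # last user gone; the old inode can be freed

echo "inode:           $old_inode -> $new_inode"
echo "open descriptor: $via_fd"
echo "file name:       $via_name"
```

The descriptor opened before the rename keeps reading the old contents, while the name resolves to the new inode.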

The goal is that the process (a long-running task) should update itself with as little downtime as possible.

It is a common security hole to allow a process to change itself. Typically, it is not even allowed at all in POSIXy systems, unless the process is run with superuser privileges (i.e., as root or in Linux, with CAP_FOWNER capability). You do NOT want to do this.

(Just because it is common to do so, for example with PHP web stuff, does not make it sane or safe. If it did, then we'd have to agree that excrement tastes good, because there are billions of flies and dung beetles that think so. If you take a look, you'll find that such web services ALL have had severe security problems, some directly related to this update mechanism. Some maintainers of said package claim that problems during updates, like man-in-the-middle attacks, are the users' fault, not theirs, though. They're wrong, of course.)

Instead, you should have a separate, privileged service that periodically checks for updates and, when one is found, retrieves the new version and installs it using the replacement method above. In the simplest case, this can be run from cron or similar.

If your users really want you to, you can create a minimal C daemon that periodically checks if a new version is available. You can have it receive on a specific Unix domain datagram address, so that your executable can send a single character to it (no matter which user it is run as) for the update daemon to do a check then and there (unless it has checked recently enough). Essentially, it'll just wait (say, using select()) for enough time to elapse, or a specific request to check. When it is time, it'll run a shell script to check if a new executable is available (say, using popen() etc.; the typical location to save such scripts is in /usr/lib/yourservice/). If the script responds that a new version is available, run another script to download and replace the binary. If the process receives a SIGHUP signal, do the check immediately; if it receives a SIGTERM signal, exit. That way it can be run as a service, and won't consume much resources when running.
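
The waiting logic can be prototyped in shell before writing the C daemon. The following is a rough sketch, not the daemon described above: it reacts to signals only (no Unix domain socket, no "checked recently" throttle), and check_for_update, update_loop, and UPDATE_LOG are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only. check_for_update is a hypothetical hook: a real service
# would run its own check and download scripts here.
check_for_update() {
    echo "check at $(date +%s)" >> "${UPDATE_LOG:-/dev/null}"
}

# update_loop SECONDS
# Checks every SECONDS, checks at once on SIGHUP, exits on SIGTERM.
update_loop() {
    timer=
    trap ':' HUP                    # only needs to interrupt the wait below
    trap 'kill "$timer" 2>/dev/null; exit 0' TERM
    while :; do
        sleep "$1" &                # the timer runs in the background ...
        timer=$!
        wait "$timer"               # ... so a trapped signal ends the wait early
        kill "$timer" 2>/dev/null   # discard a timer that SIGHUP cut short
        check_for_update
    done
}
```

A service wrapper would then call something like update_loop 3600 as its main loop. Running sleep in the background and using wait matters here: a shell only runs its traps once the current foreground command has finished, whereas wait returns as soon as a trapped signal arrives.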

In your long-running executable, if it is at a point where it can replace itself with a newer version, use stat() on /proc/self/exe and argv[0], to verify if they have the same st_dev and st_ino. If they do not, then the update service has provided a newer version of the executable, and your service can run

    if (argv[0][0] == '/')
        execv(argv[0], argv);
    else
        execvp(argv[0], argv);

or, if you define the absolute path to your executable at compile time in say exepath, then

    execvp(exepath, argv);

to replace itself with the newer version.
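
The same device and inode comparison can be tried from the shell, for example to see from outside whether a running service is still executing an older binary. This is a sketch under assumptions: binary_was_replaced is a hypothetical helper name, it relies on Linux /proc, and stat -L -c is the GNU (or BusyBox) syntax.

```shell
#!/bin/sh
# binary_was_replaced PID PATH   (hypothetical helper name)
# Succeeds when PATH no longer names the file that process PID is
# executing, that is, when the device or inode number differs.
# Returns 2 if either file cannot be examined.
binary_was_replaced() {
    running=$(stat -L -c '%d:%i' "/proc/$1/exe") || return 2
    on_disk=$(stat -L -c '%d:%i' "$2") || return 2
    [ "$running" != "$on_disk" ]
}

# Example: compare this shell with the file its own /proc entry points to.
self_path=$(readlink -f "/proc/$$/exe")
if binary_was_replaced "$$" "$self_path"; then
    echo "$self_path has been replaced since this shell started"
else
    echo "$self_path is still the file this shell is running"
fi
```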

Do note that such a process should close all open file descriptors (except for the standard streams: 0, 1, and 2, or STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO) when it starts up. (That is, close all open file descriptors from 3 up to, but not including, sysconf(_SC_OPEN_MAX).) This is because exec*() functions do not close file descriptors (other than those marked O_CLOEXEC/FD_CLOEXEC), so any descriptors that might be open at the time of exec will be left open. Doing it this way also means that if exec fails, your service can continue running normally.
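
Descriptor inheritance across exec is easy to see from a shell. In this sketch, descriptor 3 stands in for something the service happened to have open when it re-executed itself; the probe relies on Linux /proc.

```shell
#!/bin/sh
# Illustration only: descriptor 3 stands in for a descriptor the service
# had open at exec time. Requires Linux /proc.
# The probe runs in the child shell, where $$ is the child's own PID.
probe='if [ -e "/proc/$$/fd/3" ]; then echo open; else echo closed; fi'

exec 3< /dev/null                 # open descriptor 3 in this shell
inherited=$(sh -c "$probe")       # the child starts with 3 still open
cleaned=$(sh -c "$probe" 3<&-)    # 3 closed for this one child only
exec 3<&-

echo "without cleanup: descriptor 3 is $inherited in the child"
echo "with cleanup:    descriptor 3 is $cleaned in the child"
```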

Replace a running executable on mac osx

Assuming you're implementing an update feature, check out the Sparkle framework, which does exactly what you're looking for, and way more.

In case you only need to replace a running application, browse the Sparkle project at GitHub to see how it's done.

Strategies For Replacing Program Executable in Windows

1. Running exe downloads the new one, puts it somewhere
2. Running exe renames itself to anything (like .exe.tmp)
3. Running exe puts the downloaded exe where the running one is (named just like the original)
4. Running exe starts the downloaded exe
5. Downloaded exe checks for .exe.tmp file, if found deletes it and kills the old running process
6. Done

When I replace a .so file used by a running PostgreSQL server it crashes

What's the difference?

make install typically uses the install command, which deletes then re-creates the file. (This isn't inherently true, it's just how most people's Makefiles are written).

By contrast, cp just overwrites it in place.

Why does it matter?

UNIX/Linux systems mmap binaries into memory when they run. This means that the file contents are essentially directly part of the program memory. So when you overwrite the contents of an executable binary file while the program is running, things are likely to go boom in exciting ways.

By contrast, if you delete (unlink) the file, it remains mapped as an anonymous file. It still exists until the last handle to it is closed, it just doesn't have a file name anymore. The new file is then created with the same file name, but without affecting the contents of the now inaccessible unlinked file. So nothing crashes - so long as you don't have multiple instances of programs that expect to see the same version of a shared library, at least.
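
Both cases can be reproduced with ordinary files. In this sketch the .so names are stand-ins and an open file descriptor plays the role of the running program's mapping; a shell redirection with > is used for the in-place overwrite because it truncates the existing file, as cp does.

```shell
#!/bin/sh
# Illustration only: the file names are stand-ins, and descriptor 3
# plays the role of a program that has the library in use.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

# Case 1: overwrite in place (same inode, new contents).
echo "old library" > "$dir/a.so"
exec 3< "$dir/a.so"
echo "new library" > "$dir/a.so"   # truncates and rewrites the same inode
after_overwrite=$(cat <&3)         # the holder sees the new bytes
exec 3<&-

# Case 2: unlink, then create a new file under the same name.
echo "old library" > "$dir/b.so"
exec 3< "$dir/b.so"
rm "$dir/b.so"                     # the name goes away, the data stays
echo "new library" > "$dir/b.so"   # a different inode
after_unlink=$(cat <&3)            # the holder still sees the old bytes
exec 3<&-                          # last handle closed: old data is freed

echo "after in-place overwrite, holder reads: $after_overwrite"
echo "after unlink and re-create, holder reads: $after_unlink"
```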

How to do it right?

If you insist on replacing binaries of running executables, you should do so using the install command, or with rm /the/path && cp /new/file /the/path followed by any required chown and chmod.

Demo showing the difference

Setup:

$ echo "whatevs" > blah
$ touch /tmp/blah

Using install:

$ strace -e unlink,open,write install blah /tmp/blah
...
unlink("/tmp/blah")                     = 0
open("blah", O_RDONLY)                  = 3
open("/tmp/blah", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
write(4, "whatevs\n", 8)                = 8
...

vs cp:

$ strace -e unlink,open,write cp blah /tmp/blah
...
open("blah", O_RDONLY)                  = 3
open("/tmp/blah", O_WRONLY|O_TRUNC)     = 4
write(4, "whatevs\n", 8)                = 8
...

Note how install unlinks the old file first? Crucial, that.

Another difference is that install won't change the contents of other links to the same underlying file, if a file has multiple hardlinks. cp will.
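
The hard-link difference can be checked directly. This sketch uses stand-in file names and assumes an install that unlinks the destination first, as GNU coreutils install does in the strace output above; other implementations may behave differently.

```shell
#!/bin/sh
# Illustration only; file names are stand-ins. Assumes an install(1)
# that unlinks the destination before writing (GNU coreutils).
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

echo "v2" > "$dir/new"

echo "v1" > "$dir/a"
ln "$dir/a" "$dir/a.link"          # two names, one inode
cp "$dir/new" "$dir/a"             # overwrites the shared inode in place
link_after_cp=$(cat "$dir/a.link")

echo "v1" > "$dir/b"
ln "$dir/b" "$dir/b.link"
install "$dir/new" "$dir/b"        # b now names a freshly created inode
link_after_install=$(cat "$dir/b.link")

echo "after cp:      a.link contains $link_after_cp"
echo "after install: b.link contains $link_after_install"
```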

Is it safe to overwrite a .so file or an executable in use using rsync?

Short answer

It's perfectly safe within a single file if you don't use --inplace.

It's mostly safe for multiple interdependent files, but has some risks which using --delay-updates will minimize.

Long answer

By default (that is, when not using --inplace), rsync will actually create contents in a new file, named with a temporary name (something like .__your_file), and then rename it over the original file when complete.

This rename is a completely atomic operation: Anything trying to open the file will either get the original file, or the replacement (after that replacement is entirely complete).

Moreover, if the original is in use, then its reference count will be nonzero even after the directory entry pointing to it is overwritten with the new entry pointing to the different inode, so the content will remain on-disk (undeleted) until the original file is no longer open.

However, with multiple files, you run a risk that only some of those files will be atomically replaced. If you're copying over both a new foo and a libfoo.so such that the old foo won't work with the new libfoo.so and the new foo won't work with the old libfoo.so, you're in a bad situation if you're trying to start an executable after the new libfoo.so has been rename()'d into place but foo hasn't yet.

The nearest thing to a fix for this that rsync has available is the --delay-updates option, which will wait until it has both .__foo and .__libfoo.so complete and then rename them both next to each other. There's still no operating-system-level guarantee that you can't see an updated version of one file and not the other, but the time window in which this can occur is made substantially smaller.
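
The idea behind this two-phase approach can be sketched in plain shell. This is an emulation for illustration, not how rsync itself is implemented; the directories and the .staged suffix are made up for the example.

```shell
#!/bin/sh
# Emulates the stage-then-rename idea in plain shell. Not rsync's own
# implementation; directory names and the ".staged" suffix are stand-ins.
src=$(mktemp -d) || exit 1
dst=$(mktemp -d) || exit 1
trap 'rm -rf "$src" "$dst"' EXIT

for f in foo libfoo.so; do
    echo "old $f" > "$dst/$f"
    echo "new $f" > "$src/$f"
done

# Phase 1: copy everything under temporary names. This is the slow
# part, and nothing that opens foo or libfoo.so can see it yet.
for f in foo libfoo.so; do
    cp "$src/$f" "$dst/.$f.staged" || exit 1
done

# Phase 2: rename all staged files back to back. Each rename is atomic
# on its own; the pair is not, but the window between them is short.
for f in foo libfoo.so; do
    mv -f "$dst/.$f.staged" "$dst/$f" || exit 1
done

cat "$dst/foo" "$dst/libfoo.so"
```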

If using --inplace, then the operating system will deny write access because the file is in use (this is not enforced for all access on UNIX, but it is specifically enforced with mmap(MAP_PRIVATE), as used for executables and shared libraries); this would be a "Text file busy" error. If your operating system did not enforce this, any scenario where mmap() is used to provide memory regions reflecting file contents (which is typically how shared libraries are loaded) would cause Bad Things to happen in the event of an in-place overwrite.

What happens when you overwrite a memory-mapped executable?

Under Linux, if you overwrite an executable in place while it is running, the results are unpredictable and it may crash. Pages which have been modified (e.g. writable ".data" pages the program has already written to) won't be affected, but pages which haven't been modified (e.g. most code) will.

My guess is that in your case, the string was in a page that had already been modified (copied), so it wasn't affected.

However, all that only happens if you actually overwrite the same file.

Most of the time, when you replace an executable, you'll be replacing the directory entry with a different file. This is typically done by renaming a temporary file (in the same directory) over the existing one. This is what (for example) package managers do.

In the replacing-directory-entry case, the previous executable file continues to exist as a totally separate (still executing) file, and the previous executable can have its pages discarded and reloaded without a problem - it still sees the old file.

Quite what the linker does with its output, I don't know. But /usr/bin/install creates a new file. I expect this behaviour is quite deliberate.