How to Avoid Race Condition When Using a Lock-File to Avoid Two Instances of a Script Running Simultaneously

How to avoid race condition when using a lock-file to avoid two instances of a script running simultaneously?

Yes, there is indeed a race condition in the sample script. You can use bash's noclobber option to get a failure if a different instance sneaks in between the -f test and the touch.

The following is a sample code-snippet (inspired by this article) that illustrates the mechanism:

if (set -o noclobber; echo "$$" > "$lockfile") 2> /dev/null; then
    # This will cause the lock-file to be deleted in case of a
    # premature exit.
    trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT

    # Critical Section: Here you'd place the code/commands you want
    # to be protected (i.e., not run in multiple processes at once).

    rm -f "$lockfile"
    trap - INT TERM EXIT
else
    echo "Failed to acquire lock-file: $lockfile."
    echo "Held by process $(cat $lockfile)."
fi
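
The snippet assumes that $lockfile has already been set to some agreed-upon path, for example (the path here is only an illustration):

lockfile=/var/tmp/myscript.lock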

Prevent race condition when creating lock file

That's not a race - it's resilience to failure. In situations where the script dies before it can remove the file, you need manual cleanup.

The usual way to try to automate this cleanup is to read the PID from any existing lock file, test whether that process still exists, and treat the lock as stale if it doesn't. Unfortunately, without an atomic compare-and-set operation that's not trivial to do correctly, since it introduces a new race between reading the PID and another instance performing the same stale-lock cleanup.
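
A sketch of what that cleanup usually looks like (the path and messages are only illustrative; the gap between the kill -0 check and re-creating the file is exactly the new race mentioned above):

LOCKFILE=/var/run/myscript.pid   # example path

if [ -e "$LOCKFILE" ]; then
    if kill -0 "$(cat "$LOCKFILE")" 2>/dev/null; then
        echo "Lock held by a live process; giving up." >&2
        exit 1
    fi
    # The recorded PID no longer exists, so treat the lock as stale.
    # Another instance can reach this same point and re-create the
    # lock between this check and the echo below: that is the new
    # race described above.
    rm -f "$LOCKFILE"
fi
echo "$$" > "$LOCKFILE"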

Check out this question for more ideas around locking using just the file system.

My advice is to either store the lock file on a temporary filesystem (/var/run is usually tmpfs, precisely so that pidfiles disappear safely on reboot) so that things fix themselves after a reboot, or have the script throw up its hands and ask for manual intervention. Handling every failure case reliably increases complexity, and that complexity probably introduces more chances of failure than asking a human for help would.

And complexity isn't just today, it's for the lifetime of the code. It might be correct when you're done, but will the next person along break it?

How to avoid race conditions in a bash script?

You are already safely avoiding the actual race condition with the lock file. The problem you are describing can be avoided in two ways.

(1) Move the lock file outside the main loop, so that two instances of your program cannot run their main loop at the same time. If one is running, the other will have to wait until it's done, then start replacing the output file.

#!/bin/bash

# The data file has to exist before ln can hard-link to it.
# (As discussed below, this initial creation is itself not
# protected by the lock.)
if [ ! -f numbers ]; then echo 0 > numbers; fi

while true; do
    if ! ln numbers numbers.lock 2>/dev/null; then
        sleep 1
    else
        count=0
        while [[ $count != 100 ]]; do
            count=`expr $count + 1`
            n=`tail -1 numbers`
            expr $n + 1 >> numbers
        done
        # Release the lock only after the whole loop is done.
        rm numbers.lock
        break
    fi
done

(2) Make the two instances cooperate, by examining what the contents of the file are. In other words, force them to stop looping when the number reaches 100, regardless of how many other processes are writing to this file. (I guess there is an iffy corner case when there are more than 100 instances running.)

#!/bin/bash
# FIXME: should properly lock here, too
if [ ! -f numbers ]; then echo 0 > numbers; fi
n=0
touch numbers
while [[ $n -lt 100 ]]; do
    if ln numbers numbers.lock
    then
        n=$(expr $(tail -1 numbers) + 1 | tee numbers)
        rm numbers.lock
    fi
done

Depending on your requirements, you might actually want the script to clobber any previous value in the file when a new instance of the script is starting, but if not, the echo 0 > numbers should be governed by the lock file, too.
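
One way to do that (only a sketch, reusing the noclobber trick from the first answer; numbers.init.lock is a made-up name):

# Guard the one-time initialization with its own lock file.
if (set -o noclobber; echo "$$" > numbers.init.lock) 2>/dev/null; then
    [ -f numbers ] || echo 0 > numbers
    rm -f numbers.init.lock
fi
# An instance that loses this race simply skips the initialization;
# the winner will have created "numbers" before the hard-link lock
# on it can be taken.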

You really want to avoid expr in a Bash script; Bash has built-in arithmetic operators. I have not attempted to refactor that part here, but you probably should. Perhaps prefer Awk, so you can factor out the tail too: awk '{ i=$0 } END { print 1+i }' numbers
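
A sketch of the inner loop with those changes (same file layout, keeping the overall structure of the first script):

# Inner loop without expr: Bash arithmetic for the counter, and a
# single awk call in place of the tail + expr pipeline.
while (( count != 100 )); do
    count=$((count + 1))
    n=$(awk '{ i = $0 } END { print i + 1 }' numbers)
    echo "$n" >> numbers
done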

How to prevent a race condition when multiple processes attempt to write to and then read from a file at the same time

The way to do this is to take an exclusive lock each time you open the file. The writer holds the lock while it writes the data, and the reader blocks until the writer releases the lock by closing the file. This will of course fail if the file has been partially written and the writing process exits abnormally, so a suitable error telling the user to delete the file should be displayed if the module can't be loaded:

import errno
import imp
import os
import fcntl as F

# lib_dir, module_name, module_path and code are assumed to be
# defined elsewhere in the surrounding program.

def load_module():
    pyx_file = os.path.join(lib_dir, module_name + '.pyx')

    try:
        # Try to create the file; fails with EEXIST if it already exists.
        fd = os.open(pyx_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)

        # Lock the file exclusively to notify other processes we're still writing.
        F.flock(fd, F.LOCK_EX)
        with os.fdopen(fd, 'w') as f:
            f.write(code)
        # Closing the file descriptor releases the lock.

    except OSError as e:
        # If the error wasn't EEXIST we should raise it.
        if e.errno != errno.EEXIST:
            raise

        # The file existed, so open it for reading and then try to
        # lock it. This will block on the LOCK_EX above if it's held
        # by the writing process.
        with open(pyx_file, "r") as f:
            F.flock(f, F.LOCK_EX)

    return imp.load_dynamic(module_name, module_path)

module = load_module()

What is the best way to ensure only one instance of a Bash script is running?

If the script is the same across all users, you can use a lockfile approach. If you acquire the lock, proceed; otherwise show a message and exit.

As an example:

[Terminal #1] $ lockfile -r 0 /tmp/the.lock
[Terminal #1] $

[Terminal #2] $ lockfile -r 0 /tmp/the.lock
[Terminal #2] lockfile: Sorry, giving up on "/tmp/the.lock"

[Terminal #1] $ rm -f /tmp/the.lock
[Terminal #1] $

[Terminal #2] $ lockfile -r 0 /tmp/the.lock
[Terminal #2] $

Once /tmp/the.lock has been acquired, your script is the only instance allowed to run. When you are done, just remove the lock. In script form this might look like:

#!/bin/bash

lockfile -r 0 /tmp/the.lock || exit 1

# Do stuff here

rm -f /tmp/the.lock
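
If the script can die before it reaches the final rm, you can combine this with a trap, as in the other answers here; a sketch, to be placed right after the lockfile line:

trap "rm -f /tmp/the.lock; exit" INT TERM EXIT   # clean up on abnormal exit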

Quick-and-dirty way to ensure only one instance of a shell script is running at a time

Here's an implementation that uses a lockfile and echoes a PID into it. The PID serves as protection in case the process is killed before it can remove the pidfile: a later run can then detect that the lock is stale.

LOCKFILE=/tmp/lock.txt
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
echo "already running"
exit
fi

# make sure the lockfile is removed when we exit and then claim it
trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT
echo $$ > ${LOCKFILE}

# do stuff
sleep 1000

rm -f ${LOCKFILE}

The trick here is the kill -0 which doesn't deliver any signal but just checks if a process with the given PID exists. Also the call to trap will ensure that the lockfile is removed even when your process is killed (except kill -9).

backup script requires a pid lock to prevent multiple instances

Add the following to your script:

trap "rm -f /tmp/lockfile && exit" SIGINT SIGTERM #Put this on the top to handle CTRL+C or SIGTERM
test -f /tmp/lockfile && exit #Before rsync to ensure that the script will not run if there is another one running

touch /tmp/lockfile #Before the rsync

rm -f /tmp/lockfile #After the rsync

Rename the lock file path/name according to your needs; you can also name it after the current PID using the $$ variable.
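
Put together, a minimal sketch of such a wrapper might look like this (the lock path and the rsync command line are placeholders):

#!/bin/bash

LOCKFILE=/tmp/lockfile   # example path; rename to suit your needs

trap "rm -f $LOCKFILE && exit" SIGINT SIGTERM   # handle CTRL+C or SIGTERM
test -f $LOCKFILE && exit                       # another instance is already running

touch $LOCKFILE
rsync -a /path/to/source/ /path/to/destination/   # placeholder rsync command
rm -f $LOCKFILE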

How can I have per-worker synchronization to avoid race conditions on methods connected to signals on Celery?

We have implemented Celery autoscaling (on AWS) using a few Celery features that come out of the box. For what you ask, we use Celery's control API (https://docs.celeryproject.org/en/latest/reference/celery.app.control.html). The key part is Inspect. The Inspect class can take a destination parameter, which is the Celery node you want to inspect. We don't use it, since we want to inspect all nodes in the cluster, but you may need to do it differently. You should familiarize yourself with this class and its .active() method, which will give you the list of active tasks either in a set of workers or in the whole cluster (if destination is not provided).
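
As an illustration, a minimal sketch (the app instance, broker URL and worker name below are placeholders, not part of the original setup):

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')   # placeholder broker

# Inspect every node in the cluster...
active = app.control.inspect().active()
# active maps each worker's name to its list of active task dicts
print(active)

# ...or only specific nodes, via the destination parameter.
one = app.control.inspect(destination=['worker1@example-host'])   # placeholder node name
print(one.active())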


