How to Run Processes Piped with Bash on Multiple Cores

How to run processes piped with bash on multiple cores?

Suppose dostuff is running on one CPU. It writes data into a pipe, and that data will be in cache on that CPU. Because filterstuff is reading from that pipe, the scheduler decides to run it on the same CPU, so that its input data is already in cache.

If your kernel is built with CONFIG_SCHED_DEBUG=y,

# echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features

should disable this class of heuristics. (See /usr/src/linux/kernel/sched_features.h and /proc/sys/kernel/sched_* for other scheduler tunables.)
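To see which feature flags your kernel actually exposes before toggling one (a sketch; it needs root, and the exact flag names, including NO_SYNC_WAKEUPS, vary between kernel versions):

cat /sys/kernel/debug/sched_features                    # list the feature flags this kernel knows
echo SYNC_WAKEUPS > /sys/kernel/debug/sched_features    # writing the un-prefixed name re-enables a NO_ flag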

If that helps, and the problem still happens with a newer kernel, and it's really faster to run on separate CPUs than one CPU, please report the problem to the Linux Kernel Mailing List so that they can adjust their heuristics.

Bash: Running the same program over multiple cores

You could use

for f in *.fa; do
  myProgram [options] "./$f" "./$f.tmp" &   # [options] is a placeholder for your program's flags
done
wait

which would start all of your jobs in parallel, then wait until they all complete before moving on. In the case where you have more jobs than cores, you would start all of them and let your OS scheduler worry about swapping processes in and out.

One modification is to start 10 jobs at a time

count=0
for f in *.fa; do
  myProgram [options] "./$f" "./$f.tmp" &
  (( count++ ))
  if (( count == 10 )); then   # == compares; a single = would assign and always be true
    wait
    count=0
  fi
done
wait   # don't forget the final, possibly partial batch

but this is inferior to using parallel, because you can't start new jobs as old ones finish, and you also can't detect if an older job finished before you manage to start 10 jobs. Plain wait lets you wait on one particular process or on all background processes, but doesn't tell you when any one of an arbitrary set of background processes completes (see the wait -n sketch below).
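On bash 4.3 or newer, wait -n blocks until any single background job exits, which gets you a fixed-size pool without GNU parallel. A minimal sketch, reusing the [options] placeholder from above:

max=10
for f in *.fa; do
  while (( $(jobs -rp | wc -l) >= max )); do
    wait -n    # block until any one background job exits
  done
  myProgram [options] "./$f" "./$f.tmp" &
done
wait   # wait for the remaining jobs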

How to assign cpu cores to multiple instances of a program with arguments using taskset

I often hit similar problems. Instead of using taskset I use nice:

parallel --nice 11 ./models_test tfidf.db output/ input/ {} '{= $_+=9 =}' ::: {1..60..10} &
./models_test tfidf.db output/ input/ 61 70
wait
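Here {1..60..10} expands to the start values 1, 11, ..., 51, and the {= $_+=9 =} Perl expression turns each start into its range end, so the background jobs cover the chunks 1 10 through 51 60 at nice level 11, while the final chunk 61 70 runs in the foreground. If you want to check what a parallel invocation will actually run, --dry-run prints the commands without starting them:

parallel --dry-run --nice 11 ./models_test tfidf.db output/ input/ {} '{= $_+=9 =}' ::: {1..60..10}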

Run a script on multi cores and parallel processing

The really critical question when looking at parallel code is dependencies. I'm going to assume that - because your script can be subdivided - you're not doing anything complicated inside the loop.

But because you're stepping by 0.001 (0.01 for $k2) and 5 loops deep, you're just doing a LOT of iterations if you were to go from 0 to 1: 1000 steps for each of four loops and 100 for $k2, or 100,000,000,000,000 of them, to be precise.

To parallelise, I would personally suggest you 'unroll' the outer loop and use Parallel::ForkManager.

E.g.

use Parallel::ForkManager;

my $CPU_count = 256;

my $fork_manager = Parallel::ForkManager->new($CPU_count);

for ( my $k1 = $start; $k1 < $end; $k1 += 0.001 ) {
    # Run each iteration of the outer loop in a separate child process
    my $pid = $fork_manager->start and next;

    for ( my $k2 = $start; $k2 < $end; $k2 += 0.01 ) {
        for ( my $k3 = $start; $k3 < $end; $k3 += 0.001 ) {
            for ( my $k4 = $start; $k4 < $end; $k4 += 0.001 ) {
                for ( my $k5 = $start; $k5 < $end; $k5 += 0.001 ) {
                    ...;
                }
            }
        }
    }

    $fork_manager->finish;    # terminate this child
}

$fork_manager->wait_all_children;

What this will do is - for each iteration of that 'outer' loop, fork your process and run the 4 inner loops as a separate process. It'll cap at 256 concurrent processes. You should match this to the number of CPUs you have available.

Bear in mind though - this only really works for trivial 'cpu intensive' tasks. If you're doing much disk IO or trying to share memory this won't work nearly as well.

Also note - if the number of steps on the outer loop is fewer than the number of CPUs it won't parallelise quite so well.

I'd also note - $k2 has a larger step (0.01 rather than 0.001), so it iterates fewer times. I've copied that from your source, but it may be a typo.

Parallelize Bash script with maximum number of processes

Depending on what you want to do, xargs can also help (here: converting documents with pdf2ps):

cpus=$( ls -d /sys/devices/system/cpu/cpu[[:digit:]]* | wc -w )

find . -name \*.pdf | xargs --max-args=1 --max-procs=$cpus pdf2ps

From the docs:

--max-procs=max-procs
-P max-procs
Run up to max-procs processes at a time; the default is 1.
If max-procs is 0, xargs will run as many processes as possible at a
time. Use the -n option with -P; otherwise chances are that only one
exec will be done.
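As a side note, if your file names can contain spaces or newlines, a NUL-delimited pipeline is safer, and nproc is a shorter way to count CPUs (a variant, not part of the quoted answer):

find . -name '*.pdf' -print0 | xargs -0 --max-args=1 --max-procs="$(nproc)" pdf2ps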

Assigning a cpu core to a process - Linux

There are two ways of assigning CPU cores to a running process.

First method:

taskset -cp 0,4 9030

Pretty clear! This assigns CPU cores 0 and 4 to PID 9030.

Second Method:

taskset -p 0x11 9030

This is a bit more complex. The hexadecimal number that follows -p is a bitmask. An explanation can be found in the taskset(1) man page, an excerpt of which is given below:

The CPU affinity is represented as a bitmask, with the lowest order
bit corresponding to the first logical CPU and the highest order bit
corresponding to the last logical CPU. Not all CPUs may exist on a
given system but a mask may specify more CPUs than are present. A
retrieved mask will reflect only the bits that correspond to CPUs
physically on the system. If an invalid mask is given (i.e., one that
corresponds to no valid CPUs on the current system) an error is
returned. The masks are typically given in hexadecimal.

Still confused? Take the mask used above: 0x11 is 10001 in binary. The lowest order (rightmost) bit corresponds to CPU 0, the next bit to CPU 1, and so on; in 10001 the bits for processors 0 and 4 are ones, so those two processors will be enabled for the process, exactly like taskset -cp 0,4 earlier. All the processors which have a zero in their corresponding position in the bitmask will be disabled. In fact, this is the reason why it is called a mask.
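You can verify the result either way (9030 being the example PID from above):

taskset -p 9030     # prints the current affinity as a hexadecimal mask
taskset -cp 9030    # prints the same affinity as a CPU list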

Having said all this, using taskset to change the processor affinity requires that:

A user must possess CAP_SYS_NICE to change the CPU affinity of a
process. Any user can retrieve the affinity mask.

Please check the capabilities(7) man page.

You might be interested to look at this SO Question that deals with CAP_SYS_NICE.

My Resources

  1. Tutorials Point
  2. XModulo

How to program so that different processes run on different CPU cores?

If you wish to pin threads/processes to specific CPUs, then you have to use the sched_setaffinity(2) system call or the pthread_setaffinity_np(3) library call. Each core in Linux has its own virtual CPU ID.

These calls allow you to set the allowed CPU mask.

Otherwise it will be up to the discretion of the scheduler to run your threads where it feels like running them.

Neither will guarantee that your process runs in parallel though. This is something only the scheduler can decide unless you run realtime.

Here is some sample code:

#define _GNU_SOURCE   /* needed for the CPU_* macros and pthread_setaffinity_np */
#include <sched.h>
#include <pthread.h>
#include <stdio.h>

int run_on_cpu(int cpu) {

    cpu_set_t allcpus;

    CPU_ZERO(&allcpus);
    sched_getaffinity(0, sizeof(cpu_set_t), &allcpus);

    int num_cpus = CPU_COUNT(&allcpus);

    fprintf(stderr, "%d cpus available for scheduling\nAvailable CPUs: ", num_cpus);

    size_t i;
    for (i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &allcpus))
            fprintf(stderr, "%zu, ", i);
    }
    fprintf(stderr, "\n");

    /* Pin the calling thread to the requested CPU, if it is in our affinity set */
    if (CPU_ISSET(cpu, &allcpus)) {
        cpu_set_t cpu_set;
        CPU_ZERO(&cpu_set);
        CPU_SET(cpu, &cpu_set);
        return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_set);
    }

    return -1;
}
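To try it out, you could compile the function with a small main() that calls it and check the pinning from the shell; the file name pin.c and the wrapper main() here are assumptions for illustration:

# Assumes run_on_cpu() above is saved in pin.c together with a main() that calls it
gcc -O2 -pthread -o pin pin.c
./pin &             # start it in the background
taskset -cp $!      # print the affinity (as a CPU list) of the new process
wait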

Bash linux: start multiple program in parallel and stop all when one has finished

Add:

trap 'kill $(jobs -p)' EXIT

to the beginning of your script. This will kill all background jobs when your script exits.


To create a script, open a new file and paste the following into it:

#!/bin/bash
trap 'kill $(jobs -p)' EXIT
taskset -c 1 prog0 -option0 &
sleep 3
taskset -c 0 pidstat 1 -C prog0 -u > log2 &
taskset -c 0 pidstat 1 -C prog0 -r > log3 &
taskset -c 0 prog1 -option1 > log1

Save the file as runme.sh.

Make it executable: chmod +x runme.sh

Run it by executing: ./runme.sh or to run it in the background: ./runme.sh &

Now, when the last command taskset -c 0 prog1 -option1 > log1 has finished, the script will exit and it will kill all the background processes that it started.
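If instead you want everything to stop as soon as any of several background programs finishes, rather than one specific foreground program, wait -n (bash 4.3+) combined with the same trap does it; prog0 and prog1 are the same placeholders as above:

#!/bin/bash
trap 'kill $(jobs -p) 2>/dev/null' EXIT
taskset -c 1 prog0 -option0 &
taskset -c 0 prog1 -option1 &
wait -n    # returns when the first background job exits; the EXIT trap then kills the rest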


