Parallel Processing from a Command Queue on Linux (Bash, Python, Ruby... Whatever)


I would imagine you could do this using make and the make -j xx command.

Perhaps a makefile like this (note that make requires each recipe line to be indented with a tab):

all : usera userb userc....

usera:
	imapsync usera
userb:
	imapsync userb
....

make -j 10 -f makefile
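
If the user list is long, a short shell sketch could generate that makefile; users.txt (one user per line) is an assumed input file here, not part of the original setup:

# Hypothetical: build a makefile with one imapsync target per user in users.txt
{
  printf 'all: %s\n\n' "$(tr '\n' ' ' < users.txt)"
  while read -r user; do
    printf '%s:\n\timapsync %s\n' "$user" "$user"
  done < users.txt
} > makefile

make -j 10 -f makefile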

Queue using several processes to launch bash jobs

Use GNU Parallel to make a job queue like this:

# Clear out file containing job queue
> jobqueue

# Start GNU Parallel processing jobs from queue
# -k means "keep" output in order
# -j 4 means run 4 jobs at a time
tail -f jobqueue | parallel -k -j 4

# From another terminal, submit 40 jobs to the queue
for i in {1..40}; do echo "sleep 5;date +'%H:%M:%S Job $i'"; done >> jobqueue

Another option is to use Redis - see my answer to "Run several jobs parallelly and Efficiently".

Doing Actions Simultaneously Versus In A Queue

Depends on how many worker processes you have.

Since you added the Heroku tag, I'm assuming you're using Heroku. On Heroku, one dyno is one such worker process.

Routing is more or less random on Heroku, but provided you have a large number of users (40 is probably not enough though), you should be able to serve as many users as you have dynos simultaneously.

How to speed up throughput of find . -type f -size +0 -exec ./work.sh {} \;

Partition the job into chunks and run them using shell job control, or install GNU Parallel if this is going to be an everyday thing (a rough GNU Parallel equivalent is sketched below). Job control example:

cnt=1
find . -type f -size +0 |
while read -r fname
do
    # process each file in a background job
    zcat "$fname" | sed = | sed 'N;s/\n/";/' | grep -vE '"timepassed";' |
        eval sed "$SED_ARG" >> "$logfilename" &
    # after every 10th job, wait for the current batch to finish
    [ $(( $cnt % 10 )) -eq 0 ] && wait
    cnt=$(( $cnt + 1 ))
done
wait

This runs ten jobs at a time. Change the 10 to suit your system; a higher number is not always a better choice.

$(( % )) is modulo (remainder) arithmetic, so when cnt is 10, 20, 30, ... $(( $cnt % 10 )) returns zero. Every time the value returns zero the script calls wait. The final wait statement (below the word done) is there in case the loop ends on a count that is not evenly divisible by 10, e.g. 52002. This is all part of bash.
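
For the everyday-thing case, a rough GNU Parallel equivalent would look like this; it assumes work.sh (from the question title) wraps the zcat/sed/grep pipeline shown in the loop above:

# Sketch: same fan-out with GNU Parallel, 10 jobs at a time;
# parallel buffers each job's output so the appends don't interleave
find . -type f -size +0 | parallel -j 10 ./work.sh {} >> "$logfilename"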

Symfony2 Job Queue or Parallel Processing?

I would recommend Gearman server; it proved quite stable, it's totally outside of Symfony2, and you have to have the server up and running (I don't know what your hosting options are), but it distributes jobs perfectly. In the skinniest version it just keeps all jobs in memory, but you can configure it to use an sqlite database as backup, so if for any reason the server reboots or the gearman daemon breaks, you can just start it again and your jobs will be preserved. I know it has been tested with very large loads (adding up to 1k jobs per second) and it stood its ground. It's probably even more stable nowadays; I'm speaking from experience two years ago, when we offloaded some long-running tasks in a ZF application to background processing via Gearman. It should be quite self-explanatory how it works from the image below:

Sample Image
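
To get a feel for how jobs are distributed, here is a minimal sketch using the gearman command-line client; it assumes gearmand is already running on localhost, and the "wc" function name is just an example, not part of the Symfony2 setup:

# Terminal 1: start a worker that registers a function called "wc"
gearman -w -f wc -- wc -l

# Terminal 2: submit a job to that function and read the result back
gearman -f wc < /etc/passwd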

Python (or maybe Linux in general) file operation flow control or file lock

I know this is not an elegant way to coordinate the computation, but I
haven't figured out a better way to communicate between different
machines.

While this isn't directly what you asked, you should really, really consider fixing your problem at this level: using some sort of shared message queue is likely to be a lot simpler to manage and debug than relying on the locking semantics of a particular networked filesystem.

The simplest solution to set up and run, in my experience, is Redis on the machine currently running the Ruby script that creates the jobs. It should literally be as simple as downloading the source, compiling it and starting it up. Once the Redis server is up and running, you change your code to append the computation commands to one or more Redis lists. In Ruby you would use the redis-rb library like this:

require "redis"

redis = Redis.new
# Your other code to build up command lists...
redis.lpush 'commands', command1, command2...

If the computations need to be handled by certain machines, use a list per-machine like this:

redis.lpush 'jobs:machine1', command1
# etc.

Then in your Python code, you can use redis-py to connect to the Redis server and pull jobs off the list like so:

import os
from redis import Redis

# decode_responses=True makes lpop return str instead of bytes
r = Redis(host="hostname-of-machine-running-redis", decode_responses=True)
while r.llen('jobs:machine1'):
    job = r.lpop('jobs:machine1')
    os.system('sh ' + job + ' &')

Of course, you could just as easily pull jobs off the queue and execute them in Ruby:

require 'redis'

redis = Redis.new(:host => 'hostname-of-machine-running-redis')
# llen returns 0 for an empty list, and 0 is truthy in Ruby, so compare explicitly
while redis.llen('jobs:machine1') > 0
  job = redis.lpop('jobs:machine1')
  `sh #{job} &`
end
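
For quick testing you can also push and inspect jobs straight from the shell with redis-cli; the command string below is only a placeholder:

redis-cli -h hostname-of-machine-running-redis lpush jobs:machine1 "run_step_1.sh input_a"
redis-cli -h hostname-of-machine-running-redis llen jobs:machine1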

With some more details about the needs of the computation and the environment it's running in, it would be possible to recommend even simpler approaches to managing it.


Multi-threaded BASH programming - generalized method?

# adjust these as required
args_per_proc=1       # 1 is fine for long-running tasks
procs_in_parallel=4

xargs -n "$args_per_proc" -P "$procs_in_parallel" povray < list

Note the nproc command (now part of GNU coreutils) will automatically determine
the number of available processing units, which can then be passed to -P.
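
For example, on a system whose coreutils ships nproc, the same invocation can size itself to the machine:

# use one worker process per available core
xargs -n "$args_per_proc" -P "$(nproc)" povray < list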

What is the best way of running shell commands from a web-based interface?

So, I've tried to answer my own question with code, as I couldn't find anything that quite fit the bill. Hopefully it's useful to anyone coming across the same problem.

Redbeard 0X0A pointed me in the general direction; I was able to get a standalone Ruby script doing what I wanted using popen. Extending this to use EventMachine (as it provided a convenient way of writing a websocket server) and its built-in popen method solved my problem.

More details here http://morethanseven.net/2010/09/09/Script-running-web-interface-with-websockets.html and the code at http://github.com/garethr/bolt/


