Installing GNU Parallel Without Root Permission

How to install packages in Linux (CentOS) without root user with automatic dependency handling?

It is possible to use yum and rpm to install any package in the repository of the distribution. Here is the recipe:

Find the package name

Use yum search.

Download

Download the package and all of its dependencies using yumdownloader (part of the yum-utils package, which is installed by default on many CentOS systems). Pass it --resolve to get dependency resolution. yumdownloader downloads to the current directory unless you specify --destdir.

mkdir -p ~/rpm
yumdownloader --destdir ~/rpm --resolve vim-common

Choose a prefix location

It might be ~, ~/centos, or ~/y. If your home is slow because it is on a network file system, you can put it in /var/tmp/....

mkdir ~/centos

Extract all .rpm packages

Extract all .rpm packages to your chosen prefix location.

cd ~/centos && rpm2cpio ~/rpm/x.rpm | cpio -id

rpm2cpio outputs the .rpm file as a .cpio archive on stdout.
cpio reads it from stdin.
-i means extract (to the current directory).
-d means create missing directories.

You can optionally add -v for verbose output.
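Since x.rpm above stands for each downloaded file, the extraction can be looped over the whole download directory. Here is a minimal sketch, assuming rpm2cpio and cpio are installed; extract_rpms is a hypothetical helper name, not part of any package:

```shell
# extract_rpms RPM_DIR PREFIX: unpack every .rpm in RPM_DIR under PREFIX.
# (Hypothetical helper; needs rpm2cpio and cpio once .rpm files exist.)
extract_rpms() {
    # Resolve to an absolute path so it stays valid after the cd below.
    rpm_dir=$(cd "$1" && pwd) || return 1
    prefix=$2
    mkdir -p "$prefix" || return 1
    for rpm in "$rpm_dir"/*.rpm; do
        # With no .rpm files the pattern stays unexpanded; skip it.
        [ -e "$rpm" ] || continue
        # Subshell: the caller's working directory is left unchanged.
        (cd "$prefix" && rpm2cpio "$rpm" | cpio -id) || return 1
    done
}

# Usage: extract_rpms ~/rpm ~/centos
```

The function stops at the first package that fails to unpack, so a broken download does not go unnoticed.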

Configure the environment

You will need to set the environment variables PATH and LD_LIBRARY_PATH for the installed packages to work correctly. Here is the corresponding sample from my ~/.bashrc:

export PATH="$HOME/centos/usr/sbin:$HOME/centos/usr/bin:$HOME/centos/bin:$PATH"

export MANPATH="$HOME/centos/usr/share/man:$MANPATH"

L='/lib:/lib64:/usr/lib:/usr/lib64'
export LD_LIBRARY_PATH="$HOME/centos/usr/lib:$HOME/centos/usr/lib64:$L"
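One thing to watch: ~/.bashrc can be sourced more than once (nested shells, for example), and plain prepending then makes PATH grow with duplicates. Here is a small sketch of a guard against that; path_prepend is a hypothetical helper name, not a bash builtin:

```shell
# path_prepend VAR DIR: put DIR at the front of the colon-separated
# variable VAR unless it is already listed. (Hypothetical helper.)
path_prepend() {
    var=$1 dir=$2
    eval "cur=\${$var:-}"
    case ":$cur:" in
        *":$dir:"*) ;;    # already present: leave VAR untouched
        *) eval "export $var=\"\$dir\${cur:+:\$cur}\"" ;;
    esac
}

# Usage in ~/.bashrc:
# path_prepend PATH "$HOME/centos/usr/bin"
```

I would only use this for PATH. An empty MANPATH behaves differently from an unset one, so the plain export above is the safer choice there.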

Edited note (thanks to @AmitNaidu for pointing out my mistake):

According to bash documentation about startup files, when connecting to a server via ssh, only .bashrc is sourced:

Invoked by remote shell daemon
Bash attempts to determine when it is being run with its standard input connected to a network connection, as when executed by the remote shell daemon, usually rshd, or the secure shell daemon sshd. If Bash determines it is being run in this fashion, it reads and executes commands from ~/.bashrc, if that file exists and is readable.

Now if you want to install a lot of packages that way, you might want to automate the process. If so, have a look at this repository.

Extra note: if you are trying to install any of gcc, zlib, make, cmake, git, fish, zsh or tmux, you should really consider using conda; see my other answer.

How to run several python files using bash script iteratively using only 7 CPUS?

You can use GNU Parallel:

parallel -j 7 python3 script.py ::: $(seq 1.5 0.05 2.0)
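A side note on the value list: seq with a fractional step relies on floating-point arithmetic, and whether the end value 2.0 is emitted may depend on your seq implementation (worth checking on your system). Here is a sketch that builds the same grid from integers instead; grid is a hypothetical helper, and the values come out with two decimals (1.50 rather than 1.5):

```shell
# grid: print 1.50, 1.55, ..., 2.00, one value per line (hypothetical helper).
# Integer loop counters avoid accumulating floating-point error.
grid() {
    awk 'BEGIN { for (i = 150; i <= 200; i += 5) printf "%.2f\n", i / 100 }'
}

# Usage: values arrive on stdin, at most 7 jobs run at once.
# grid | parallel -j 7 python3 script.py
```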

EDIT according to Mark Setchell's comment

GNU Parallel not returning output values across remote hosts

In the example you link to, see how the backquotes are backslashed? You need to do that or else hostname gets executed in your shell on HW04, before it talks to other machines.

First off, I'd try this to see whether you are talking to those other machines at all:

parallel -j 5 \
    --sshloginfile ./parallel-nodes.txt \
    echo "Number {}: Running on \`hostname\`" ::: 1 2 3 4 5 6 7 8 9 10
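To see what the backslashes change, you can try the quoting locally without any remote machines. In this sketch uname -n stands in for hostname, and sh -c plays the role of the shell on the remote side:

```shell
# Plain backquotes inside double quotes: the local shell runs the
# command substitution right away.
eager="Running on `uname -n`"

# Backslashed backquotes: the text `uname -n` survives as literal text.
deferred="Running on \`uname -n\`"

printf '%s\n' "$eager"       # node name already filled in
printf '%s\n' "$deferred"    # still contains the backquotes

# A later shell (here sh -c) is the one that finally expands it.
sh -c "echo $deferred"
```

With parallel the later shell is the one started on each remote host, which is why the backslashed form reports the remote host name.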

Then, I'd try tracking down your passwordless ssh setup one machine at a time to make sure it's really working; from HW04 try:

parallel -S HW01 'echo -n {} ""; hostname' ::: 1
parallel -S HW02 'echo -n {} ""; hostname' ::: 1
parallel -S HW03 'echo -n {} ""; hostname' ::: 1
parallel -S HW04 'echo -n {} ""; hostname' ::: 1

(repeat for every machine in your parallel-nodes.txt file)
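If the node list is long, the same check can be looped over the file. This is a rough sketch that assumes the simple one-host-per-line form of parallel-nodes.txt; list_hosts and check_hosts are hypothetical helper names:

```shell
# list_hosts FILE: print one host per line, skipping blank lines and
# lines that start with #.
list_hosts() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# check_hosts FILE: run the single-host test for each entry in turn.
check_hosts() {
    list_hosts "$1" | while IFS= read -r host; do
        printf '== %s ==\n' "$host"
        # </dev/null keeps the job from consuming the host list on stdin.
        parallel -S "$host" 'echo -n {} ""; hostname' ::: 1 </dev/null ||
            printf 'FAILED: %s\n' "$host"
    done
}

# Usage: check_hosts ./parallel-nodes.txt
```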

If one of those machines isn't working with ssh, you can try to debug it with:

PARALLEL_SSH='ssh -v' parallel -S HW03 'echo -n {} ""; hostname' ::: 1

How do I get GNU parallel to work on git bash in Windows 7?

So the root cause is that CygWin (contrary to GNU/Linux) does not respect redirection of STDERR if the command line is too long.

GNU Parallel figures out how long the longest possible command line is by doing a binary search for the length. This is awfully slow on CygWin because forking a 12 MB command line is horribly slow (and 12 MB seems to be the limit in my version of CygWin).

Luckily it only has to be done once. After this GNU Parallel caches the line length in ~/.parallel/tmp/HOSTNAME/linelen, and this is the reason why you experience the problem when ~/.parallel/tmp is removed.

This is also the reason why it seemed that using a different version worked: You simply had a single run that finished, and thus cached the length. It was not the change of version that did this.

Until I manage to get CygWin to ignore the "sh: -c: option requires an argument" message, all you need to do is ignore it and be patient. I should probably also put in a small warning to let CygWin users know that they have to be patient the first time.

Run:

parallel echo ::: 1

It will spit out "sh: -c: option requires an argument" around 25 times, but that is fine. It will take around 30 seconds to complete.

After this everything should be fast(er) and you should not see the error.
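If you want to know whether the slow first run has already happened, you can look for the cached file. This is a small sketch; has_linelen_cache is a hypothetical helper, and the location is the one named above (other versions may cache elsewhere):

```shell
# has_linelen_cache [DIR]: succeed if some DIR/<host>/linelen file exists
# and is non-empty. DIR defaults to ~/.parallel/tmp.
has_linelen_cache() {
    for f in "${1:-$HOME/.parallel/tmp}"/*/linelen; do
        # -s: the file exists and is not empty.
        [ -s "$f" ] && return 0
    done
    return 1
}

if has_linelen_cache; then
    echo "line length already cached"
else
    echo "first run will be slow; try: parallel echo ::: 1"
fi
```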

It should be fixed in the newest version in GIT: https://savannah.gnu.org/git/?group=parallel

As I make irssi, not being able to login as root - how can irssi be run?

Change the installation path to somewhere you have write access to. Try the following:

./configure --prefix=$HOME
make
make install

That should install it to ~/bin/, so you can run it as ~/bin/irssi.

How to use GNU parallel to run a list of commands where 4 commands run simultaneously

If you have GNU Parallel you can do this:

parallel -j4 scrapy crawl urlMonitor -a slice={} ::: {1..38}

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
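The difference is easy to see with made-up numbers. The sketch below is a simplified model, not how GNU Parallel is implemented: simulate is a hypothetical helper, the durations are invented, and it uses a smaller case (8 jobs, 2 slots) than the 32-job example. It compares a fixed split with starting each job on whichever slot frees up first:

```shell
# simulate SLOTS: read one job duration per line on stdin and print the
# total wall time under both policies. (Hypothetical helper.)
simulate() {
    awk -v slots="$1" '
    { d[NR] = $1 }
    END {
        slots += 0
        per = int((NR + slots - 1) / slots)    # jobs per slot, rounded up
        for (k = 0; k < slots; k++) { s[k] = 0; t[k] = 0 }
        smax = 0; dmax = 0
        for (i = 1; i <= NR; i++) {
            # Static: job i is pre-assigned to a fixed slot.
            s[int((i - 1) / per)] += d[i]
            # Dynamic: job i starts on the slot that becomes free first.
            best = 0
            for (k = 1; k < slots; k++) if (t[k] < t[best]) best = k
            t[best] += d[i]
        }
        for (k = 0; k < slots; k++) {
            if (s[k] > smax) smax = s[k]
            if (t[k] > dmax) dmax = t[k]
        }
        printf "static=%d dynamic=%d\n", smax, dmax
    }'
}

# Four long jobs followed by four short ones, on 2 slots.
printf '%s\n' 5 5 5 5 1 1 1 1 | simulate 2    # prints: static=20 dynamic=12
```

With the fixed split one slot gets all the long jobs and the other sits idle after 4 seconds. Starting jobs as slots free up finishes in 12.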

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Using GNU parallel command with gfind to gain in runtime for gupdatedb tool

You don't need ::: if there's nothing after it, and {} is pointless too if you don't have any sources. Without more information about what exactly you would want to parallelize, we can't really tell you what you should use instead.

But for example, if you want to run one find in each of /etc, /usr, /bin, and /opt, that would look like

parallel find {} -options ::: /etc /usr /bin /opt

This could equivalently be expressed without ::: by feeding the arguments on standard input:

printf '%s\n' /etc /usr /bin /opt |
parallel find {} -options

So the purpose of ::: is basically to say "I want to specify the things to parallelize over on the command line instead of receiving them on standard input"; but if you don't provide this information, either way, parallel doesn't know what to replace {} with.
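You can check the equivalence yourself without running anything by using --dry-run, which only prints the commands. This sketch uses -maxdepth 0 purely as a stand-in for -options, and it does nothing unless GNU Parallel is installed:

```shell
# Only run when GNU Parallel (not another tool named parallel) is installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # --dry-run prints the commands instead of running them; -k keeps order.
    from_args=$(parallel -k --dry-run find {} -maxdepth 0 ::: /etc /usr)
    from_stdin=$(printf '%s\n' /etc /usr |
        parallel -k --dry-run find {} -maxdepth 0)
    [ "$from_args" = "$from_stdin" ] && echo "same commands either way"
fi
```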

I'm not saying this particular use makes sense for your use case, just hopefully clarifying the documentation (again).