Shebang Line Limit in Bash and Linux Kernel

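The shebang line has traditionally been limited to 127 characters on Linux because of a buffer whose size is fixed when the kernel is compiled.

The limit is set in the kernel by BINPRM_BUF_SIZE, defined in include/linux/binfmts.h. Linux 5.1 raised the buffer from 128 to 256 bytes, so newer kernels accept 255 characters.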
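A quick way to see how close a script is to the limit is to measure its first line. This is a minimal sketch, assuming the 127-character figure; the function name shebang_length and the sample file are made up for illustration.

```shell
# Print the number of characters on the first line of a file.
shebang_length() {
    head -n 1 "$1" | tr -d '\n' | wc -c | tr -d ' '
}

limit=127   # assumption: the 127-character figure; adjust for your kernel

# Hypothetical sample script, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' '#!/usr/bin/perl -w' 'print "hi\n";' > "$sample"

len=$(shebang_length "$sample")
if [ "$len" -gt "$limit" ]; then
    echo "shebang is $len characters: the kernel would truncate it"
else
    echo "shebang is $len characters: within the limit"
fi
rm -f "$sample"
```

Point shebang_length at a real script to check it before deploying to a machine with an older kernel.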

Max length of shebang line with Perl?

Does the limit apply only to the executable path (which is under 100 chars in my case), or is there some strange magic at work?

The limit applies only to the executable path because there is strange magic at work. The strange magic goes by the name of Perl.

First, the limit of 127 characters is real. (Or maybe 128 or 126; I didn't actually count.) Everything past that gets truncated. Second, regardless of whether there are spaces on the shebang line, Linux passes everything after the executable name as just ONE argument. Third, Perl parses the shebang line and interprets it itself, which is why this was working. But note that the effect of all this is strange and a bit perverse. Suppose the shebang line looked like this:

#!/bin/perl -I/lib-one -I/lib-two

And suppose the truncation happened near the end, say, right before the "w" in "two" (that is, suppose the limit were 32 instead of 128, for ease of reading). Then the effect is as if you invoked perl at the command line as follows:

shell-prompt$ /bin/perl "-I/lib-one -I/lib-t" -I/lib-one -I/lib-two

Which "works" in this case, but in general won't.
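The truncation in this example can be reproduced with plain text tools. The sketch below only simulates what the kernel would keep; the variable names and the pretend limit of 31 usable characters (a buffer of 32) are assumptions for illustration.

```shell
line='#!/bin/perl -I/lib-one -I/lib-two'
pretend_limit=31   # assumption: usable characters when the buffer is 32

# Keep only what a kernel with this limit would read.
seen=$(printf '%s' "$line" | cut -c "1-$pretend_limit")

# Strip "#!" and the interpreter path; the remainder arrives as ONE argument.
prefix='#!/bin/perl '
single_arg=${seen#"$prefix"}

echo "kernel sees : $seen"
echo "single arg  : $single_arg"
```

The second line of output is the one odd argument that Perl receives before it re-reads the full shebang line from the file.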

How does the #! shebang work?

Why should the shebang line always be the first line?

The shebang must be the first line because it is interpreted by the kernel, which looks at the two bytes at the start of an executable file. If these are #!, the rest of the line is interpreted as the executable to run, with the script file made available to that program. (Details vary slightly, but that is the picture.)

Since the kernel only looks at the first two characters and has no notion of further lines, you must place the hash bang on line 1.
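The same two-byte test can be imitated from the shell. This is only a sketch of the idea, not what the kernel literally runs; the helper names and the two sample files are hypothetical.

```shell
# Succeeds if the file starts with the two bytes "#!".
starts_with_shebang() {
    [ "$(head -c 2 "$1")" = '#!' ]
}

# Print yes or no for a given file.
check() {
    if starts_with_shebang "$1"; then echo yes; else echo no; fi
}

good=$(mktemp)
bad=$(mktemp)
printf '%s\n' '#!/bin/sh' 'echo hi' > "$good"
printf '%s\n' '' '#!/bin/sh' 'echo hi' > "$bad"   # shebang pushed to line 2

good_result=$(check "$good")
bad_result=$(check "$bad")
echo "shebang on line 1: $good_result"
echo "shebang on line 2: $bad_result"
rm -f "$good" "$bad"
```

A single blank line before the #! is enough to make the check fail, which is why the shebang cannot be moved down.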

Now what happens if the kernel can't execute a file beginning with #!whatever? The shell, attempting to run an executable and being informed by the kernel that it can't execute the program, attempts as a last resort to interpret the file contents as a shell script. Since the shell is not Perl, you get a bunch of errors, exactly the same as if you attempted to run

 sh temp.pl
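The fallback can be observed with a file whose #! is not at the very start. A minimal sketch, assuming the temporary directory allows executing files; the script content is made up, and it uses a shell command instead of Perl code so that the fallback produces visible output rather than syntax errors.

```shell
script=$(mktemp)
# The "#!" sits on line 2, so the kernel does not treat it as a shebang.
printf '%s\n' '' '#!/usr/bin/perl' 'echo "interpreted by the shell"' > "$script"
chmod +x "$script"

# The kernel refuses the file; the calling shell then runs the contents
# itself, where the misplaced "#!" line is only a comment.
output=$("$script")
echo "$output"
rm -f "$script"
```

With real Perl code in the file, the same fallback is what produces the stream of shell errors.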

/usr/bin/env questions regarding shebang line peculiarities

First of all, you should very seldom use $* and you should almost always use "$@" instead. There are a number of questions here on SO which explain the ins and outs of why.
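A small illustration of why "$@" is preferred: it keeps arguments that contain spaces intact, while an unquoted $* splits them. The helper count_args is made up for this sketch.

```shell
count_args() { echo "$#"; }

set -- "first arg" second    # two positional parameters

unquoted_star=$(count_args $*)    # word-split into: first, arg, second
quoted_at=$(count_args "$@")      # preserved as: "first arg", second

echo "\$* passes $unquoted_star arguments"
echo "\"\$@\" passes $quoted_at arguments"
```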

Second - the env command has two main uses. One is to print the current environment; the other is to completely control the environment of a command when it is run. The third use, which you are demonstrating, is to modify the environment, but frankly there's no need for that - the shells are quite capable of handling that for you.

Mode 1:

env

Mode 2:

env -i HOME=$HOME PATH=$PREPENDPATH:$PATH ... command args

This version cancels all inherited environment variables and runs command with precisely the environment set by the ENVVAR=value options.
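The effect of -i is easy to check by asking env itself to print the environment it hands over. A minimal sketch; the variable GREETING is made up, and the inner env is located with command -v so that no particular install path is assumed.

```shell
# Everything inherited is dropped; only the variables named here survive.
clean_env=$(env -i GREETING=hello "$(command -v env)")
echo "$clean_env"
```

Without -i, the same command would list GREETING alongside every inherited variable.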

The third mode - amending the environment - is less important because you can do that fine with regular (civilized) shells. (That means "not C shell" - again, there are other questions on SO with answers that explain that.) For example, you could perfectly well do:

#!/bin/bash
export PATH=${PREPENDPATH:?}:$PATH
exec python "$@"

This insists that $PREPENDPATH is set to a non-empty string in the environment, and then prepends it to $PATH, and exports the new PATH setting. Then, using that new PATH, it executes the python program with the relevant arguments. The exec replaces the shell script with python. Note that this is quite different from:

#!/bin/bash
PATH=${PREPENDPATH:?}:$PATH exec python "$@"

Superficially, this is the same. However, this will execute the python found on the pre-existing PATH, albeit with the new value of PATH in the process's environment. So, in the example, you'd end up executing Python from /usr/bin and not the one from /home/pi/prepend/bin.

In your situation, I would probably not use env and would just use an appropriate variant of the script with the explicit export.
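The ${PREPENDPATH:?} check used in the scripts above can be tried in isolation. This sketch runs it in subshells so that the failure does not end the calling script; the exact non-zero status differs between shells, so only zero versus non-zero is meaningful.

```shell
# Unset: the expansion fails and the subshell exits with an error.
( unset PREPENDPATH; : "${PREPENDPATH:?}" ) 2>/dev/null
unset_status=$?

# Set to a non-empty value: the expansion succeeds.
( PREPENDPATH=/home/pi/prepend/bin; : "${PREPENDPATH:?}" )
set_status=$?

echo "PREPENDPATH unset: exit status $unset_status"
echo "PREPENDPATH set:   exit status $set_status"
```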

The env command is unusual because it does not recognize the double-dash to separate options from the rest of the command. This is in part because it does not take many options, and in part because it is not clear whether the ENVVAR=value options should come before or after the double dash.

I actually have a series of scripts for running (different versions of) a database server. These scripts really use env (and a bunch of home-grown programs) to control the environment of the server:

#!/bin/ksh
#
# @(#)$Id: boot.black_19.sh,v 1.3 2008/06/25 15:44:44 jleffler Exp $
#
# Boot server black_19 - IDS 11.50.FC1

IXD=/usr/informix/11.50.FC1
IXS=black_19
cd $IXD || exit 1

IXF=$IXD/do.not.start.$IXS
if [ -f $IXF ]
then
    echo "$0: will not start server $IXS because file $IXF exists" 1>&2
    exit 1
fi

ONINIT=$IXD/bin/oninit.$IXS
if [ ! -f $ONINIT ]
then ONINIT=$IXD/bin/oninit
fi

tmpdir=$IXD/tmp
DAEMONIZE=/work1/jleffler/bin/daemonize
stdout=$tmpdir/$IXS.stdout
stderr=$tmpdir/$IXS.stderr

if [ ! -d $tmpdir ]
then asroot -u informix -g informix -C -- mkdir -p $tmpdir
fi

# Specialized programs carried to extremes:
#   * asroot sets UID and GID values and then executes
#   * env, which sets the environment precisely and then executes
#   * daemonize, which makes the process into a daemon and then executes
#   * oninit, which is what we really wanted to run in the first place!
# NB: daemonize defaults stdin to /dev/null and could set umask but
#     oninit dinks with it all the time so there is no real point.
# NB: daemonize should not be necessary, but oninit doesn't close its
#     controlling terminal and therefore causes cron-jobs that restart
#     it to hang, and interactive shells that started it to hang, and
#     tracing programs.
# ??? Anyone want to integrate truss into this sequence?

asroot -u informix -g informix -C -a dbaao -a dbsso -- \
    env -i HOME=$IXD \
        INFORMIXDIR=$IXD \
        INFORMIXSERVER=$IXS \
        INFORMIXCONCSMCFG=$IXD/etc/concsm.$IXS \
        IFX_LISTEN_TIMEOUT=3 \
        ONCONFIG=onconfig.$IXS \
        PATH=/usr/bin:$IXD/bin \
        SHELL=/usr/bin/ksh \
        TZ=UTC0 \
    $DAEMONIZE -act -d $IXD -o $stdout -e $stderr -- \
    $ONINIT "$@"

case "$*" in
(*v*) track-oninit-v $stdout;;
esac

Difference between shebang flags vs. set builtin flags

are there any other differences in functionality/behaviour?

When your file has executable permissions and is executed, then the shebang line is parsed by the kernel.

When your file is executed under the shell, as in bash ./script.sh, the shebang is just a comment. It is ignored, and your script runs with whatever flags the caller supplied. Putting your flags in a set command after the shebang line makes sure the proper flags are set in your script in either case.

The shebang is parsed by the kernel, which means that the behavior differs from kernel to kernel and from operating system to operating system. Some operating systems didn't handle arguments in the shebang at all and ignored them. Some kernels parse, for example, #!/bin/sh -a -b as execl("/bin/sh", "-a -b"), others as execl("/bin/sh", "-a", "-b"). The shebang line is split into executable and arguments by code that is not your shell. Sometimes, if there is a space after #!, as in #! /bin/sh, utilities don't recognize it as a valid shebang. There was even a recent Linux kernel regression with shebang lines that were too long.
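The two conventions differ only in how many arguments the interpreter receives. The sketch below imitates both with an ordinary shell function; it does not exercise any kernel, and count_args is a made-up helper.

```shell
count_args() { echo "$#"; }

# One combined argument, as in execl("/bin/sh", "-a -b"):
combined=$(count_args "-a -b")

# Separate arguments, as in execl("/bin/sh", "-a", "-b"):
split=$(count_args -a -b)

echo "combined: $combined argument(s)"
echo "split:    $split argument(s)"
```

An interpreter that receives the combined form has to cope with an option string containing a space, which many programs reject.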

How the shebang is interpreted differs between systems, so you can't rely on it; it's best to set options with set on a line after the shebang.
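The difference shows up with a script that fails early. A minimal sketch; the two temporary scripts are made up for the demonstration, and both are run as sh file so that the shebang is only a comment.

```shell
fragile=$(mktemp)
robust=$(mktemp)

# Option given only on the shebang line.
printf '%s\n' '#!/bin/sh -e' 'false' 'echo "kept going"' > "$fragile"
# Same option set inside the script body.
printf '%s\n' '#!/bin/sh' 'set -e' 'false' 'echo "kept going"' > "$robust"

# Running through an explicit interpreter ignores the shebang line.
fragile_out=$(sh "$fragile")
robust_out=$(sh "$robust")

echo "flag on shebang only: '$fragile_out'"
echo "set -e in the script: '$robust_out'"
rm -f "$fragile" "$robust"
```

The first script runs past the failing command because -e was never applied; the second stops at false however it is started.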

Is behaviour same in a POSIX shell, too?

A POSIX shell doesn't (have to) interpret your shebang. If you are asking whether sh -e and set -e behave the same in a POSIX shell, then yes: the -e option on the command line has the same behavior as set -e.
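The equivalence can be checked directly. A minimal sketch, with both commands stopping at the failing false before the echo is reached:

```shell
# -e given on the command line.
flag_out=$(sh -e -c 'false; echo "still running"')
flag_status=$?

# -e enabled with the set builtin.
builtin_out=$(sh -c 'set -e; false; echo "still running"')
builtin_status=$?

echo "sh -e:  output='$flag_out' status=$flag_status"
echo "set -e: output='$builtin_out' status=$builtin_status"
```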

I couldn't find a specification of the shebang line, or of how it should be interpreted, in the POSIX specification. The execve documentation says:

Another way that some historical implementations handle shell scripts is by recognizing the first two bytes of the file as the character string "#!" and using the remainder of the first line of the file as the name of the command interpreter to execute.

Those "historical implementations" seem to be very widely used still today.

The shebang line is parsed by the kernel after exec* calls. But when you run sh <script>, or use popen or system, the shell can (but doesn't have to) interpret the shebang line itself as an extension instead of relying on the kernel implementation. From POSIX:

Shell Introduction
The shell reads its input from a file (see sh), from the -c option or from the system() and popen() functions defined in the System Interfaces volume of POSIX.1-200x. If the first line of a file of shell commands starts with the characters "#!", the results are unspecified.

As for bash, it looks like bash first tries execve. If that fails and bash can't otherwise determine why the kernel refused to run the executable, it checks whether the file has a shebang and, if so, parses the shebang line on its own to find the interpreter.