Need Explanations for Linux Bash Builtin Exec Command Behavior

In this particular case, you have the exec in a pipeline. In order to execute a series of pipeline commands, the shell must initially fork, making a sub-shell. (Specifically it has to create the pipe, then fork, so that everything run "on the left" of the pipe can have its output sent to whatever is "on the right" of the pipe.)

To see that this is in fact what is happening, compare:

{ ls; echo this too; } | cat

with:

{ exec ls; echo this too; } | cat

The former runs ls without leaving the sub-shell, so that this sub-shell is therefore still around to run the echo. The latter runs ls by leaving the sub-shell, which is therefore no longer there to do the echo, and this too is not printed.

(The use of curly-braces { cmd1; cmd2; } normally suppresses the sub-shell fork action that you get with parentheses (cmd1; cmd2), but in the case of a pipe, the fork is "forced", as it were.)

Redirection of the current shell happens only if there is "nothing to run", as it were, after the word exec. Thus, e.g., exec >stdout 4<input 5>>append modifies the current shell, but exec foo >stdout 4<input 5>>append tries to exec command foo. [Note: this is not strictly accurate; see addendum.]
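
For example, here is a minimal sketch (the file name notes.txt is just for illustration) of using exec with no command to hold a descriptor open for the rest of the script:

exec 3>notes.txt      # open notes.txt on fd 3 in the current shell
echo first >&3        # later commands can write to fd 3
echo second >&3
exec 3>&-             # close fd 3 when done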

Interestingly, in an interactive shell, after exec foo >output fails because there is no command foo, the shell sticks around, but stdout remains redirected to file output. (You can recover with exec >/dev/tty. In a script, the failure to exec foo terminates the script.)


With a tip of the hat to @Pumbaa80, here's something even more illustrative:

#! /bin/bash
shopt -s execfail
exec ls | cat -E
echo this goes to stdout
echo this goes to stderr 1>&2

(note: cat -E is simplified down from my usual cat -vET, which is my handy go-to for "let me see non-printing characters in a recognizable way"). When this script is run, the output from ls has cat -E applied (on Linux this makes end-of-line visible as a $ sign), but the output sent to stdout and stderr (on the remaining two lines) is not redirected. Change the | cat -E to > out and, after the script runs, observe the contents of file out: the final two echoes are not in there.

Now change the ls to foo (or some other command that will not be found) and run the script again. This time the output is:

$ ./demo.sh
./demo.sh: line 3: exec: foo: not found
this goes to stderr

and the file out now has the contents produced by the first echo line.

This makes what exec "really does" as obvious as possible (but no more obvious, as Albert Einstein did not put it :-) ).

Normally, when the shell goes to execute a "simple command" (see the manual page for the precise definition, but this specifically excludes the commands in a "pipeline"), it prepares any I/O redirection operations specified with <, >, and so on by opening the files needed. Then the shell invokes fork (or some equivalent but more-efficient variant like vfork or clone depending on underlying OS, configuration, etc), and, in the child process, rearranges the open file descriptors (using dup2 calls or equivalent) to achieve the desired final arrangements: > out moves the open descriptor to fd 1—stdout—while 6> out moves the open descriptor to fd 6.
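
As a rough bash-level sketch of that sequence (not the shell's actual implementation, just an illustration using an explicit sub-shell):

( exec > out     # in the child: open out and dup2 it onto fd 1 (stdout)
  exec ls )      # in the child: execve ls; its output goes to out
# the parent shell's own stdout is untouched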

If you use exec, though, the shell suppresses the fork step. It does all the file opening and file-descriptor-rearranging as usual, but this time, it affects any and all subsequent commands. Finally, having done all the redirections, the shell attempts to execve() (in the system-call sense) the command, if there is one. If there is no command, or if the execve() call fails and the shell is supposed to continue running (it is interactive, or you have set execfail), the shell soldiers on. If the execve() succeeds, the shell no longer exists, having been replaced by the new command. If execfail is unset and the shell is not interactive, the shell exits.

(There's also the added complication of the command_not_found_handle shell function: bash's exec seems to suppress running it, based on test results. exec in general makes the shell not look at its own functions: if you have a shell function f, running f as a simple command runs the shell function, as does (f), which runs it in a sub-shell; but running (exec f) skips over it.)
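
A quick illustration of that last point:

f() { echo "shell function f"; }
f             # runs the shell function
( f )         # still runs the shell function, in a sub-shell
( exec f )    # bypasses the function; fails unless an external command f exists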


As for why ls>out1 ls>out2 creates two files (with or without an exec), this is simple enough: the shell opens each redirection, and then uses dup2 to move the file descriptors. If you have two ordinary > redirects, the shell opens both, moves the first one to fd 1 (stdout), then moves the second one to fd 1 (stdout again), closing the first in the process. Finally, it runs ls ls, because that's what's left after removing the >out1 >out2. As long as there is no file named ls, the ls command complains to stderr, and writes nothing to stdout.
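
You can watch this happen, assuming no file named ls exists in the current directory (the exact error wording varies with your ls implementation):

$ ls > out1 ls > out2
ls: cannot access 'ls': No such file or directory
$ ls -l out1 out2    # both files were created, and both are empty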

I don't understand bash exec

Yes, it sends any further output to the file named logfile. In other words, it redirects standard output (also known as stdout) to the file logfile.

Example

Let's start with this script:

$ cat >script.sh
#!/bin/bash
echo First
exec >>logfile
echo Second

If we run the script, we see output from the first echo statement but not the second:

$ bash script.sh
First

The output from the second echo statement went to the file logfile:

$ cat logfile
Second
$

If we had used exec >logfile, then the logfile would be overwritten each time the script was run. Because we used >> instead of >, however, the output will be appended to logfile. For example, if we run it once again:

$ bash script.sh
First
$ cat logfile
Second
Second

Documentation

This is documented in man bash:

exec [-cl] [-a name] [command [arguments]]

    If command is specified, it replaces the shell. No new process is created. The arguments become the arguments to command. If the -l option is supplied, the shell places a dash at the beginning of the zeroth argument passed to command. This is what login(1) does. The -c option causes command to be executed with an empty environment. If -a is supplied, the shell passes name as the zeroth argument to the executed command. If command cannot be executed for some reason, a non-interactive shell exits, unless the execfail shell option is enabled. In that case, it returns failure. An interactive shell returns failure if the file cannot be executed. If command is not specified, any redirections take effect in the current shell, and the return status is 0. If there is a redirection error, the return status is 1. [Emphasis added.]

In your case, no command argument is specified. So, the exec command performs redirections which, in this case, means any further stdout is sent to file logfile.

find command and -exec

The find command has a -exec option. For example:

find / -type f -exec grep -l "bash" {} \;

Other than the similarity in name, the -exec here has absolutely nothing to do with the shell command exec.

The construct -exec grep -l "bash" {} \; tells find to execute the command grep -l "bash" on any files that it finds. This is unrelated to the shell command exec >>logfile which executes nothing but has the effect of redirecting output.
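
As an aside, find also accepts + in place of \; which passes many file names to a single grep invocation instead of running one grep per file, and is often much faster:

find / -type f -exec grep -l "bash" {} +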

understanding bash exec 1>&2 command

exec is a built-in Bash command, so it can have special behavior that an external program couldn't have. In particular, it has the special behavior that:

If COMMAND is not specified, any redirections take effect in the current shell.

(That's quoting from the message given by help exec.)

This applies to any sort of redirection; you can also write, for example, any of these:

exec >tmp.txt
exec >>stdout.log 2>>stderr.log
exec 2>&1

(It does not, however, apply to pipes.)
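
If you want pipe-like behavior for the rest of a script, a bash-specific workaround is process substitution; here is a sketch (output.log is just an example name):

exec > >(tee -a output.log)   # stdout of all later commands also goes to output.log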

What is the purpose of the : (colon) GNU Bash builtin?

Historically, Bourne shells didn't have true and false as built-in commands. true was instead simply aliased to :, and false to something like let 0.

: is slightly better than true for portability to ancient Bourne-derived shells. As a simple example, consider having neither the ! pipeline operator nor the || list operator (as was the case for some ancient Bourne shells). This leaves the else clause of the if statement as the only means for branching based on exit status:

if command; then :; else ...; fi

Since if requires a non-empty then clause and comments don't count as non-empty, : serves as a no-op.
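
A related use of : that survives today is triggering parameter-expansion side effects without running a "real" command, for example:

: "${EDITOR:=vi}"    # assign a default to EDITOR only if it is unset or empty
echo "$EDITOR"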

Nowadays (that is: in a modern context) you can usually use either : or true. Both are specified by POSIX, and some find true easier to read. However there is one interesting difference: : is a so-called POSIX special built-in, whereas true is a regular built-in.

  • Special built-ins are required to be built into the shell; regular built-ins are only "typically" built in, and their presence isn't strictly guaranteed. Also, there usually isn't a regular program named : standing in for true in the PATH of most systems.

  • Probably the most crucial difference is that with special built-ins, any variable set by the built-in - even in the environment during simple command evaluation - persists after the command completes, as demonstrated here using ksh93:

    $ unset x; ( x=hi :; echo "$x" )
    hi
    $ ( x=hi true; echo "$x" )

    $

    Note that Zsh ignores this requirement, as does GNU Bash except when operating in POSIX compatibility mode, but all other major "POSIX sh derived" shells observe this including dash, ksh93, and mksh.

  • Another difference is that regular built-ins must be compatible with exec - demonstrated here using Bash:

    $ ( exec : )
    -bash: exec: :: not found
    $ ( exec true )
    $
  • POSIX also explicitly notes that : may be faster than true, though this is of course an implementation-specific detail.

What does set -e and exec $@ do for docker entrypoint scripts?

It basically takes any command line arguments passed to entrypoint.sh and execs them as a command. Because of the exec, the command does not run in a child process; it replaces the shell itself, so it runs as the same process (PID 1 in a container) and receives signals directly. The intention is: "do everything in this .sh script, then run the command the user passes in on the command line in place of the script". The set -e part simply makes the script abort as soon as any of its setup commands fails.
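
A minimal sketch of such an entrypoint (the setup step here is hypothetical):

#!/bin/sh
set -e                   # abort immediately if any setup command fails
echo "doing setup work"  # hypothetical setup (fix permissions, write config, ...)
exec "$@"                # replace the shell with the user's command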

See:

  • What are the special dollar sign shell variables?
  • Need explanations for Linux bash builtin exec command behavior

sh -c irrationality and programmatically determining and running Linux builtin commands

You're looking for consistency in the wrong place because you're missing a critical aspect of what some of those commands are doing. Running the same command in different contexts might yield a different result. For example, if you run ls or pwd (with no arguments), the result depends on the current directory.

The dichotomy isn't between built-in commands and non-built-in commands, but between commands whose behavior is influenced by which shell runs them and commands whose behavior isn't. There is a correlation: most commands that are influenced by which shell runs them are built in, because an external command would not be able to access the state of the shell that invokes it.

  • The command alias prints out the lists of aliases defined in the current shell. Aliases are part of the internal state of a shell. If you run a new shell instance, it starts out with no aliases defined, so alias prints an empty list. Typically, when you're running an interactive shell, your aliases are the ones defined by your startup file (e.g. ~/.bashrc) and that's what alias lists. But if you run alias or unalias on the command line, you can change the aliases of that shell instance, and that doesn't affect other shells (try it out to make sure that you understand what's going on).
  • command alias does the same thing as alias since alias is a builtin.
  • builtin alias does the same thing as alias in bash. The builtin command is a bash builtin. builtin does not exist in other shells; on Ubuntu, /bin/sh is not bash but dash, a shell that's smaller, faster and also POSIX-compliant but lacks some of bash's more advanced features. This also explains the result of type builtin: dash has no builtin command, whereas bash -c 'type builtin' would report that builtin is a shell builtin (see the sketch after this list).
  • type command and type type report that command and type are builtins because they are builtins in sh.
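
For instance (a sketch; the exact wording of the "not found" message varies between shells):

$ sh -c 'type builtin'       # sh is dash on Ubuntu
builtin: not found
$ bash -c 'type builtin'
builtin is a shell builtin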

You can't execute a builtin from a program: a builtin is a command of a particular shell. You can execute a shell that supports this builtin and tell it to execute that builtin, but of course the builtin is executed in the context of that shell.

You can no more execute the alias command from a Pascal program than you can call the Pascal write function from a shell program. A shell builtin is a library function of the shell. Shells blur the distinction between their own functions and external programs because you can call an external program using the same syntax rather than going through something like the TProcess class, but at the end of the day the concepts are the same.

A “CLI helper GUI” already exists: it's called a terminal emulator. It sounds like you want to make some more constrained GUI that can only execute certain specific commands. In that case, I don't think it makes sense to expose features such as aliases. You aren't providing an interface to a shell here, you're providing an interface to run programs. You aren't interfacing with the shell, you're substituting for it. So don't think of shell commands, think of running programs. There's no program called alias.

'find -exec' a shell function in Linux

Since only the shell knows how to run shell functions, you have to run a shell to run a function. You also need to mark your function for export with export -f, otherwise the child shell won't inherit it:

export -f dosomething
find . -exec bash -c 'dosomething "$0"' {} \;
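
A common variant passes the file name as $1 instead and supplies a placeholder for $0, which is conventionally the shell's name:

find . -exec bash -c 'dosomething "$1"' bash {} \;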

Bash: Head & Tail behavior with bash script

This is a fairly interesting issue! Thanks for posting it!

I assumed that this happens because head exits after processing the first few lines, so a SIGPIPE signal is sent to the bash process running the script the next time it tries to echo $x. I used RedX's script to prove this theory:

#!/usr/bin/bash
rm -f x.log
for ((x=0; x<5; ++x)); do
    echo $x
    echo $x >> x.log
done

This works as you described! Using ./t.sh | head -n 2, it writes only 2 lines to the screen and to x.log. But when SIGPIPE is trapped, this behavior changes...

#!/usr/bin/bash
trap "echo SIGPIPE >&2" PIPE
rm -f x.log
for ((x=0; x<5; ++x)); do
    echo $x
    echo $x >> x.log
done

Output:

$ ./t.sh |head -n 2
0
1
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE

The write error occurs because the other end of the pipe has been closed, so stdout leads nowhere. Any attempt to write to the closed pipe raises a SIGPIPE signal, which terminates the program by default (see man 7 signal); here the trap overrides that default, so the script keeps running, and x.log ends up containing 5 lines.

This also explains why /bin/echo solved the problem. See the following script:

#!/usr/bin/bash
rm -f x.log
for ((x=0; x<5; ++x)); do
    /bin/echo $x
    echo "Ret: $?" >&2
    echo $x >> x.log
done

Output:

$ ./t.sh |head -n 2
0
Ret: 0
1
Ret: 0
Ret: 141
Ret: 141
Ret: 141

Decimal 141 is hex 0x8D, i.e. 0x80 + 0x0D: an exit status above 128 (0x80) means the command was terminated by a signal, and signal 13 (0x0D) is SIGPIPE. So when /bin/echo tried to write to stdout, it got a SIGPIPE and was terminated (the default behavior), instead of the bash process running the script.
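
You can reproduce this mapping directly in bash; PIPESTATUS records the exit status of each member of the last pipeline:

$ yes | head -n 1 > /dev/null
$ echo "${PIPESTATUS[@]}"    # yes was killed by SIGPIPE, head exited normally
141 0
$ kill -l 13                 # signal 13 is...
PIPE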


