Bash: Head & Tail Behavior with Bash Script

This is a fairly interesting issue! Thanks for posting it!

I assumed this happens because head exits after processing the first few lines, so a SIGPIPE signal is sent to the bash process running the script the next time it tries to echo $x. I used RedX's script to test this theory:

#!/usr/bin/env bash
rm -f x.log
for ((x = 0; x < 5; ++x)); do
    echo "$x"
    echo "$x" >> x.log
done

This works as you described! With ./t.sh | head -n 2, it writes only 2 lines to the screen and to x.log. But if SIGPIPE is trapped, the behavior changes:

#!/usr/bin/env bash
trap 'echo SIGPIPE >&2' PIPE
rm -f x.log
for ((x = 0; x < 5; ++x)); do
    echo "$x"
    echo "$x" >> x.log
done

Output:

$ ./t.sh |head -n 2
0
1
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE

The write error occurs because head has exited, so the read end of the pipe is closed. Any attempt to write to a pipe that has no reader raises SIGPIPE, which terminates the program by default (see man 7 signal). With the trap installed, bash is no longer killed: each failing echo just prints an error and the loop carries on, so x.log now contains all 5 lines.

This also explains why /bin/echo solved the problem. See the following script:

#!/usr/bin/env bash
rm -f x.log
for ((x = 0; x < 5; ++x)); do
    /bin/echo "$x"
    echo "Ret: $?" >&2
    echo "$x" >> x.log
done

Output:

$ ./t.sh |head -n 2
0
Ret: 0
1
Ret: 0
Ret: 141
Ret: 141
Ret: 141

Decimal 141 is hex 8D, i.e. 0x80 + 0x0D, or 128 + 13. Bash reports a command killed by a signal as 128 plus the signal number, and signal 13 is SIGPIPE. So when /bin/echo tried to write to stdout, it received SIGPIPE and was terminated (the default action), instead of the bash process running the script.
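The 128 + signal-number rule can be checked mechanically. A minimal sketch, assuming bash: describe_status is a hypothetical helper name, and it relies on the builtin kill -l NUMBER printing the signal name without the SIG prefix.

```shell
#!/usr/bin/env bash
# describe_status STATUS: hypothetical helper that decodes an exit status.
# Statuses above 128 mean "terminated by signal (STATUS - 128)".
describe_status() {
    local status=$1 sig
    if (( status > 128 )); then
        sig=$(( status - 128 ))
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_status 141    # killed by signal 13 (SIGPIPE)
describe_status 0      # exited with status 0

# Live check: yes keeps writing until head exits, then receives SIGPIPE.
# (If SIGPIPE is ignored in your environment, yes exits with an error status instead.)
yes | head -n 1 > /dev/null
describe_status "${PIPESTATUS[0]}"
```

Bash's kill -l also accepts an exit status directly, so kill -l 141 prints PIPE as well.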

unix - head AND tail of file

You can simply:

(head; tail) < file.txt

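And if you need to use a pipe for some reason, you can try this, although it is not reliable: head may read more than the first 10 lines from the pipe, leaving less (or nothing) for tail: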

cat file.txt | (head; tail)

Note: this can print some lines twice if file.txt has fewer lines than head's default (10) plus tail's default (10). Whether it does depends on the head and tail implementations.
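If the input can only come from a pipe, one alternative that does not depend on how much head buffers is to read the stream once and remember only the last N lines. This is a swapped-in technique (awk), not what (head; tail) does; head_tail is a hypothetical name, N must be at least 1, and unlike the (head; tail) trick it never prints a line twice.

```shell
#!/usr/bin/env bash
# head_tail [N]: hypothetical helper printing the first N and last N lines of stdin
# (default 10), each line at most once. Reads the input a single time, so pipes work.
head_tail() {
    local n=${1:-10}
    awk -v n="$n" '
        NR <= n { print; next }             # first n lines: print immediately
        { buf[NR % n] = $0 }                # later lines: keep only the most recent n
        END {
            start = NR - n + 1              # first line of the tail part
            if (start <= n) start = n + 1   # skip lines already printed as the head
            for (i = start; i <= NR; i++) print buf[i % n]
        }'
}

seq 100 | head_tail 3    # prints 1 2 3 98 99 100, one per line
seq 12 | head_tail       # prints 1 through 12, no duplicates
```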

How can I use 'head' in a bash script with a variable?

A quick test here seems to indicate that the problem is that your $LINE variable has trailing spaces (i.e. '5 ' instead of '5').
Try removing them.

$ head '-5g' file
head: invalid trailing option -- g
Try `head --help' for more information.

$ head '-5.' file
head: invalid trailing option -- .
Try `head --help' for more information.

$ head '-5 ' file
head: invalid trailing option --
Try `head --help' for more information.
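In a script, a safer pattern is to strip whitespace from the variable and pass it to head -n, rather than gluing it onto a dash. A minimal sketch, assuming bash; first_lines is a hypothetical name, and LINE just mirrors the variable from the question.

```shell
#!/usr/bin/env bash
# first_lines COUNT [FILE...]: hypothetical wrapper around head -n that strips
# stray whitespace from COUNT (e.g. '5 ' -> '5') and rejects anything non-numeric.
first_lines() {
    local raw=$1 count
    count=${raw//[[:space:]]/}         # remove spaces, tabs and newlines
    shift
    if [[ ! $count =~ ^[0-9]+$ ]]; then
        printf 'first_lines: invalid line count: %q\n' "$raw" >&2
        return 1
    fi
    head -n "$count" -- "$@"           # -n COUNT is the standard form of the old -COUNT syntax
}

LINE='5 '                              # trailing space, as in the question
seq 20 | first_lines "$LINE"           # prints 1 through 5
```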

How does (head; tail) file work?


OS X

For OS X, you can look at the source code for head and the source code for tail to figure out some of what's going on. In the case of tail, you'll want to look at forward.c.

So, it turns out that head doesn't do anything special. It just reads its input using the stdio library, so it reads a buffer at a time and might read too much. This means cat file | (head; tail) won't work for small files where head's buffering makes it read some (or all) of the last 10 lines.

On the other hand, tail checks the type of its input file. If it's a regular file, tail seeks to the end and reads backwards until it finds enough lines to emit. This is why (head; tail) < file works on any regular file, regardless of size.

Linux

You could look at the source for head and tail on Linux too, but it's easier to just use strace, like this:

(strace -o /tmp/head.trace head; strace -o /tmp/tail.trace tail) < file

Take a look at /tmp/head.trace. You'll see that the head command tries to fill a buffer (of 8192 bytes in my test) by reading from standard input (file descriptor 0). Depending on the size of file, it may or may not fill the buffer. Anyway, let's assume that it reads 10 lines in that first read. Then, it uses lseek to back up the file descriptor to the end of the 10th line, essentially “unreading” any extra bytes it read. This works because the file descriptor is open on a normal, seekable file. So (head; tail) < file will work for any seekable file, but it won't make cat file | (head; tail) work.

On the other hand, tail does not (in my testing) seek to the end and read backwards, like it does on OS X. At least, it doesn't read all the way back to the beginning of the file.

Here's my test. Create a small, 12-line input file:

yes | head -12 | cat -n > /tmp/file

Then, try (head; tail) < /tmp/file on Linux. I get this with GNU coreutils 5.97:

     1  y
     2  y
     3  y
     4  y
     5  y
     6  y
     7  y
     8  y
     9  y
    10  y
    11  y
    12  y

But on OS X, I get this:

     1  y
     2  y
     3  y
     4  y
     5  y
     6  y
     7  y
     8  y
     9  y
    10  y
     3  y
     4  y
     5  y
     6  y
     7  y
     8  y
     9  y
    10  y
    11  y
    12  y
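To compare a seekable file and a pipe on your own system, you can run both forms side by side. A sketch; the temporary file and the choice of seq 100 (292 bytes, well under one 8192-byte read) are just for illustration. On a regular file this should print 1, 2, 99, 100 with both the GNU and the OS X tools; what the pipe version prints depends on how much head buffers.

```shell
#!/usr/bin/env bash
# Run (head; tail) on a regular file and on a pipe; the temp file exists only for this demo.
tmp=$(mktemp)
seq 100 > "$tmp"                      # 292 bytes, far less than one 8192-byte read

echo '--- regular file (seekable):'
(head -n 2; tail -n 2) < "$tmp"       # both commands share one file offset

echo '--- pipe (not seekable):'
cat "$tmp" | (head -n 2; tail -n 2)   # head may swallow all input, leaving tail nothing

rm -f "$tmp"
```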

Equivalent of head/tail command to show head/tail of a line

If you want the first/last characters of the whole stream, head and tail can do it directly with -c:

$ head -c2 <<<"abcdefghijklmnopqrstuvwxyz"
ab<will not output a newline>

$ tail -c3 <<<"abcdefghijklmnopqrstuvwxyz"
yz<newline>

head does not output a newline, because it outputs only the first two characters. tail, on the other hand, counts the newline that the here-string (<<<) appends as a character, so we need -c3 to get the last two letters. Reformatting the commands to take arguments as in your example is trivial, and I leave that to the OP.
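The newline bookkeeping is easier to see when the input has no trailing newline. A small sketch, assuming bash (for <<< and substring expansion); s is just an example variable.

```shell
#!/usr/bin/env bash
# A here-string (<<<) always appends a newline; printf '%s' does not.
s=abcdefghijklmnopqrstuvwxyz

head -c 2 <<< "$s"; echo             # prints ab (the echo supplies the missing newline)
tail -c 3 <<< "$s"                   # prints yz followed by the here-string's newline
printf '%s' "$s" | tail -c 2; echo   # prints yz: without a trailing newline, -c 2 is enough

# Pure-bash alternative for a string already in a variable: substring expansion.
echo "${s:0:2} ${s: -2}"             # prints ab yz
```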

You can use cut if you want the first characters of each line:

$ cut -c-2 <<<"abcdefghijklmnopqrstuvwxyz"$'\n''second line'
ab
se

and use the rev | cut | rev idiom to get the last characters:

$ rev <<<"abcdefghijklmnopqrstuvwxyz"$'\n''second line' | cut -c-2 | rev
yz
ne

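cut -c is not limited to 10 characters (cut -c-20 prints the first 20, for example), but note that GNU cut currently treats -c the same as -b, so it counts bytes rather than multibyte characters.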
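In bash, substring expansion can do the same per-line job without rev, and it also copes with lines shorter than N. A minimal sketch; first_chars and last_chars are hypothetical names. Bash counts characters according to the current locale, so multibyte text should work in a UTF-8 locale.

```shell
#!/usr/bin/env bash
# first_chars N / last_chars N: hypothetical helpers printing the first or last
# N characters of every line on stdin, using bash substring expansion.
first_chars() {
    local n=$1 line
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line:0:n}"
    done
}

last_chars() {
    local n=$1 line
    while IFS= read -r line || [[ -n $line ]]; do
        if (( ${#line} > n )); then
            printf '%s\n' "${line:${#line}-n}"   # offset is evaluated arithmetically
        else
            printf '%s\n' "$line"                # line is already N characters or shorter
        fi
    done
}

printf '%s\n' abcdefghijklmnopqrstuvwxyz 'second line' | first_chars 2   # prints ab and se
printf '%s\n' abcdefghijklmnopqrstuvwxyz 'second line' | last_chars 2    # prints yz and ne
```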

Different behavior when running ls from within a script

See "Why you shouldn't parse the output of ls(1)", and use process substitution to process command output instead.

#!/bin/bash

while IFS= read -r -d '' file; do
    echo "$file"
    # Do whatever you want to do with your file here
done < <(find someDir/ -maxdepth 1 -mindepth 1 -type f -print0 | sort -z)

The find command above lists all regular files directly inside the given directory (including ones with spaces or other special characters in their names). Its NUL-delimited output is fed, through process substitution, to the standard input of the while loop, which reads one name at a time.

To process the files in sorted order, pipe the find output through sort -z, as in the example above.
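If you only need the files directly inside one directory, a plain glob avoids both ls and find. This is a swapped-in technique rather than the find-based answer; list_files is a hypothetical helper name, and it assumes bash for shopt and [[ ]].

```shell
#!/usr/bin/env bash
# list_files DIR: hypothetical helper that prints the regular files directly inside DIR.
# The ( ) body runs in a subshell, so the shopt changes do not leak into the caller.
list_files() (
    shopt -s nullglob dotglob              # empty dir -> no iterations; include dotfiles like find does
    for file in "$1"/*; do                 # bash sorts glob results, so no separate sort step
        [[ -f $file && ! -L $file ]] || continue   # regular files only, like find -type f
        printf '%s\n' "$file"              # do whatever you want with "$file" here
    done
)

list_files someDir
```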

Strange behavior with bash for loop

It's because you have double quotes around $output, so it is expanded as a single string (one word) instead of being split into separate items.
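A quick way to see the difference; output here is a made-up value standing in for the question's variable, and read -a is bash-specific.

```shell
#!/usr/bin/env bash
# output is a made-up value standing in for the variable in the question.
output='alpha beta gamma'

count=0
for word in "$output"; do (( ++count )); done
echo "quoted: $count iteration(s)"      # quoted: 1 iteration(s)

count=0
for word in $output; do (( ++count )); done
# Unquoted, the value is split on IFS (and any glob characters would be expanded).
echo "unquoted: $count iteration(s)"    # unquoted: 3 iteration(s)

# Safer when you really want separate words: split once into an array.
read -r -a words <<< "$output"
for word in "${words[@]}"; do
    printf '%s\n' "$word"
done
```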

Why doesn't `seq 100 | ( head -n1; tail -n1 )` work on Mac OSX?

I see the same behavior with GNU head and tail on Linux.

It depends on how much input head -n1 consumes before it quits. If head reads all of stdin before it quits, then there is nothing left for tail to read and tail produces no output.

Observe:

$ seq 10000 | (head -n1 ; cat ) | head
1

1861
1862
1863
1864
1865
1866
1867
1868

Here, we can see that head -n1 consumes the first 1860 lines. The cat command sees all the remaining input.

Why is that? Observe how many bytes are in the first 1860 lines:

$ seq 1860 | wc
1860 1860 8193

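It's a reasonable guess that head -n1 first reads 8 kB (8192 bytes) of data from stdin, then prints the first line and, seeing that it needs no more data, quits. The rest of stdin is available to any subsequent process. This also explains the empty line in the output: 8192 bytes is one byte short of the 8193 bytes in the first 1860 lines, so head took everything up to the digits 1860 but not the newline after them, and cat printed that leftover newline first.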

So with seq 100, which produces less than 8 kB of output in total, head reads all of stdin and leaves nothing for tail to read. With seq 10000, which produces more than 8 kB, head does not read all the data in the pipeline, and whatever it leaves is available to tail.

As Charles Duffy points out, the details of this behavior are entirely implementation dependent and, upon any software upgrade, it may change.
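If what you actually need is the first and last line of a pipe, one workaround (a suggestion, not part of the answer above) is to read the first line with bash's read builtin, which on a pipe reads one byte at a time and so never consumes more than that line, and then let tail see the rest. first_and_last is a hypothetical name.

```shell
#!/usr/bin/env bash
# first_and_last: hypothetical helper printing the first and the last line of stdin.
first_and_last() {
    local first
    IFS= read -r first || [[ -n $first ]] || return 0   # empty input: print nothing
    printf '%s\n' "$first"
    tail -n 1                                           # reads whatever read left behind
}

seq 100 | first_and_last      # prints 1, then 100
seq 10000 | first_and_last    # prints 1, then 10000
```

Because read stops exactly at the first newline, this does not depend on head's buffer size the way seq 100 | (head -n1; tail -n1) does.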


