Bash: Head & Tail behavior with bash script
This is a fairly interesting issue! Thanks for posting it!
My assumption is that this happens because head exits after processing the first few lines, so SIGPIPE is sent to the bash process running the script the next time it tries to echo $x. I used RedX's script to test this theory:
#!/usr/bin/bash
rm -f x.log
for ((x=0; x<5; ++x)); do
    echo "$x"
    echo "$x" >> x.log
done
This works as you described! Using t.sh|head -n 2, it writes only 2 lines to the screen and to x.log. But once SIGPIPE is trapped, the behavior changes:
#!/usr/bin/bash
trap "echo SIGPIPE>&2" PIPE
rm -f x.log
for ((x=0; x<5; ++x)); do
    echo "$x"
    echo "$x" >> x.log
done
Output:
$ ./t.sh |head -n 2
0
1
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE
./t.sh: line 5: echo: write error: Broken pipe
SIGPIPE
The write error occurs because the reading end of the pipe is already closed. Any attempt to write to a pipe that has no reader raises SIGPIPE, which terminates the program by default (see man 7 signal). With the signal trapped, the script keeps running, so x.log now contains 5 lines.
This also explains why /bin/echo solved the problem. See the following script:
#!/usr/bin/bash
rm -f x.log
for ((x=0; x<5; ++x)); do
    /bin/echo "$x"
    echo "Ret: $?" >&2
    echo "$x" >> x.log
done
Output:
$ ./t.sh |head -n 2
0
Ret: 0
1
Ret: 0
Ret: 141
Ret: 141
Ret: 141
Decimal 141 is hex 8D: hex 80 (128) means the process was killed by a signal, and hex 0D (13) is SIGPIPE. So when /bin/echo tried to write to stdout, it received SIGPIPE and was terminated (the default behavior), rather than the bash process running the script.
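To check that arithmetic yourself, here is a small sketch. The helper name signal_from_status is mine, not something from the scripts above; it subtracts 128 from the exit status and asks kill -l for the matching signal name.

```shell
# Hypothetical helper: map an exit status above 128 to the signal behind it.
signal_from_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        # kill -l prints the signal name for a given signal number.
        kill -l "$((code - 128))"
    else
        echo "none"
    fi
}

signal_from_status 141   # 141 - 128 = 13
```

Signal 13 is SIGPIPE, so this should print PIPE.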
unix - head AND tail of file
You can simply:
(head; tail) < file.txt
And if you need to use a pipe for some reason, then like this (it relies on head not reading past its own lines, which a pipe does not guarantee):
cat file.txt | (head; tail)
Note: this will print duplicated lines if file.txt has fewer lines than head's default (10) plus tail's default (10).
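One way to sidestep the duplicates is to count the lines first and only split into head and tail when the file is long enough. A minimal sketch, assuming the default of 10 lines each; head_tail_file is a hypothetical helper name, not an existing command:

```shell
# Hypothetical helper: print the first and last 10 lines of a file,
# or the whole file when it has 20 lines or fewer (so nothing repeats).
head_tail_file() {
    file=$1
    # $(( ... )) strips the padding some wc implementations add.
    total=$(( $(wc -l < "$file") ))
    if [ "$total" -le 20 ]; then
        cat "$file"
    else
        (head; tail) < "$file"
    fi
}

# Example with a 12-line file: every line is printed exactly once.
tmp=$(mktemp)
seq 12 > "$tmp"
head_tail_file "$tmp"
rm -f "$tmp"
```

The long-file branch still depends on how your head and tail treat a shared, seekable stdin, so treat it as a sketch rather than a portable guarantee.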
How can I use 'head' in a bash script with a variable?
A quick test here seems to indicate that the problem is that your $LINE variable has trailing spaces (i.e. '5 ' instead of '5'). Try removing them.
$ head '-5g' file
head: invalid trailing option -- g
Try `head --help' for more information.
$ head '-5.' file
head: invalid trailing option -- .
Try `head --help' for more information.
$ head '-5 ' file
head: invalid trailing option --
Try `head --help' for more information.
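If the variable comes from somewhere you don't control, you can strip the trailing whitespace with plain parameter expansion before handing it to head. A minimal sketch; rtrim is a hypothetical helper name and the sample input is made up:

```shell
# Hypothetical helper: strip trailing whitespace from its argument.
rtrim() {
    s=$1
    # ${s##*[![:space:]]} is the run of trailing whitespace;
    # ${s%...} then removes that run from the end of the string.
    printf '%s' "${s%"${s##*[![:space:]]}"}"
}

LINE='5 '                 # a value with a trailing space, as in the failing case
LINE=$(rtrim "$LINE")
printf 'a\nb\nc\nd\ne\nf\n' | head -n "$LINE"   # head now gets a clean 5
```

Passing the count as head -n "$LINE" instead of gluing it onto a dash also keeps the option and its value in separate arguments.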
How does (head; tail) file work?
OS X
For OS X, you can look at the source code for head and the source code for tail to figure out some of what's going on. In the case of tail, you'll want to look at forward.c.
So, it turns out that head doesn't do anything special. It just reads its input using the stdio library, so it reads a buffer at a time and might read too much. This means cat file | (head; tail) won't work for small files where head's buffering makes it read some (or all) of the last 10 lines.
On the other hand, tail checks the type of its input file. If it's a regular file, tail seeks to the end and reads backwards until it finds enough lines to emit. This is why (head; tail) < file works on any regular file, regardless of size.
Linux
You could look at the source for head and tail on Linux too, but it's easier to just use strace, like this:
(strace -o /tmp/head.trace head; strace -o /tmp/tail.trace tail) < file
Take a look at /tmp/head.trace. You'll see that the head command tries to fill a buffer (of 8192 bytes in my test) by reading from standard input (file descriptor 0). Depending on the size of file, it may or may not fill the buffer. Anyway, let's assume that it reads 10 lines in that first read. Then, it uses lseek to back up the file descriptor to the end of the 10th line, essentially “unreading” any extra bytes it read. This works because the file descriptor is open on a normal, seekable file. So (head; tail) < file will work for any seekable file, but it won't make cat file | (head; tail) work.
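You can see the difference without strace by counting what is left on stdin after head has run. This is only a sketch: lines_left_after_head is a hypothetical helper, and the numbers you get depend on your head implementation.

```shell
# Hypothetical helper: run head -n 1 on stdin, then count the lines left over.
lines_left_after_head() {
    { head -n 1 >/dev/null; wc -l; } | tr -d ' '
}

tmp=$(mktemp)
seq 20 > "$tmp"

# Seekable input: a head that seeks back to the end of line 1 leaves 19 lines.
lines_left_after_head < "$tmp"

# Pipe input: whatever head buffered is gone, so far fewer lines may remain.
cat "$tmp" | lines_left_after_head

rm -f "$tmp"
```

With a head that behaves as in the trace above, I would expect the first number to be 19 and the second to be much smaller, quite possibly 0 for an input this short.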
On the other hand, tail does not (in my testing) seek to the end and read backwards, like it does on OS X. At least, it doesn't read all the way back to the beginning of the file.
Here's my test. Create a small, 12-line input file:
yes | head -12 | cat -n > /tmp/file
Then, try (head; tail) < /tmp/file on Linux. I get this with GNU coreutils 5.97:
1 y
2 y
3 y
4 y
5 y
6 y
7 y
8 y
9 y
10 y
11 y
12 y
But on OS X, I get this:
1 y
2 y
3 y
4 y
5 y
6 y
7 y
8 y
9 y
10 y
3 y
4 y
5 y
6 y
7 y
8 y
9 y
10 y
11 y
12 y
Equivalent of head/tail command to show head/tail of a line
head and tail themselves do the job if you want the first/last characters of the whole stream:
$ head -c2 <<<"abcdefghijklmnopqrstuvwxyz"
ab<will not output a newline>
$ tail -c3 <<<"abcdefghijklmnopqrstuvwxyz"
yz<newline>
head does not output a newline, as it outputs only the first two characters. tail counts the newline as a character, so we need to ask for 3 to get the last two letters. Reformatting the commands to take arguments as in your example is trivial, and I leave that to the OP.
You can use cut if you want the first characters of each line:
$ cut -c-2 <<<"abcdefghijklmnopqrstuvwxyz"$'\n''second line'
ab
se
and use the rev | cut | rev idiom to get the last characters:
$ rev <<<"abcdefghijklmnopqrstuvwxyz"$'\n''second line' | cut -c-2 | rev
yz
ne
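If rev is not available, awk's substr can do the same job in one process. A minimal sketch, with last_chars as a hypothetical helper name:

```shell
# Hypothetical helper: print the last N characters of every input line.
last_chars() {
    # substr() with only a start index returns everything from there to the end.
    awk -v n="$1" '{ print substr($0, length($0) - n + 1) }'
}

printf 'abcdefghijklmnopqrstuvwxyz\nsecond line\n' | last_chars 2
```

For the two sample lines this gives the same yz and ne as the rev version.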
If you want to output more than 10 characters you can't use cut. Y
Different behavior when running ls from within a script
See "Why you shouldn't parse the output of ls(1)", and use process substitution to process command output instead.
#!/bin/bash
while IFS= read -r -d '' file; do
    echo "$file"
    # Do whatever you want to do with your file here
done < <(find someDir/ -maxdepth 1 -mindepth 1 -type f -print0 | sort -z)
The find command above lists all files in the required directory (including ones with spaces or special characters in their names). Its output is fed to the stdin of the while loop, which reads one NUL-delimited name at a time.
To get the files in sorted order, pipe the find output through sort -z, as the script above does.
Strange behavior with bash for loop
It's because you have " (double quotes) around $output, so it is processed as one string.
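A small sketch makes the difference visible. The variable name output is taken from the question; its value and the function name demo_quoting are made up for illustration.

```shell
# Hypothetical demo: loop over the same variable with and without quotes.
demo_quoting() {
    output="one two three"

    # Quoted: the expansion stays a single word, so the body runs once.
    for item in "$output"; do
        echo "quoted: [$item]"
    done

    # Unquoted: the shell splits the value on whitespace (IFS),
    # so the body runs once per word.
    for item in $output; do
        echo "unquoted: [$item]"
    done
}

demo_quoting
```

The quoted loop runs once with the whole string, and the unquoted loop runs three times. Keep in mind that an unquoted expansion is also subject to filename globbing, so an array is usually the safer way to hold a list.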
Why doesn't `seq 100 | ( head -n1; tail -n1 )` work on Mac OSX?
I see the same behavior with GNU head and tail on Linux.
It depends on how much input head -n1 consumes before it quits. If head reads all of stdin before it quits, then there is nothing left for tail to read and tail produces no output.
Observe:
$ seq 10000 | (head -n1 ; cat ) | head
1
1861
1862
1863
1864
1865
1866
1867
1868
Here, we can see that head -n1 consumes the first 1860 lines. The cat command sees all the remaining input.
Why is that? Observe how many bytes are in the first 1860 lines:
$ seq 1860 | wc
1860 1860 8193
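It's a reasonable guess that head -n1 first reads 8kB of data from stdin, then prints the first line, and, seeing that it needs no more data, it quits. (8193 is one byte over 8192, which is consistent with head reading 8192 bytes and leaving only the newline that ends line 1860.) The rest of stdin is available for any subsequent process.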
So, with seq 100, which produces less than 8kB of output in total, head reads all of stdin and leaves nothing for tail to read. With seq 10000, which produces more than 8kB, head will not read all the data in the pipeline. The data that it leaves will be available for tail.
As Charles Duffy points out, the details of this behavior are entirely implementation dependent and may change with any software upgrade.
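If you need the first and last lines of a pipe reliably, one option is to let a single process read the whole stream, so nothing depends on how much head happens to buffer. A minimal sketch using awk; head_and_tail is a hypothetical helper name, and N is assumed to be a positive integer:

```shell
# Hypothetical helper: print the first N and last N lines of stdin
# (default 10), never printing a line twice.
head_and_tail() {
    n=${1:-10}
    awk -v n="$n" '
        NR <= n { print; next }          # first n lines: print right away
        { ring[NR % n] = $0 }            # later lines: keep only the last n
        END {
            for (i = NR - n + 1; i <= NR; i++)
                if (i > n) print ring[i % n]
        }
    '
}

seq 100 | head_and_tail 1
```

For seq 100 with N=1 this prints 1 and then 100, which is what the original pipeline was after.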