Output File Lines from Last to First in Bash

Output file lines from last to first in Bash

I ended up using tail -r, which works on my macOS system (tac isn't available there by default):

tail -r -n10
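
If you need something that covers both systems, a small guard like the following is one option (file is just a placeholder name; it assumes GNU coreutils provides tac on Linux and that BSD tail with -r is available on macOS):

# prefer tac where it exists, otherwise fall back to BSD tail -r
if command -v tac >/dev/null 2>&1; then
    tac file
else
    tail -r file
fi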

Print first few and last few lines of file through a pipe with ... in the middle

An awk:

awk -v head=2 -v tail=2 'FNR==NR && FNR<=head
FNR==NR && cnt++==head {print "..."}
NR>FNR && FNR>(cnt-tail)' file file

Or if a single pass is important (and memory allows), you can use perl:

perl -0777 -lanE 'BEGIN{$head=2; $tail=2;}
END{say join("\n", @F[0..$head-1],("..."),@F[-$tail..-1]);}' file

Or, an awk that is one pass (note that it keeps every line in memory):

awk -v head=2 -v tail=2 'FNR<=head
{lines[FNR]=$0}
END{
print "..."
for (i=FNR-tail+1; i<=FNR; i++) print lines[i]
}' file
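
If a single pass and bounded memory both matter, the rotating-buffer idea shown further down this page can be adapted here. This is just a sketch, and it assumes the file has at least head+tail lines (otherwise some lines print twice around the "..."):

awk -v head=2 -v tail=2 'FNR<=head
{buf[((FNR-1)%tail)+1]=$0}
END{
print "..."
for (i=1; i<=tail; i++) print buf[((FNR+i-1)%tail)+1]
}' file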

Or, nothing wrong with being caveman-direct like:

head -2 file; echo "..."; tail -2 file

Any of these prints:

1
2
...
9
10

In terms of efficiency, here are some stats.

For small files (i.e., less than 10 MB or so) all of these finish in under 1 second, and the 'caveman' approach takes 2 ms.

I then created a 1.1 GB file with seq 99999999 >file

  • The two-pass awk: 50 seconds
  • The one-pass perl: 10 seconds
  • The one-pass awk: 29 seconds
  • 'Caveman': 2 ms

(The 'caveman' version stays just as fast on the big file because head stops after the first two lines and tail seeks to the end of a regular file instead of reading all the way through it.)
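
If you want to reproduce rough numbers like these yourself, something along these lines is enough (the file comes from the seq command above; put the same time prefix in front of the awk or perl commands to compare them):

seq 99999999 >file
time { head -2 file; echo "..."; tail -2 file; } >/dev/null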

How can I read first n and last n lines from a file?

Chances are you're going to want something like:

... | awk -v OFS='\n' '{a[NR]=$0} END{print a[1], a[2], a[NR-1], a[NR]}'

Or, if you need to specify the number of lines and, taking into account @Wintermute's astute observation, don't want to buffer the whole file, something like this is what you really want:

... | awk -v n=2 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'

I think the math is correct on that; hopefully you get the idea: use a rotating buffer indexed by NR modulo the size of the buffer, adjusted to use indices in the range 1 to n instead of 0 to n-1.
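
For example, with seq 8 as input and n=3, it prints the first three and last three lines:

$ seq 8 | awk -v n=3 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'
1
2
3
6
7
8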

To help with comprehension of the modulus operator used in the indexing above, here is an example with intermediate print statements to show the logic as it executes:

$ cat file   
1
2
3
4
5
6
7
8


$ cat tst.awk
BEGIN {
    print "Populating array by index ((NR-1)%n)+1:"
}
{
    buf[((NR-1)%n)+1] = $0

    printf "NR=%d, n=%d: ((NR-1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
        NR, n, NR-1, (NR-1)%n, ((NR-1)%n)+1, ((NR-1)%n)+1, buf[((NR-1)%n)+1]
}
END {
    print "\nAccessing array by index ((NR+i-1)%n)+1:"
    for (i=1;i<=n;i++) {
        printf "NR=%d, i=%d, n=%d: (((NR+i = %d) - 1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
            NR, i, n, NR+i, NR+i-1, (NR+i-1)%n, ((NR+i-1)%n)+1, ((NR+i-1)%n)+1, buf[((NR+i-1)%n)+1]
    }
}
$
$ awk -v n=3 -f tst.awk file
Populating array by index ((NR-1)%n)+1:
NR=1, n=3: ((NR-1 = 0) %n = 0) +1 = 1 -> buf[1] = 1
NR=2, n=3: ((NR-1 = 1) %n = 1) +1 = 2 -> buf[2] = 2
NR=3, n=3: ((NR-1 = 2) %n = 2) +1 = 3 -> buf[3] = 3
NR=4, n=3: ((NR-1 = 3) %n = 0) +1 = 1 -> buf[1] = 4
NR=5, n=3: ((NR-1 = 4) %n = 1) +1 = 2 -> buf[2] = 5
NR=6, n=3: ((NR-1 = 5) %n = 2) +1 = 3 -> buf[3] = 6
NR=7, n=3: ((NR-1 = 6) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, n=3: ((NR-1 = 7) %n = 1) +1 = 2 -> buf[2] = 8

Accessing array by index ((NR+i-1)%n)+1:
NR=8, i=1, n=3: (((NR+i = 9) - 1 = 8) %n = 2) +1 = 3 -> buf[3] = 6
NR=8, i=2, n=3: (((NR+i = 10) - 1 = 9) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, i=3, n=3: (((NR+i = 11) - 1 = 10) %n = 1) +1 = 2 -> buf[2] = 8

Strange result when reading 'first and last line' from cat output with 'head' and 'tail'

Your "question" at the moment is not actually posed as a question; it's merely an observation. To explain that observation, consider the difference between the output of:

$ seq 10 | (head -1 && tail -1)
1

and

$ seq 1000 | (head -1 && tail -1)
1
1000

What is happening here? Our pipeline is working as follows:

  • send lines (numbers in this case, but it's no different from your cat example) to stdout;
  • reading stdout we have:

    • first, a head ... it will print the first line and then end;
    • next, a tail ... it will begin after the head has run and print the last line.

However, by default, head does not read the file line by line, or even character by character until it finds a line break; instead it reads the file in chunks (a buffered read). That chunk might be 2048 bytes, for example.

So our pipeline is really:

  • send lines (numbers in this case, but it's no different from your cat example) to stdout;
  • reading stdout we have:

    • first, a head ... it will read the first 2 KB chunk from stdin, print the first line and then end;
    • next, a tail ... it will only read whatever data remains after that first chunk; it never sees the data head already consumed.
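
You can get a feel for that chunked read by counting how much data is left over after head exits. The exact figure depends on head's buffer size and on timing, but it will be well short of the 588895 bytes that seq 100000 writes, because head swallowed the first chunk along with the first line:

$ seq 100000 | { head -1 >/dev/null; wc -c; }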

If your goal is to only generate the output of the first command (your cat) once, then you could use tee, something like this perhaps:

$ seq 10 | tee >(tail -1) | head -2

Also note that on Linux you could alter the buffering of the first command, something like:

$ stdbuf -oL seq 10 | (head -1 && tail -1)

but this won't work if your command adjusts the buffering of its streams itself (see the stdbuf man page).

How to output a text's first line and last line to the terminal as a single command

The problem with your command is that only the 1st command, head -1, receives the stdin input; it consumes that input as it reads, so the 2nd command, tail -1, receives no input.

In this particular case, you can use command grouping ({ ...; ...; }):

{ head -1; tail -1; } < text.txt 

Caveats:

  • The above only works with seekable input, meaning either a regular file, a here-string or here-doc.

    • It will not work with pipeline input (cat text.txt | { head -1; tail -1; }) or with input from a process substitution ({ head -1; tail -1; } < <(cat text.txt)), because such input is not seekable (tail cannot scan from the end backward); see the example after this list.
  • Even with seekable input this is not a generic method to send input to multiple commands at once.

    • The above only works because tail reads backwards from the end of the (seekable) input, irrespective of whether all the input has already been read or not.
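
To make the first caveat concrete (nums.txt is just a throwaway file name), the pipe variant typically loses the last line because head swallows the whole small input:

$ seq 5 > nums.txt
$ { head -1; tail -1; } < nums.txt
1
5
$ cat nums.txt | { head -1; tail -1; }
1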

As a simpler alternative that works generically, here's a sed solution:

sed -n '1p; $p' text.txt

  • -n suppresses the automatic printing of input lines.
  • 1p matches line 1 and prints it (p).
  • $p matches the last line ($) and prints it.
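
Unlike the command-grouping approach, this also works on non-seekable input such as a pipe, because sed simply reads the whole stream; for example:

$ seq 5 | sed -n '1p; $p'
1
5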

How to find only the first and last line of a file using sed

This will work:

sed -n '1p ; $p' error_log

1p will print the first line and $p will print the last line.

As a suggestion, take a look at info sed, not only man sed. You can find some examples related to your question in section 2.1.

How to get the first line of a file in a bash script?

head takes the first lines from a file, and the -n parameter can be used to specify how many lines should be extracted:

line=$(head -n 1 filename)
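
If you would rather not spawn an external process inside the script, the bash read builtin is a common alternative for the same job (a small sketch; IFS= preserves leading and trailing whitespace, and -r stops backslashes from being interpreted):

IFS= read -r line < filename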

