Sort Logs by Date Field in Bash

Sort logs by date field in bash

For GNU sort: sort -k2M -k3n -k4

  • -k2M sorts on the second column as a month name (so that "March" comes before "April")
  • -k3n sorts on the third column numerically (so that " 9" comes before "10")
  • -k4 sorts on the fourth column.

See more details in the manual.
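As a rough sketch of the three keys in action, here is a small run against made-up log lines. The tag, timestamps, and messages are hypothetical, as are the function names; LC_ALL=C is set only so the month names are read as English.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines: tag, month, day, time, message.
sample_logs() {
  printf '%s\n' \
    'app Apr  2 09:00:00 restart' \
    'app Mar 10 08:15:02 started' \
    'app Mar  9 23:59:59 stopped' \
    'app Mar  9 07:30:00 rotated'
}

# Month name first, then day as a number, then the rest of the line (time) as text.
sort_by_month_day_time() {
  LC_ALL=C sort -k2M -k3n -k4
}

sample_logs | sort_by_month_day_time
```

The two "Mar  9" lines come first (ordered by time), then "Mar 10", then "Apr  2".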

How to sort data according to the date in bash?

The relevant field must be rendered suitable for sorting, that is, in the form YYYY-MM-DD, using a utility such as sed or awk. For example, with GNU sed:

sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\2-\1/' employees.txt |
sort -r -t'|' -k5,5 | head -n1 | cut -d'|' -f2
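To try the pipeline without an employees.txt at hand, something like the following should work. The records are invented for illustration (the real file layout is not shown here); the only assumption that matters is that field 5 holds a DD-MM-YYYY date and field 2 holds the name.

```shell
#!/usr/bin/env bash
# Hypothetical records: id|name|dept|salary|date (DD-MM-YYYY).
employees() {
  printf '%s\n' \
    '1|alice|eng|100|15-03-2019' \
    '2|bob|ops|90|02-11-2020' \
    '3|carol|eng|110|28-02-2020'
}

# Rewrite the date as YYYY-MM-DD, sort descending on field 5,
# keep the first record and print its second field.
latest_by_date() {
  sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\2-\1/' |
    LC_ALL=C sort -r -t'|' -k5,5 | head -n1 | cut -d'|' -f2
}

employees | latest_by_date
```

With this sample the most recent date is 02-11-2020, so the command prints bob.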

sort by datetime format in bash

sort is able to deal with month names, thanks to the M option.

No need to change , into !. Use white space as the delimiter and just issue:

LC_ALL=en sort -k7nr -k5Mr -k6nr -k2r sample

If you use this as content of the file sample:

ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Apr 1 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Feb 28 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2020, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2

you will get this as output:

ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Apr 1 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Feb 28 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2020, foo=moshe2, bar=haim2

Specifying -k7 means to sort on the seventh field. The r option reverses the sort order to descending. The M option sorts according to the name of the month. The n option sorts numerically. To sort on the time, just treat the whole second field (beginning with the string setup_time=) as a fixed-length string using -k2.

LC_ALL=en at the beginning of the command line tells the system to use the English names of the months. If no locale named en is installed on your system, LC_ALL=C also gives English month names.

How to sort data based on date field by excluding header

The way to do this is with Command Grouping, where you can extract the header from an input stream, print it, and consume the remaining data:

{
IFS= read -r header
echo "$header"
sort ...
} < file.txt

However, sorting dates with that format is tricky. Here's how you have to do it so the output is sorted chronologically. This assumes GNU sort:

$ cat file.txt          # I added a couple of extra records
NAME|AGE|COURSE|DATES
v1|31|MC|12 JUL 2019
v2|33|MB|4 JUL 2019
v3|12|GG|13 JUL 2019
v4|21|JJ|7 JUL 2019
11|22|33|1 JUL 2020
aa|bb|cc|10 AUG 2019

$ {
IFS= read -r header
echo "$header"
sort -t'|' -n -s -k4 | sort -M -s -k 2,2 | sort -n -s -k 3,3
} < file.txt
NAME|AGE|COURSE|DATES
v2|33|MB|4 JUL 2019
v4|21|JJ|7 JUL 2019
v1|31|MC|12 JUL 2019
v3|12|GG|13 JUL 2019
aa|bb|cc|10 AUG 2019
11|22|33|1 JUL 2020

That uses the GNU sort "stable" option so you sort first by day, then by month, then by year.
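If you need the header trick in more than one place, the grouping can be wrapped in a function. This is only a sketch: sort_keep_header is a made-up name, and the sample data is invented. Everything passed to the function is handed straight to sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line untouched, then sort the
# remaining lines with whatever sort options were passed in.
sort_keep_header() {
  local header
  IFS= read -r header || return 0   # empty input: nothing to print
  printf '%s\n' "$header"
  sort "$@"
}

printf '%s\n' 'NAME|AGE' 'v1|31' 'v2|12' 'v3|21' |
  sort_keep_header -t'|' -k2,2n
```

The header stays on top and the remaining rows come out ordered by the AGE column (12, 21, 31).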

Sorting by Date in Shell

The sort you are using will fail for any date before year 2000 (e.g. 1999 will sort after 2098). Continuing from your question in the comment, you currently show

sort -n -t":" -k3.9 -k3.4,3.5 -k3

You should use

sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2

Explanation:

Your -t separates the fields on each colon (':'). The -k option takes a KEYDEF of the form f[.c][opts] (that is, field, character position, options); you need no separate option after the character. Your date field (field 3) is:

d d / m m / y y y y
1 2 3 4 5 6 7 8 9 0 -- chars counting from 1 in field 3

So you first sort by -k3.9 (the 9th character in field 3), which is the last two digits of the 4-digit year. You really want to sort on -k3.7 (which is the start of the 4-digit year).

You next sort by the month (characters 4,5) which is fine.

Lastly, you sort on -k3 (which fails to limit the characters considered). Just as you have limited the sort on the month to chars 4,5, you should limit the sort of the days to characters 1,2.

Putting that together gives you sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2. Hope that answers your question from the comment.
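To see the difference between the two key definitions, here is a small comparison on invented records. The names and the first two fields are hypothetical; the only assumption is that field 3 is a dd/mm/yyyy date.

```shell
#!/usr/bin/env bash
# Hypothetical records: name:code:dd/mm/yyyy
records() {
  printf '%s\n' \
    'carol:c:15/06/2098' \
    'alice:a:31/12/1999' \
    'bob:b:01/02/2001' \
    'dave:d:09/02/2001'
}

# Original keys: the year key starts at the 9th character (2-digit year).
sort_two_digit_year() {
  LC_ALL=C sort -n -t':' -k3.9 -k3.4,3.5 -k3
}

# Corrected keys: full 4-digit year, then month, then day.
sort_four_digit_year() {
  LC_ALL=C sort -n -t':' -k3.7 -k3.4,3.5 -k3.1,3.2
}

echo '--- two-digit year key'
records | sort_two_digit_year
echo '--- four-digit year key'
records | sort_four_digit_year
```

With the first version alice (1999) lands last, after carol (2098); with the corrected keys the order is alice, bob, dave, carol.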

sort logfile by timestamp on linux command line

Use sort's -k flag:

sort -k1 -r freeswitch.log

That will sort the file, in reverse, by the first key (i.e. freeswitch.log:2011-09-08 12:21:07.282236). If the filename is always the same (freeswitch.log), then it should sort by the date.
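A quick way to check this on your own machine, using made-up log lines in the same layout (the [INFO] messages are invented). Because the timestamp is written year first and zero-padded, plain text order is also chronological order.

```shell
#!/usr/bin/env bash
# Hypothetical lines in the "filename:YYYY-MM-DD HH:MM:SS.ffffff" layout.
log_lines() {
  printf '%s\n' \
    'freeswitch.log:2011-09-08 12:21:07.282236 [INFO] a' \
    'freeswitch.log:2011-09-09 01:02:03.000001 [INFO] b' \
    'freeswitch.log:2011-09-08 09:15:00.500000 [INFO] c'
}

# Newest entry first.
log_lines | LC_ALL=C sort -k1 -r
```

This prints the 2011-09-09 line first, then the two 2011-09-08 lines with 12:21 before 09:15.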

How to sort a specific date-time column in bash

perl works just fine:

#!/usr/bin/env perl
#
# Sort timestamps

use 5.12.10;
use Time::Piece;

my $fmt='%Y-%m-%d_%H:%M:%S';
my @t;
while( <DATA> ){
if( $_ !~ m@\d{4}-\d{1,2}-\d{1,2}_\d{1,2}:\d{2}:\d{2}@ ){
say STDERR "Invalid format $_";
next
}
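chomp; # Remove the trailing newline, if any
push @t, Time::Piece->strptime($_, $fmt);
}

# Compare with <=> so the objects are ordered by their time value
say $_->strftime($fmt) foreach sort { $a <=> $b } @t;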

__DATA__
2022-08-25_22:55:01
2022-08-25_20:23:24
2022-08-25_22:53:07
2022-08-25_21:53:30
2022-08-25_20:23:33
2022-08-25_20:22:14

To put that in a one-liner (assuming the shell variable fmt holds the same format string), you could do:

perl -MTime::Piece -lnE 'eval { push @t, Time::Piece->strptime($_,"'$fmt'") } or say "Invalid input: $_" } { say $_->strftime("'$fmt'") foreach sort { $a <=> $b } @t' input-file
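If every timestamp is zero-padded (two digits for month, day, and hour), the strings already sort chronologically as plain text, so sort alone may be enough. A minimal sketch using the same sample values as the script above; the zero-padding is an assumption, and input with single-digit fields would still need the parsing approach.

```shell
#!/usr/bin/env bash
# Sample timestamps from the Perl script above, all zero-padded.
timestamps() {
  printf '%s\n' \
    2022-08-25_22:55:01 \
    2022-08-25_20:23:24 \
    2022-08-25_22:53:07 \
    2022-08-25_21:53:30 \
    2022-08-25_20:23:33 \
    2022-08-25_20:22:14
}

# Plain byte-wise sort; works only because every field has a fixed width.
timestamps | LC_ALL=C sort
```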

