using awk to check between two dates
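The key observation is that you can compare your timestamps with plain string comparisons and get the correct answer - that is the beauty of ISO 8601 notation. The fields are fixed-width, zero-padded and ordered from most to least significant, so lexicographic order is chronological order.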
Thus, adapting your code slightly - and formatting to avoid scroll bars:
awk 'BEGIN {
        FS = "\n"
        RS = ""
        OFS = ";"
        ORS = "\n"
        t1 = "2010-03-23T07:45:00"
        t2 = "2010-03-23T08:00:00"
        m1 = "eventTimestamp: " t1
        m2 = "eventTimestamp: " t2
     }
     $1 ~ /eventTimestamp:/ && $4 ~ /SMS-MO-FSM(-INFO)?$/ {
        if ($1 >= m1 && $1 <= m2) print $1, $2, $3, $4;
     }' "$@"
Obviously, you could put this into a script file - you wouldn't want to type it often. And getting the date range entered accurately and conveniently is one of the hard parts. Note that I've adjusted the time range to match the data.
When run on the sample data, it outputs one record:
eventTimestamp: 2010-03-23T07:56:19.186;result: Allowed;protocol: SMS;payload: SMS-MO-FSM
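One way to make the range easier to enter is to pass it in from the shell with `-v` instead of hard-coding it in `BEGIN`. A minimal sketch; the function name `select_sms_events` is hypothetical, and the record layout (four lines per record, records separated by blank lines, timestamp on the first line) is assumed from the sample output above.

```shell
# select_sms_events T1 T2 [FILE...]
# Hypothetical helper: prints SMS-MO-FSM records whose ISO 8601 timestamp
# lies in [T1, T2]. Assumes blank-line-separated, four-line records.
select_sms_events() {
    t1=$1
    t2=$2
    shift 2
    awk -v t1="$t1" -v t2="$t2" '
        BEGIN {
            FS = "\n"; RS = ""      # paragraph mode: one record per block
            OFS = ";"
            m1 = "eventTimestamp: " t1
            m2 = "eventTimestamp: " t2
        }
        # ISO 8601 timestamps sort correctly as plain strings
        $1 ~ /^eventTimestamp:/ && $4 ~ /SMS-MO-FSM(-INFO)?$/ {
            if ($1 >= m1 && $1 <= m2) print $1, $2, $3, $4
        }' "$@"
}
```

For example, `select_sms_events 2010-03-23T07:45:00 2010-03-23T08:00:00 events.log` (file name hypothetical). Because the comparison is on strings, a timestamp such as `2010-03-23T08:00:00.5` sorts after the upper bound `2010-03-23T08:00:00`, so events with fractional seconds inside the final second are excluded.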
How to select date range in awk
The answer is that awk has no notion of what a date is. Awk knows numbers and strings and can only compare those. So when you want to select dates and times, you have to ensure that the date format you compare is sortable, either as a string or as a number, and many common formats are not:
| type | example | sortable |
|------------+---------------------------+----------|
| ISO-8601 | 2019-11-19T10:05:15 | string |
| RFC-2822 | Tue, 19 Nov 2019 10:05:15 | not |
| RFC-3339 | 2019-11-19 10:05:15 | string |
| Unix epoch | 1574157915 | numeric |
| AM/PM | 2019-11-19 10:05:15 am | not |
| MM/DD/YYYY | 11/19/2019 10:05:15 | not |
| DD/MM/YYYY | 19/11/2019 10:05:15 | not |
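To see why the last column matters, the illustrative check below compares 19 November and 1 December 2019 as plain strings in two notations (the function name `compare_notations` is hypothetical). Only the ISO form gives the right answer:

```shell
# compare_notations: hypothetical demo of awk string comparison.
# Prints 1 for true and 0 for false for "19 Nov 2019 < 1 Dec 2019".
compare_notations() {
    awk 'BEGIN {
        iso = ("2019-11-19 10:05:15" < "2019-12-01 10:05:15")  # correct: 1
        dmy = ("19/11/2019 10:05:15" < "01/12/2019 10:05:15")  # wrong:   0
        print iso, dmy
    }'
}
```

`compare_notations` prints `1 0`: in DD/MM/YYYY the day is compared first, so "19/11" sorts after "01/12".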
So you have to convert non-sortable formats into a sortable one, mostly with string manipulation. Here is a template awk program for that:
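# function to convert a string into a sortable format
function convert_date(str,   sortable_date) {
    # placeholder: build sortable_date from str (format-specific)
    return sortable_date
}
# function to extract the date from the record
function extract_date(str,   extracted_date) {
    # placeholder: pull the date substring out of str (format-specific)
    return extracted_date
}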
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }
Most of the time, this program can be heavily reduced. If the above is stored in extract_date_range.awk, you could run it as:
$ awk -f extract_date_range.awk begin="date-in-known-format" end="date-in-known-format" logfile
Note: the above assumes single-line log entries. With a minor adaptation, you can also process multi-line log entries.
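As an illustration of how far the template shrinks, here is a sketch for the easy case where every line starts with a `YYYY-MM-DD hh:mm:ss` timestamp. The function name `extract_date_range` and the log layout are assumptions. Here `begin` and `end` are passed with `-v` rather than as operand assignments, which also makes them available in `BEGIN`.

```shell
# extract_date_range BEGIN END [FILE...]
# Hypothetical helper: prints lines whose leading "YYYY-MM-DD hh:mm:ss"
# timestamp t satisfies BEGIN <= t < END.
extract_date_range() {
    b=$1
    e=$2
    shift 2
    awk -v begin="$b" -v end="$e" '
        # keep only the digits: "2019-11-19 10:05:15" -> "20191119100515"
        function convert_date(str) { gsub(/[^0-9]/, "", str); return str }
        # the timestamp is assumed to be the first 19 characters of the record
        function extract_date(str) { return substr(str, 1, 19) }
        BEGIN { t1 = convert_date(begin); t2 = convert_date(end) }
        { t = convert_date(extract_date($0)) }
        (t1 <= t && t < t2) { print }
    ' "$@"
}
```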
In the original problem, the following formats were presented:
EEE MMM dd yy HH:mm # not sortable
EEE MMM dd HH:mm # not sortable
yyyy-MM-dd hh:mm # sortable
dd MMM yyyy HH:mm:ss # not sortable
From the above, all but the second format can easily be converted to a sortable format. The second format lacks the year, so we would have to infer it from the day of the week, which is laborious and never 100% reliable.
Excluding the second format, we can write the following functions:
BEGIN {
  # letters must be written as [A-Za-z]; a range like [a-Z] is invalid
  datefmt1="^[A-Za-z][A-Za-z][A-Za-z] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9] [0-9][0-9] [0-9][0-9]:[0-9][0-9]"
  datefmt3="^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]"
  datefmt4="^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
}
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# skip if date string is empty
(date_string == "") { next }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }
# function to extract the date from the record
# note: match() takes the string first and the regex second
function extract_date(str,   date_string) {
  date_string=""
  if      (match(str,datefmt1)) { date_string=substr(str,RSTART,RLENGTH) }
  else if (match(str,datefmt3)) { date_string=substr(str,RSTART,RLENGTH) }
  else if (match(str,datefmt4)) { date_string=substr(str,RSTART,RLENGTH) }
  return date_string
}
# function to convert a string into a sortable format
# converts it in the format YYYYMMDDhhmmss
function convert_date(str,   a, YYYY, MM, DD, T, sortable_date) {
  sortable_date=""
  if (match(str,datefmt1)) {              # EEE MMM dd yy HH:mm
    split(str,a,"[ ]")
    YYYY=(a[4] < 70 ? "20" : "19") a[4]   # two-digit year: 00-69 -> 20yy
    MM=get_month(a[2]); DD=a[3]
    T=a[5]; gsub(/[^0-9]/,"",T); T=T "00" # HH:mm -> HHmm00
    sortable_date = YYYY MM DD T
  }
  else if (match(str,datefmt3)) {         # yyyy-MM-dd hh:mm
    sortable_date = substr(str,RSTART,RLENGTH) "00"
    gsub(/[^0-9]/,"",sortable_date)
  }
  else if (match(str,datefmt4)) {         # dd MMM yyyy HH:mm:ss
    split(str,a,"[ ]")
    YYYY=a[3]
    MM=get_month(a[2]); DD=a[1]
    T=a[4]; gsub(/[^0-9]/,"",T)           # HH:mm:ss -> HHmmss
    sortable_date = YYYY MM DD T
  }
  return sortable_date
}
# function to convert Jan->01, Feb->02, Mar->03 ... Dec->12
function get_month(str) {
  return sprintf("%02d",(index("JanFebMarAprMayJunJulAugSepOctNovDec",str)+2)/3)
}
Awk between two dates in a logfile - almost working
You should transform the date to the format YYYYMMDD so it can be ordered lexicographically. You can do that with gawk and a regex, or with substring operations in plain awk. Here is the gawk way:
grep TMYO text_B_14_FEB_03.dt |
gawk 'match($5, "([0-9]+)/([0-9]+)/([0-9]+)", ary) {
    B = sprintf("%04d%02d%02d", ary[3], ary[2], ary[1])   # DD/MM/YYYY -> YYYYMMDD, zero-padded
    if (B < 20140213 && B > 20130104) print
}'
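The substring route works in any POSIX awk. A sketch, assuming (as the gawk version does) that field 5 holds a DD/MM/YYYY date; `split()` plus zero-padding with `sprintf()` also copes with one-digit days and months such as 4/1/2014. The function name `filter_tmyo` is hypothetical.

```shell
# filter_tmyo [FILE...]
# Hypothetical helper: keeps TMYO lines whose 5th field (DD/MM/YYYY)
# falls strictly between 2013-01-04 and 2014-02-13.
filter_tmyo() {
    awk '/TMYO/ && split($5, d, "/") == 3 {
        B = sprintf("%04d%02d%02d", d[3], d[2], d[1])   # -> YYYYMMDD
        if (B + 0 < 20140213 && B + 0 > 20130104) print
    }' "$@"
}
```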
awk filter values from array based on date validation and print correct output if there is match with text at START and END including match
awk -v d="$(date --date="7 days ago" "+%Y%m%d")" '
BEGIN { i = 999999 }
$6 < d && i >= $3 {
    if (i > $3) { if (i != 999999) print "END"; print "START" }
    print $0
    i = $3
}
END { print "END" }' file1
Output:
START
A B 25320 FX M.1 20200429
A B 25320 FX M.1 20200421
A B 25320 FX M.1 20200429
A B 25320 FX M.1 20200423
END
START
A B 25276 FX M.1 20200421
A B 25276 FX M.1 20200328
A B 25276 FX M.1 20200328
A B 25276 FX M.1 20200328
A B 25276 FX M.1 20200328
A B 25276 FX M.1 20200328
A B 25276 FX M.1 20200423
A B 25276 FX M.1 20200423
A B 25276 FX M.1 20200423
A B 25276 FX M.1 20200423
A B 25276 FX M.1 20200423
A B 25276 FX M.1 20200423
END
START
A B 25172 FX M.1 20200421
END
START
A B 25060 FX M.1 20200421
END
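GNU `date --date` is not available everywhere, so here is a sketch of the same START/END grouping with the cutoff passed in explicitly. It keeps the original's assumptions: field 6 is a YYYYMMDD date, rows are ordered so that field 3 does not increase, and a new block starts whenever field 3 drops. The function name `group_old_rows` is hypothetical; unlike the one-liner, it prints no END when nothing matched.

```shell
# group_old_rows CUTOFF [FILE...]
# Hypothetical helper: prints rows with $6 < CUTOFF, wrapping each run of
# equal $3 values in START/END markers.
group_old_rows() {
    cutoff=$1
    shift
    awk -v d="$cutoff" '
        BEGIN { i = 999999 }                # sentinel: no block open yet
        $6 < d && i >= $3 {
            if (i > $3) {                   # $3 dropped: start a new block
                if (i != 999999) print "END"
                print "START"
            }
            print
            i = $3
        }
        END { if (i != 999999) print "END" }' "$@"
}
```

For example, `group_old_rows "$(date --date="7 days ago" "+%Y%m%d")" file1` reproduces the GNU invocation.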
Filter lines containing date between a range in csv file in shell
In awk:
$ cat program.awk
function mkdt(str) { # functionize dt conversion
split(str, a, "[/ ]") # split dt
return sprintf( "%s-%02d-%02d %s\n" ,a[3], a[2], a[1], a[4]) # zeropad and reorganize
}
mkdt($3) > mkdt(start) && mkdt($3) < mkdt(end) # compare and print
Run it:
$ awk -v start="10/2/2016 23:00" -v end="11/2/2016 20:45" -F, -f program.awk temp.csv
ABHA_BSC,11DPM12-1-7-C1,10/2/2016 23:15,6623893225,42756482355,Juniper_GBE_ABHA_BSC-1-7-C1_JIZAN-1-7-C1_JIZ1AH1-01 | (SOUTHERN_ABHA_ABH0027-MX480-1 TO SOUTHERN_JIZAN_JIZ0005-MX104-1),1GbE
ABHA_BSC,11DPM12-1-7-C1,10/2/2016 23:30,6781639211,44625787536,Juniper_GBE_ABHA_BSC-1-7-C1_JIZAN-1-7-C1_JIZ1AH1-01 | (SOUTHERN_ABHA_ABH0027-MX480-1 TO SOUTHERN_JIZAN_JIZ0005-MX104-1),1GbE
ABHA_BSC,11DPM12-1-7-C1,10/2/2016 23:45,6586403766,41882620412,Juniper_GBE_ABHA_BSC-1-7-C1_JIZAN-1-7-C1_JIZ1AH1-01 | (SOUTHERN_ABHA_ABH0027-MX480-1 TO SOUTHERN_JIZAN_JIZ0005-MX104-1),1GbE
ABHA_BSC,11DPM12-1-7-C11,10/2/2016 23:15,8440733035,54114599426,Juniper_GBE_ABHA_BSC-1-7-C11_JIZAN-1-7-C11_JIZ1AH1-03 | (SOUTHERN_ABHA_ABH0027-MX480-2 TO SOUTHERN_JIZAN_JIZ0005-MX104-2),1GbE
ABHA_BSC,11DPM12-1-7-C11,10/2/2016 23:30,8051347485,49383381691,Juniper_GBE_ABHA_BSC-1-7-C11_JIZAN-1-7-C11_JIZ1AH1-03 | (SOUTHERN_ABHA_ABH0027-MX480-2 TO SOUTHERN_JIZAN_JIZ0005-MX104-2),1GbE
I only zero-pad the day and month (1/1/2016 -> 2016-01-01), not the hours or minutes, so a one-digit hour such as 9:05 sorts after 23:00 on the same day. There is no sanity checking for missing or malformed datetimes. Add = to the comparisons if you need inclusive bounds (i.e. > -> >=).
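If hours can be a single digit, a variant that also zero-pads the time avoids that pitfall. This is a sketch under the same assumptions as program.awk (comma-separated input, field 3 in D/M/YYYY H:MM); the function name `filter_csv_range` is hypothetical, and the bounds are converted once in BEGIN instead of on every line.

```shell
# filter_csv_range START END [FILE...]
# Hypothetical helper: prints CSV rows whose 3rd field (D/M/YYYY H:MM)
# lies strictly between START and END (given in the same notation).
filter_csv_range() {
    s=$1
    e=$2
    shift 2
    awk -F, -v start="$s" -v end="$e" '
        # "10/2/2016 9:05" -> "2016-02-10 09:05" (every part zero-padded)
        function mkdt(str,    a) {
            split(str, a, "[/ :]")
            return sprintf("%04d-%02d-%02d %02d:%02d", a[3], a[2], a[1], a[4], a[5])
        }
        BEGIN { lo = mkdt(start); hi = mkdt(end) }
        { t = mkdt($3) }
        t > lo && t < hi' "$@"
}
```

Usage mirrors the original: `filter_csv_range "10/2/2016 23:00" "11/2/2016 20:45" temp.csv`.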
How to filter csv file by date column using awk whenever date format constraint does not match date format column?
You can use a regex to match the start of your field, i.e. match the first 10 characters (YYYY-MM-DD) of the field.
today=$(date '+%Y-%m-%d')
awk -v regex="^$today" -F';' '$25 ~ regex' input.csv > today.csv
This passes the value of the $today variable to awk with -v and prepends a ^ so that it matches only at the start of the field.
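If you would rather not build a regex from shell data, comparing the first 10 characters of the field with `substr()` is an equivalent sketch under the same assumptions (`;`-separated input, YYYY-MM-DD at the start of field 25). The function name `rows_for_day` is hypothetical.

```shell
# rows_for_day DAY [FILE...]
# Hypothetical helper: prints rows whose 25th ;-separated field starts with DAY.
rows_for_day() {
    day=$1
    shift
    # exact string equality on the date prefix, no regex involved
    awk -F';' -v day="$day" 'substr($25, 1, 10) == day' "$@"
}
```

For example: `rows_for_day "$(date '+%Y-%m-%d')" input.csv > today.csv`.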
Awk to find lines within date range in a file with custom date format
With awk. 0101 is January 1st and 0210 is February 10th.
awk -v start="0101" -v stop="0210" \
'BEGIN{m["Jan"]="01"; m["Feb"]="02"; m["Mar"]="03"; m["Apr"]="04"}
{original = $0; $1 = m[$1]; $2 = sprintf("%.2d", $2)}
$1$2 >= start && $1$2 <= stop {print original}' file
Output:
Jan 5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb 1 01:14:00 log messages here
Feb 10 16:32:00 more messages
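The month table above only covers January to April. A sketch that builds all twelve entries with `split()` in BEGIN and skips lines without a known month name; the function name `lines_in_mmdd_range` is hypothetical. Like the original, it ignores the year, so a range cannot span New Year.

```shell
# lines_in_mmdd_range START STOP [FILE...]
# Hypothetical helper: prints syslog-style lines ("Mon D HH:MM:SS ...") whose
# month/day, written as MMDD, lies in [START, STOP]. The year is ignored.
lines_in_mmdd_range() {
    s=$1
    e=$2
    shift 2
    awk -v start="$s" -v stop="$e" '
        BEGIN {
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) m[names[i]] = sprintf("%02d", i)
        }
        ($1 in m) {
            md = m[$1] sprintf("%02d", $2)    # e.g. "Feb 1" -> "0201"
            if (md >= start && md <= stop) print
        }' "$@"
}
```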
filter dates within a text file
Using GNU awk for time functions:
$ cat tst.awk
BEGIN {
    tgtDays = 10
    tgtSecs = tgtDays * 24 * 60 * 60
    endTime = strftime("%Y %m %d 12 00 00")
    endSecs = mktime(endTime,1)
}
{
    mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",$4)+2)/3
    begTime = sprintf("%04d %02d %02d 12 00 00", $7, mthNr, $5)
    begSecs = mktime(begTime,1)
}
(endSecs - begSecs) < tgtSecs
$ awk -f tst.awk sample.txt
system system_data8 Thu Jul 29 22:36:38 2021
Note that the above replaces the time of day with noon in both the input data and the current time. When you count the days between two dates by converting each timestamp to seconds since the epoch and then dividing by the number of seconds in a day, you have to use the same time of day for both dates; otherwise the difference in time of day can throw off the day count.
For example, the following tries to determine whether two dates that ARE 10 days apart are less than 10 days apart:
$ cat diffDatesDemo.awk
BEGIN {
    tgtDays = 10
    tgtSecs = tgtDays * 24 * 60 * 60
    begTime = "2021/08/01 09:00:00"
    endTime = "2021/08/11 08:00:00"
    begDate = gensub(/([ :][0-9]{2}){3}$/,"",1,begTime)
    endDate = gensub(/([ :][0-9]{2}){3}$/,"",1,endTime)
    print "Is", begTime, "less than", tgtDays, "days before", endTime "?"
    ####
    print "\nWrong: Compare 2 timestamps including date plus time of day:"
    begSecs = mktime(gensub("[/:]"," ","g",begTime),1)
    endSecs = mktime(gensub("[/:]"," ","g",endTime),1)
    print begDate, "->", endDate, "is", ((endSecs - begSecs) < tgtSecs ? "<" : ">="), tgtDays, "days"
    ####
    ####
    print "\nRight: Compare 2 dates at the same time each day:"
    begSecs = mktime(gensub("[/:]"," ","g",begDate)" 12 00 00",1)
    endSecs = mktime(gensub("[/:]"," ","g",endDate)" 12 00 00",1)
    print begDate, "->", endDate, "is", ((endSecs - begSecs) < tgtSecs ? "<" : ">="), tgtDays, "days"
    ####
}
$ awk -f diffDatesDemo.awk
Is 2021/08/01 09:00:00 less than 10 days before 2021/08/11 08:00:00?
Wrong: Compare 2 timestamps including date plus time of day:
2021/08/01 -> 2021/08/11 is < 10 days
Right: Compare 2 dates at the same time each day:
2021/08/01 -> 2021/08/11 is >= 10 days
I also passed the UTC flag to mktime() above so that local DST changes cannot affect the number-of-days calculation.
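An alternative that sidesteps time of day and DST altogether is to turn each calendar date into a day number and subtract. The sketch below swaps in the well-known days-from-civil formula (proleptic Gregorian calendar) in place of mktime(), so it also runs in awks without GNU time functions; the function names `days_between` and `days_from_civil` are hypothetical.

```shell
# days_between Y1 M1 D1 Y2 M2 D2
# Hypothetical helper: prints the number of whole days from date 1 to date 2.
days_between() {
    awk -v y1="$1" -v m1="$2" -v d1="$3" -v y2="$4" -v m2="$5" -v d2="$6" '
        # Days since 1970-01-01 for a proleptic Gregorian date.
        function days_from_civil(y, m, d,    era, yoe, doy, doe) {
            y -= (m <= 2)                           # Jan/Feb belong to previous year
            era = int((y >= 0 ? y : y - 399) / 400) # 400-year cycle, floored
            yoe = y - era * 400                     # year of era, 0..399
            doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
            doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
            return era * 146097 + doe - 719468
        }
        BEGIN { print days_from_civil(y2, m2, d2) - days_from_civil(y1, m1, d1) }'
}
```

With this, `days_between 2021 8 1 2021 8 11` prints `10` regardless of the time of day in the original timestamps.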