Filter Log File Entries Based on Date Range

Filter log file entries based on date range

Yes, there are multiple ways to do this; here is how I would go about it. For starters, there is no need to pipe the output of cat: just open the log file with awk.

awk -v Date="$(date -d'now-2 hours' '+[%d/%b/%Y:%H:%M:%S')" '$4 > Date {print Date, $0}' access_log

Assuming your log looks like mine (the format is configurable), the date is stored in field 4 and starts with a bracket. The command above finds everything within the last 2 hours. Note the -d'now-2 hours', which translates literally to "now minus 2 hours" and for me produces something like this: [10/Oct/2011:08:55:23. Keep in mind that awk compares these values as strings, so the test is only chronological while both timestamps fall within the same month.

So I am storing the formatted timestamp from two hours ago and comparing it against field four. The conditional expression should be straightforward. I am then printing Date, followed by the output field separator (OFS, a space in this case), followed by the whole line, $0. You could use your previous expression and just print $1 (the IP addresses):

awk -v Date="$(date -d'now-2 hours' '+[%d/%b/%Y:%H:%M:%S')" '$4 > Date {print $1}' access_log | sort | uniq -c | sort -n | tail

If you wanted to use a range, specify two date variables and construct your expression appropriately.

So if you wanted to find something between 2 and 4 hours ago, your expression might look something like this:

awk -v Date="$(date -d'now-4 hours' '+[%d/%b/%Y:%H:%M:%S')" -v Date2="$(date -d'now-2 hours' '+[%d/%b/%Y:%H:%M:%S')" '$4 > Date && $4 < Date2 {print Date, Date2, $4}' access_log
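Because dd/Mon/yyyy timestamps are compared as plain strings, a window that crosses a month boundary can miss lines. The following is a minimal sketch of one way around that, assuming the timestamp sits in field 4 as above: a hypothetical helper, key(), rewrites each value as yyyymmddHHMMSS before comparing. The sample lines and the from/to values are made up for illustration.

```shell
# Sample access-log lines (made-up data for illustration).
log='1.1.1.1 - - [30/Sep/2011:23:59:59 +0000] "GET /a HTTP/1.1" 200 1
2.2.2.2 - - [01/Oct/2011:00:30:00 +0000] "GET /b HTTP/1.1" 200 1
3.3.3.3 - - [10/Oct/2011:08:55:23 +0000] "GET /c HTTP/1.1" 200 1'

# key() is a hypothetical helper: "[dd/Mon/yyyy:HH:MM:SS" -> "yyyymmddHHMMSS".
result=$(printf '%s\n' "$log" | awk -v from='[30/Sep/2011:23:00:00' -v to='[02/Oct/2011:00:00:00' '
function key(s,    p, m) {
    sub(/^\[/, "", s)            # drop the leading bracket
    split(s, p, "[/:]")          # p[1]=dd p[2]=Mon p[3]=yyyy p[4]=HH p[5]=MM p[6]=SS
    m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2]) + 2) / 3
    return sprintf("%04d%02d%02d%02d%02d%02d", p[3], m, p[1], p[4], p[5], p[6])
}
BEGIN { lo = key(from); hi = key(to) }
{ k = key($4) }
k >= lo && k < hi { print $1 }')
echo "$result"
```

With the raw string comparison, the 01/Oct line would be rejected because "[01" sorts before "[30"; with the converted keys it falls inside the window.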

Here is a question I answered regarding dates in bash you might find helpful.
Print date for the monday of the current week (in bash)

Filter log for last minutes \ hours

In awk (systime() and mktime() are GNU awk extensions):

awk '
BEGIN { now = systime() }        # current time in seconds since the epoch
{
    then = $1 " " $2             # "then" might not be a good var name though :)
    gsub(/[-:]/, " ", then)      # mktime() expects "YYYY MM DD HH MM SS"
    then = mktime(then)          # record time in seconds since the epoch
    if (then < now - 604800)     # 604800 seconds is 7 days
        print                    # output records older than that
}' file

Example output:

2017-03-18 01:27:12 bla bla
2017-03-18 02:14:11 bla bla
2017-03-20 04:37:14 bla bla
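For the "last minutes or hours" case, a possible alternative that avoids mktime() is to compare the timestamp text directly, since zero-padded, year-first timestamps sort correctly as strings. This is only a sketch under that assumption: the sample lines and the fixed cutoff are made up, and the date invocation in the comment assumes GNU date.

```shell
# Made-up sample in the same "YYYY-MM-DD HH:MM:SS" layout as above.
sample='2017-03-18 01:27:12 bla bla
2017-03-18 02:14:11 bla bla
2017-03-20 04:37:14 bla bla'

# Fixed cutoff so the run is reproducible; with GNU date a live value could be
# obtained with: cutoff=$(date -d "2 hours ago" "+%F %T")
cutoff='2017-03-18 02:00:00'

# Keep the lines whose timestamp is at or after the cutoff (string comparison).
recent=$(printf '%s\n' "$sample" | awk -v cutoff="$cutoff" '($1 " " $2) >= cutoff')
echo "$recent"
```

This works in any POSIX awk, at the price of relying on the log using a string-sortable timestamp layout.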

Is awk the fastest way to search for a date/time range in a log file?

The lines are sorted, so you can use the look command, which prints the lines that begin with a given string. It should be much faster than awk or grep because it uses a binary search.
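As a rough illustration, look can pull out every line that starts with a given timestamp prefix. Since it matches prefixes only, a range has to be expressed as a shared prefix (a day, an hour, a minute). The sketch below assumes look is installed and that the file is sorted; the sample data and the temporary file are made up.

```shell
# Made-up, already sorted sample written to a temporary file.
f=$(mktemp)
printf '%s\n' \
    '2017-03-18 01:27:12 bla bla' \
    '2017-03-18 02:14:11 bla bla' \
    '2017-03-20 04:37:14 bla bla' > "$f"

# Print every line that starts with the given date prefix.
# The guard keeps the sketch harmless on systems without look.
if command -v look >/dev/null 2>&1; then
    matches=$(look '2017-03-18' "$f")
    echo "$matches"
fi
rm -f "$f"
```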

How to select date range in awk

The answer is that awk does not have any knowledge of what a date is. Awk knows numbers and strings and can only compare those. So when you want to select dates and times, you have to ensure that the date format you compare is sortable. There are many formats out there:

| type       | example                   | sortable |
|------------+---------------------------+----------|
| ISO-8601   | 2019-11-19T10:05:15       | string   |
| RFC-2822   | Tue, 19 Nov 2019 10:05:15 | not      |
| RFC-3339   | 2019-11-19 10:05:15       | string   |
| Unix epoch | 1574157915                | numeric  |
| AM/PM      | 2019-11-19 10:05:15 am    | not      |
| MM/DD/YYYY | 11/19/2019 10:05:15       | not      |
| DD/MM/YYYY | 19/11/2019 10:05:15       | not      |

So you would have to convert your non-sortable formats into a sortable format, mainly using string manipulations. A template awk program that would achieve what you want is written down here:

# function to convert a string into a sortable format
function convert_date(str) {
return sortable_date
}
# function to extract the date from the record
function extract_date(str) {
return extracted_date
}
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }

Most of the time, this program can be heavily reduced. If the above is stored in extract_date_range.awk, you could run it as:

$ awk -f extract_date_range.awk begin="date-in-known-format" end="date-in-known-format" logfile

Note: the above assumes single-line log entries. With a minor adaptation, you can process multi-line log entries.
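As one concrete instance of the convert_date idea, the MM/DD/YYYY row of the table can be rearranged into a year-first string with a single split. This is a minimal sketch, assuming the date is in field 1 and the time in field 2; the two-argument convert_date here and the sample lines are made up for illustration and are not the template's own function.

```shell
converted=$(printf '%s\n' '11/19/2019 10:05:15 started' '01/02/2020 00:00:01 stopped' | awk '
# Rearrange "MM/DD/YYYY hh:mm:ss" into "YYYY-MM-DD hh:mm:ss" so string order is date order.
function convert_date(d, t,    p) {
    split(d, p, "/")             # p[1]=MM p[2]=DD p[3]=YYYY
    return p[3] "-" p[1] "-" p[2] " " t
}
{ print convert_date($1, $2) }')
echo "$converted"
```

Once every record carries a key in this shape, the selection reduces to two string comparisons.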


In the original problem, the following formats were presented:

EEE MMM dd yy HH:mm  # not sortable
EEE MMM dd HH:mm     # not sortable
yyyy-MM-dd hh:mm     # sortable
dd MMM yyyy HH:mm:ss # not sortable

From the above, all but the second format can be easily converted to a sortable format. The second format lacks the year, so we would have to infer it with an elaborate check based on the day of the week. This is extremely difficult and never 100% bulletproof.

Excluding the second format, we can write the following functions:

BEGIN {
    datefmt1="^[A-Za-z][A-Za-z][A-Za-z] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9] [0-9][0-9] [0-9][0-9]:[0-9][0-9]"
    datefmt3="^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]"
    datefmt4="^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
}
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# skip if date string is empty
(date_string == "") { next }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }

# function to extract the date from the record
# (match() takes the string first and the regular expression second)
function extract_date(str,    date_string) {
    date_string=""
    if (match(str,datefmt1)) { date_string=substr(str,RSTART,RLENGTH) }
    else if (match(str,datefmt3)) { date_string=substr(str,RSTART,RLENGTH) }
    else if (match(str,datefmt4)) { date_string=substr(str,RSTART,RLENGTH) }
    return date_string
}
# function to convert a string into a sortable format
# converts it in the format YYYYMMDDhhmmss
function convert_date(str,    a,YYYY,MM,DD,T,sortable_date) {
    sortable_date=""
    if (match(str,datefmt1)) {
        split(str,a,"[ ]")
        # two-digit year: values below 70 are read as 20xx, the rest as 19xx
        YYYY=(a[4] < 70 ? "20" : "19") a[4]
        MM=get_month(a[2]); DD=a[3]
        T=a[5]; gsub(/[^0-9]/,"",T); T=T "00"    # no seconds in this format
        sortable_date = YYYY MM DD T
    }
    else if (match(str,datefmt3)) {
        sortable_date = substr(str,RSTART,RLENGTH) "00"    # no seconds in this format
        gsub(/[^0-9]/,"",sortable_date)
    }
    else if (match(str,datefmt4)) {
        split(str,a,"[ ]")
        YYYY=a[3]
        MM=get_month(a[2]); DD=a[1]
        T=a[4]; gsub(/[^0-9]/,"",T)              # already includes seconds
        sortable_date = YYYY MM DD T
    }
    return sortable_date
}
# function to convert Jan->01, Feb->02, Mar->03 ... Dec->12
function get_month(str) {
    return sprintf("%02d",(match("JanFebMarAprMayJunJulAugSepOctNovDec",str)+2)/3)
}
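The month lookup in get_month works because every three-letter abbreviation starts at position 1, 4, 7, ... of the concatenated string, so (position + 2) / 3 gives the month number. A small sketch of that arithmetic, using index() instead of match() and a made-up list of three months:

```shell
months=$(awk 'BEGIN {
    names = "JanFebMarAprMayJunJulAugSepOctNovDec"
    n = split("Jan Mar Dec", want, " ")
    for (i = 1; i <= n; i++) {
        pos = index(names, want[i])          # start positions: 1, 7, 34
        printf "%s=%02d\n", want[i], (pos + 2) / 3
    }
}')
echo "$months"
```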

ISO 8601 was published on 06/05/88 and most recently amended on 12/01/04.

Get/extract the data from log file of last 3 minutes?

Try the solution below:

awk \
-v start="$(date +"%F %R" --date=@$(expr `date +%s` - 180))" \
-v end="$(date "+%F %R")" \
'$0 ~ start, $0 ~ end' \
agent.log

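The start variable holds the timestamp from 3 minutes (180 seconds) before the current time.

The end variable holds the current time.

$0 ~ start, $0 ~ end is a range pattern: it selects the lines from the first one that matches start through the next one that matches end. Both timestamps have minute resolution, and if no line contains the start timestamp, nothing is selected.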
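The range pattern only fires when some line actually contains the start timestamp. The sketch below contrasts it with a plain string comparison on the leading "YYYY-MM-DD HH:MM" part of each line, which does not need an exact match. The sample lines and the fixed start and end values are made up; in real use they would come from date as in the command above.

```shell
sample='2019-11-19 10:01:10 a
2019-11-19 10:04:30 b
2019-11-19 10:05:02 c
2019-11-19 10:07:45 d'

start='2019-11-19 10:03'   # no line carries this exact minute
end='2019-11-19 10:06'

# Range pattern: selects nothing here, because no line matches start.
by_range=$(printf '%s\n' "$sample" | awk -v start="$start" -v end="$end" '$0 ~ start, $0 ~ end')

# String comparison on the first 16 characters ("YYYY-MM-DD HH:MM").
by_compare=$(printf '%s\n' "$sample" | awk -v start="$start" -v end="$end" '
{ t = substr($0, 1, 16) }
t >= start && t <= end')

echo "range:   [$by_range]"
echo "compare: [$by_compare]"
```

The comparison variant assumes every line begins with the timestamp and that the format is string-sortable, which holds for the %F %R layout used here.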


