Filter log file entries based on date range
Yes, there are multiple ways to do this. Here is how I would go about it. For starters, there is no need to pipe the output of cat; just open the log file with awk:
awk -v Date="$(date -d 'now-2 hours' '+[%d/%b/%Y:%H:%M:%S')" '$4 > Date {print Date, $0}' access_log
Assuming your log looks like mine (log formats are configurable), the date is stored in field 4 and is bracketed. What I am doing above is finding everything within the last 2 hours. Note the 'now-2 hours' argument to date -d, or, translated literally, now minus 2 hours, which for me looks something like this: [10/Oct/2011:08:55:23
So I am storing the formatted value of two hours ago and comparing it against field 4. The conditional expression should be straightforward. I then print Date, followed by the output field separator (OFS, a space in this case), followed by the whole line, $0. You could use your previous expression and just print $1 (the IP addresses):
awk -v Date="$(date -d 'now-2 hours' '+[%d/%b/%Y:%H:%M:%S')" '$4 > Date {print $1}' access_log | sort | uniq -c | sort -n | tail
If you want a range, specify two date variables and construct your expression accordingly.
So if you wanted to find something between 2 and 4 hours ago, your expression might look something like this:
awk -v Date="$(date -d 'now-4 hours' '+[%d/%b/%Y:%H:%M:%S')" -v Date2="$(date -d 'now-2 hours' '+[%d/%b/%Y:%H:%M:%S')" '$4 > Date && $4 < Date2 {print Date, Date2, $4}' access_log
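One caveat with these commands: they compare the bracketed timestamps as plain strings, and DD/Mon/YYYY does not sort chronologically across month or year boundaries (for example, "[31/Oct/2011:23:59:59" compares greater than "[01/Nov/2011:00:00:00"). A minimal sketch of one workaround, assuming the same Apache-style timestamp in field 4 and ignoring the timezone offset in field 5, is to rebuild each timestamp as YYYYMMDDhhmmss before comparing. The function name apache_range is hypothetical, and both bounds must be passed in that same YYYYMMDDhhmmss form:

```shell
# Hypothetical helper: print Apache-style log lines whose timestamp
# (field 4, e.g. "[10/Oct/2011:08:55:23") satisfies begin <= time < end.
# begin/end are YYYYMMDDhhmmss strings. Each timestamp is rebuilt in that
# order, so the string comparison also works across month/year boundaries.
# The timezone offset in field 5 is ignored. Uses only POSIX awk features.
apache_range() {
    # $1 = begin, $2 = end; remaining args = log files (stdin if none)
    begin=$1; end=$2; shift 2
    awk -v begin="$begin" -v end="$end" '
    {
        ts = $4
        sub(/^\[/, "", ts)              # drop the leading "["
        n = split(ts, p, "[/:]")        # p[1]=DD p[2]=Mon p[3]=YYYY p[4..6]=hh mm ss
        if (n < 6) next                 # field 4 is not a timestamp: skip line
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2]) + 2) / 3
        key = sprintf("%04d%02d%02d%02d%02d%02d", p[3], mon, p[1], p[4], p[5], p[6])
        if (begin <= key && key < end) print
    }' "$@"
}
```

For the 2 to 4 hours window with GNU date, this could be called as apache_range "$(date -d 'now-4 hours' +%Y%m%d%H%M%S)" "$(date -d 'now-2 hours' +%Y%m%d%H%M%S)" access_log. For "everything since", pass a far-future end such as 99991231235959.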
Here is a question I answered about dates in bash that you might find helpful:
Print date for the monday of the current week (in bash)
Filter log for last minutes \ hours
In GNU awk (systime() and mktime() are gawk extensions that not every awk supports):
awk '
BEGIN{ now=systime() }          # now in seconds since the epoch
{
  then=$1 " " $2                # then might not be a good var name though :)
  gsub(/[-:]/," ",then)         # "YYYY-MM-DD HH:MM:SS" -> "YYYY MM DD HH MM SS", the form mktime() expects
  then=mktime(then)             # convert then to seconds
  if(then < now-604800)         # compare, 604800 is 7 days in seconds
    print                       # output lines older than that
}' file
2017-03-18 01:27:12 bla bla
2017-03-18 02:14:11 bla bla
2017-03-20 04:37:14 bla bla
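Because these timestamps are in YYYY-MM-DD HH:MM:SS form, which sorts correctly as a plain string, the mktime() conversion is not strictly necessary. A minimal sketch, assuming every line starts with such a timestamp, uses a hypothetical shell function older_than that receives the cutoff as an argument, which also lets it run in awks without systime():

```shell
# Hypothetical helper: print lines whose leading "YYYY-MM-DD HH:MM:SS"
# timestamp (fields $1 and $2) is earlier than the cutoff. This layout
# sorts correctly as a string, so no mktime() is needed (any POSIX awk).
older_than() {
    # $1 = cutoff as "YYYY-MM-DD HH:MM:SS"; remaining args = files (stdin if none)
    cutoff=$1; shift
    awk -v cutoff="$cutoff" '($1 " " $2) < cutoff' "$@"
}
```

With GNU date, the 7-day version above would be older_than "$(date -d '7 days ago' '+%F %T')" file.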
Is awk the fastest way to search for a date/time range in a log file?
Since the lines are sorted, you can use the look command. It should be much faster than awk or grep because it uses a binary search instead of reading every line.
How to select date range in awk
The answer is that awk does not have any knowledge of what a date is. Awk knows numbers and strings and can only compare those. So when you want to select dates and times, you have to ensure that the date format you compare is sortable, and there are many formats out there:
| type | example | sortable |
|------------+---------------------------+----------|
| ISO-8601 | 2019-11-19T10:05:15 | string |
| RFC-2822 | Tue, 19 Nov 2019 10:05:15 | not |
| RFC-3339 | 2019-11-19 10:05:15 | string |
| Unix epoch | 1574157915 | numeric |
| AM/PM | 2019-11-19 10:05:15 am | not |
| MM/DD/YYYY | 11/19/2019 10:05:15 | not |
| DD/MM/YYYY | 19/11/2019 10:05:15 | not |
So you would have to convert your non-sortable formats into a sortable format, mainly using string manipulation. A template awk program that achieves what you want looks like this:
# function to convert a string into a sortable format
function convert_date(str) {
return sortable_date
}
# function to extract the date from the record
function extract_date(str) {
return extracted_date
}
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }
Most of the time, this program can be heavily reduced. If the above is stored in extract_date_range.awk, you could run it as:
$ awk -f extract_date_range.awk begin="date-in-known-format" end="date-in-known-format" logfile
Note: the above assumes single-line log entries. With a minor adaptation, you can process multi-line log entries.
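One possible adaptation for multi-line entries, sketched under the assumption that every entry starts with a YYYY-MM-DD HH:MM:SS timestamp and continuation lines (for example stack traces) do not: remember the timestamp of the most recent entry header and apply the selection to every line using that remembered value. The function name select_entries is hypothetical:

```shell
# Hypothetical helper: select whole multi-line entries with begin <= t < end.
# Assumes each entry starts with "YYYY-MM-DD HH:MM:SS"; lines that do not
# (continuation lines) inherit the timestamp of the entry they belong to.
select_entries() {
    # $1 = begin, $2 = end (both "YYYY-MM-DD HH:MM:SS"); rest = files (stdin if none)
    begin=$1; end=$2; shift 2
    awk -v begin="$begin" -v end="$end" '
    # a new entry starts here: remember its timestamp
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
        t = substr($0, 1, 19)
    }
    # continuation lines reuse the last t; lines before any entry have t == ""
    t != "" && begin <= t && t < end
    ' "$@"
}
```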
In the original problem, the following formats were presented:
EEE MMM dd yy HH:mm # not sortable
EEE MMM dd HH:mm # not sortable
yyyy-MM-dd hh:mm # sortable
dd MMM yyyy HH:mm:ss # not sortable
From the above, all but the second format can easily be converted to a sortable format. The second format lacks the year, so we would have to infer it with an elaborate check based on the day of the week. This is extremely difficult and never 100% bulletproof.
Excluding the second format, we can write the following functions:
BEGIN {
  # anchored patterns for the three supported formats
  datefmt1="^[A-Za-z][A-Za-z][A-Za-z] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9] [0-9][0-9] [0-9][0-9]:[0-9][0-9]"
  datefmt3="^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]"
  datefmt4="^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
}
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# skip if date string is empty
(date_string == "") { next }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }
# function to extract the date from the record
# (match() takes the string first and the regex second)
function extract_date(str,   date_string) {
  date_string=""
  if (match(str,datefmt1)) { date_string=substr(str,RSTART,RLENGTH) }
  else if (match(str,datefmt3)) { date_string=substr(str,RSTART,RLENGTH) }
  else if (match(str,datefmt4)) { date_string=substr(str,RSTART,RLENGTH) }
  return date_string
}
# function to convert a string into a sortable format
# converts it in the format YYYYMMDDhhmmss
function convert_date(str,   a, YYYY,MM,DD,T, sortable_date) {
  sortable_date=""
  if (match(str,datefmt1)) {               # EEE MMM dd yy HH:mm
    split(str,a,"[ ]")
    YYYY=(a[4] < 70 ? "20" : "19") a[4]    # two-digit year: 00-69 -> 20yy, 70-99 -> 19yy
    MM=get_month(a[2]); DD=a[3]
    T=a[5]; gsub(/[^0-9]/,"",T); T=T "00"  # HH:mm -> HHmm00
    sortable_date = YYYY MM DD T
  }
  else if (match(str,datefmt3)) {          # yyyy-MM-dd hh:mm
    sortable_date = str "00"
    gsub(/[^0-9]/,"",sortable_date)        # keep digits only: yyyyMMddhhmm00
  }
  else if (match(str,datefmt4)) {          # dd MMM yyyy HH:mm:ss
    split(str,a,"[ ]")
    YYYY=a[3]
    MM=get_month(a[2]); DD=a[1]
    T=a[4]; gsub(/[^0-9]/,"",T)            # HH:mm:ss -> HHmmss (seconds already present)
    sortable_date = YYYY MM DD T
  }
  return sortable_date
}
# function to convert Jan->01, Feb->02, Mar->03 ... Dec->12
function get_month(str) {
  return sprintf("%02d",(index("JanFebMarAprMayJunJulAugSepOctNovDec",str)+2)/3)
}
Get/extract the data from log file of last 3 minutes?
Try the solution below:
awk \
  -v start="$(date +"%F %R" --date=@$(( $(date +%s) - 180 )))" \
  -v end="$(date "+%F %R")" \
  '$0 ~ start, $0 ~ end' \
  agent.log
The start variable holds the timestamp 3 minutes (180 seconds) before the current time, and end holds the current time. The range pattern $0 ~ start, $0 ~ end selects the lines from the first one matching start through the next one matching end.
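Note that the range pattern only starts printing once some line actually contains the exact start minute, so if nothing was logged in that minute, nothing is printed; it also stops at the first line of the current minute, skipping later lines from that minute. Since %F %R (YYYY-MM-DD HH:MM) sorts correctly as a string, comparing instead of matching avoids both problems. A minimal sketch, assuming each line begins with a timestamp in that layout; the function name between is hypothetical:

```shell
# Hypothetical helper: print lines whose leading "YYYY-MM-DD HH:MM"
# (first 16 characters, same layout as date "+%F %R") lies in [start, end].
# Unlike "$0 ~ start, $0 ~ end", no line has to exist in exactly the
# start or end minute.
between() {
    # $1 = start, $2 = end (both "YYYY-MM-DD HH:MM"); rest = files (stdin if none)
    start=$1; end=$2; shift 2
    awk -v start="$start" -v end="$end" '
        { ts = substr($0, 1, 16) }     # leading timestamp of this line
        ts >= start && ts <= end       # string comparison; inclusive bounds
    ' "$@"
}
```

With GNU date, the 3-minute case could be written as between "$(date -d '3 minutes ago' '+%F %R')" "$(date '+%F %R')" agent.log.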