Count occurrences of a string in a log file in the last 5 minutes in bash - Linux

I have a log file containing entries like this:
[Oct 13 09:28:15] WARNING.... Today is good day...
[Oct 13 09:28:15] Info... Tomorrow will be...
[Oct 13 09:28:15] WARNING.... Yesterday was...
I need a shell command to count occurrences of a certain string in the last 5 minutes.
I have tried this:
$(awk -v d1="$(date --date="-5 min" "+%b %_d %H:%M:%S")" -v d2="$(date "+%b %_d %H:%M:%S")" '$0 > d1 && $0 < d2 || $0 ~ d2' "$1" |
grep -ci "$2")
and calling the script like this: sh ${script} /var/log/message "day", but it does not work.

Your immediate problem is that you are comparing dates in random string format. To Awk (and your computer generally) a string which starts with "Dec" is "less than" a string which starts with "Oct" (this is what date +%b produces). Generally, you would want both your log files and your programs to use dates in some standard computer-readable format, usually ISO 8601.
Unfortunately, though, sometimes you can't control that, and need to adapt your code accordingly. The solution then is to normalize the dates before comparing them.
awk -v d1="$(date -d '-5 min' +'%F-%T')" -v d2="$(date +'%F-%T')" '
BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":")
  # the log timestamp starts with "[", so key the month table on "[Mon";
  # zero-pad the month (and, below, the day) so the string comparison
  # against the %F-%T format works
  for (i=1; i<=12; ++i) mon["[" m[i]] = sprintf("%02d", i) }
{ sub(/\]$/, "", $3)   # strip the "]" glued to the seconds
  # borrow the year from d1 and normalize to YYYY-MM-DD-HH:MM:SS
  timestamp = substr(d1, 1, 5) mon[$1] "-" sprintf("%02d", $2) "-" $3 }
timestamp > d1 && timestamp <= d2' "$1" | grep -ci "$2"
This will not work across New Year boundaries, but should hopefully at least help get you started in the right direction. (I suppose you could check if the year in d2 is different, and then check if the month in $1 is January, and then add 1 to the year from d1 in timestamp; but I leave this as an exercise for the desperate. This still won't work across longer periods of time, but the OP calls for a maximum period of 5 minutes, so the log can't straddle multiple years. Or if it does, you have a more fundamental problem.)
Perhaps note as well that date -d is a GNU extension which is not portable to POSIX (so this will not work e.g. on macOS without modifications).
(Also, for production use, I would refactor the grep -ci into the Awk script; see also useless use of grep.)
Finally, the command substitution $(...) around your entire command line is wrong; this would instruct your shell to use the output from Awk and run it as a command.
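Here is a minimal illustration of the difference (the commands are illustrative, not from the original script):
greeting=$(echo hello)   # command substitution: greeting now holds "hello"
echo "$greeting"
$(echo hello)            # a bare $(...): the shell tries to run a command named "hello"
So either run the awk pipeline directly, or assign its result to a variable; don't wrap the whole command line in $(...).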

Related

How to filter values in a date range when the exact timestamps are not entered in the log

I want to take value counts in a given date range from a log file. My log file looks like this.
values.log
2022-01-01-10:01 AAA-passed
2022-01-01-11:05 AAA-passed
2022-01-01-12:01 AAA-passed
2022-01-01-13:05 AAA-passed
2022-01-02-12:01 AAA-failed
2022-01-03-13:05 AAA-failed
I have tried the following method to take the value counts in the given time range.
t1='2022-01-01-10:01'
t2='2022-01-03-13:05'
pass=$(awk '/^'$t1.*'/,/'$t2.*'/' values.log | grep -w "AAA-passed" | wc -l)
echo $pass
This method works only if the exact timestamps appear in the log file. But if we give a time range whose endpoints are not in the log file, this method does not work and gives no answer.
For example, if we give
t1='2022-01-01-10:00'
t2='2022-01-03-14:00'
it gives no answer, because these exact values for t1 and t2 do not appear in the log file. I tried a lot of other methods as well, but nothing worked for me. Can someone help me figure this out? Thanks in advance!
Edit -
I found a relevant answer for this:
awk -v 'start=2018-04-12 14:44:00.000' -v end='2018-04-12 14:45:00.000' '
/^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} / {
inrange = $0 >= start && $0 <= end
}
inrange' < your-file
This method works for me, but I don't hard-code the values for t1 and t2:
t1=$(date -d "${dtd} -7 days" +'%Y-%m-%d-%R')
t2=$(date '+%Y-%m-%d-%R')
Result time format - 2022-07-05-12:15
Required time format - 2018-04-12 14:44:00.000
so how can I edit the above expressions to get the date-time in the required format?
@Fravadona answered the question you asked, so you should accept their answer, but this is too long to add as a comment and requires formatting, so here it is. FYI, in addition to your timestamp comparison: you don't need pipes to grep and wc when you're using awk:
t1='2022-01-01-10:01'
t2='2022-01-03-13:05'
pass=$(
awk -v beg="$t1" -v end="$t2" '
(beg <= $1) && ($1 <= end) && /AAA-passed/ { cnt++ }
END { print cnt+0 }
' values.log
)
echo "$pass"
Your timestamp format is YYYY-mm-dd-HH:MM so you can directly use string comparisons:
t1=2022-01-01-10:01
t2=2022-01-03-13:05
awk -v start="$t1" -v end="$t2" 'start <= $1 && $1 <= end' values.log
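As for the required format from the edit: GNU date can produce that layout directly. A sketch (GNU date assumed; %3N prints milliseconds and is a GNU extension; dtd is the base-date variable from the question):
t1=$(date -d "${dtd} -7 days" +'%F %T.%3N')
t2=$(date +'%F %T.%3N')
# e.g. 2022-07-05 12:15:30.123, matching the 2018-04-12 14:44:00.000 layout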

Rounding of the millisecond part in Linux datetime

So I have a date like this: 2019-10-19 23:55:42.797 and I want the millisecond part to be rounded off into the seconds, so the output should look something like this: 2019-10-19 23:55:43
I have tried
date -d "2019-10-19 23:55:42.797" "+%Y-%m-%d %H:%M:%S"
but it's giving me output like 2019-10-19 23:55:42
How should I do this in Linux bash shell?
This can be done in a single awk (GNU awk, for mktime() and strftime()) like this:
s='2020-12-31 23:59:59.501'
awk -F. 'gsub(/[-:]/, " ", $1) {   # -F. splits off the milliseconds into $2
dt = mktime($1)                    # $1 is now "YYYY MM DD HH MM SS"; convert to epoch seconds
if ($2 >= 500) dt++                # round half up
print strftime("%F %X", dt)
}' <<< "$s"
2021-01-01 00:00:00
The behaviour you observe is as expected: the format specifiers represent the actual quantity, without rounding. Imagine rounding were included and you had the time "2019-10-19 23:55:42.797" but were not interested in seconds, setting the format to "%F %H:%M". Do you want to see "2019-10-19 23:55" or "2019-10-19 23:56"? Taking it even further, imagine you have the time "2020-12-31 23:59:59.501" with the format "%F %T". Do you want it to show "2021-01-01 00:00:00" or "2020-12-31 23:59:59"? While we all want 2020 to finish as soon as possible, the latter still remains the correct time representation.
Rounding times is only relevant when you look at time differences, not at absolute times. Hence, I strongly recommend not implementing any rounding and just using the output that date provides.
However, if, for whatever reason, you actually need to round the time to the nearest second, then you can do this:
epoch_ms=$(date -d "2019-10-19 23:55:42.797" "+%s%3N")   # milliseconds since the epoch
epoch=$(( (epoch_ms + 500)/1000 ))                       # round half up to whole seconds
date -d "#$epoch" "+%F %T"
Or in a single line:
date -d "#$(( ( $(date -d "2019-10-19 23:55:42.797" "+%s%3N") + 500 )/1000 ))" "+%F %T"
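As a quick sanity check (GNU date assumed), the boundary example from the awk answer above should roll over into the new year:
date -d "#$(( ( $(date -d '2020-12-31 23:59:59.501' '+%s%3N') + 500 )/1000 ))" '+%F %T'
2021-01-01 00:00:00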

SED to parse apache logs between timestamps

I am trying to parse a log and get the lines between timestamps. I tried a sed approach like the one below, but am facing issues with the regex.
Log pattern:
IP - - [20/Apr/2018:14:25:37 +0000] "GET / HTTP/1.1" 301 3936 "-" "
IP - - [20/Apr/2018:14:44:08 +0000]
----------------------------------
IP - - [20/Apr/2018:20:43:46 +0000]
I need to get the lines between 14:25 and 20:43 for 20th April, as the log contains other dates as well.
Tried this:
sed -n '/\[14:25/,/\[20:43/p' *-https_access.log.1
but it is not working.
Since you mentioned you want logs for 20th April, I'd suggest something like:
$ sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' *-https_access.log.1
This is much less likely to produce false matches in case "20:43" occurs elsewhere.
sed is not appropriate here because it's hard to compare elements (like day and hour).
With awk (self-commented):
awk -F '[ []' '
{
# separate date and hour, then force a rebuild of the fields
sub(/:/, " ", $5)   # "20/Apr/2018:14:25:37" -> "20/Apr/2018 14:25:37"
$0 = $0 ""          # reassigning $0 re-splits the fields, so $5 = date, $6 = time
}
# print if it is the right day and between the two hours (string comparison works here)
$5 ~ /20.Apr.2018/ && $6 >= "14:25" && $6 < "20:44"
' YourFile
More generally, we can pass the date and hours to awk as variables (not the purpose here); see the sketch below.
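A sketch of that variant (the variable names day, from and to are illustrative, not part of the original answer):
awk -F '[ []' -v day='20/Apr/2018' -v from='14:25' -v to='20:44' '
{
sub(/:/, " ", $5)   # separate date and hour
$0 = $0 ""          # force a re-split of the fields
}
$5 == day && $6 >= from && $6 < to
' YourFile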
To print lines between match1 and match2 with sed or awk you can do:
sed -n '/match1/,/match2/p' inputfile
awk '/match1/,/match2/' inputfile
In your example match1 is 20/Apr/2018:14:25 and match2 is 20/Apr/2018:20:43, so any of these commands should work for you:
sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' inputfile
awk '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/' inputfile
or use | as sed's delimiter to avoid escaping the slashes:
sed -n '\|20/Apr/2018:14:25|,\|20/Apr/2018:20:43|p' inputfile
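If the bounds come from shell variables, the | delimiter keeps the quoting manageable (a sketch; it assumes the variables contain no regex metacharacters besides the slashes):
t1='20/Apr/2018:14:25'
t2='20/Apr/2018:20:43'
sed -n "\|$t1|,\|$t2|p" inputfile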
The best solution is to use awk for this. What you need to do is convert your timestamps to Unix time and then do the comparisons. In awk you can do this using mktime():
mktime(datespec [, utc-flag ]): Turn datespec into a timestamp in the same form as is returned by systime(). It is similar to the function of the same name in ISO C. The argument, datespec, is a string of the form YYYY MM DD HH MM SS [DST]. The string consists of six or seven numbers representing, respectively, the full year including century, the month from 1 to 12, the day of the month from 1 to 31, the hour of the day from 0 to 23, the minute from 0 to 59, the second from 0 to 60, and an optional daylight-savings flag.
In order to convert your time format of the form 20/Apr/2018:14:25:37 +0000 into the 2018 04 20 14 25 37 form that mktime() expects, you can do:
awk -v tstart="20/Apr/2018:14:25:00" -v tend="20/Apr/2018:20:43:00" \
'function tounix(str) {
    # "20/Apr/2018:14:25:37" -> a[1]=20 a[2]=Apr a[3]=2018 a[4]=14 a[5]=25 a[6]=37
    split(str, a, "/|:| ")
    return mktime(a[3]" "month[a[2]]" "a[1]" "a[4]" "a[5]" "a[6])
}
BEGIN{
    month["Jan"]="01";month["Feb"]="02";month["Mar"]="03"
    month["Apr"]="04";month["May"]="05";month["Jun"]="06"
    month["Jul"]="07";month["Aug"]="08";month["Sep"]="09"
    month["Oct"]="10";month["Nov"]="11";month["Dec"]="12"
    FS="\\[|\\]"
    t1=tounix(tstart)
    t2=tounix(tend)
}
{ t=tounix($2) }
(t1<=t && t<=t2)' <file>
This method is robust as it does true time comparisons, which are independent of leap years, day/month/year cross-overs, and so on. In contrast to the other solutions provided, this method also does not require the dates tstart and tend to actually appear in the file.
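Note that mktime() is a GNU awk extension, not part of POSIX awk. A hypothetical invocation, assuming the script above is saved as range.awk (the file name is illustrative) and plain awk on your system is mawk or BSD awk:
gawk -v tstart='20/Apr/2018:14:25:00' -v tend='20/Apr/2018:20:43:00' -f range.awk access.log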

How to GREP a string between two date ranges? [duplicate]

I am trying to grep all the lines between 2 date ranges, where the dates are formatted like this:
date_time.strftime("%Y%m%d%H%M")
so say between [201211150821 - 201211150824]
I am trying to write a script which involves looking for lines between these dates:
cat <somepattern>*.log | grep [201211150821 - 201211150824]
I am trying to find out if something exists in Unix where I can look for a range of dates.
I can convert the dates in the logs to seconds since the epoch and then use regular grep with [time1 - time2], but that means reading each line, extracting the time value, converting it, etc.
Maybe something simple already exists, so that I can specify date/timestamp ranges the way I can provide a numeric range to grep?
Thanks!
P.S:
Also, I can pass in a pattern like 2012111511(27|28|29|[3-5][0-9]), but that's specific to the ranges I want, it's tedious to try out for different dates each time, and it gets trickier doing it at runtime.
Use awk. Assuming the first token in the line is the timestamp (the log is read from standard input; see the usage example below):
awk '
BEGIN { first = ARGV[1]; last = ARGV[2]; ARGV[1] = ARGV[2] = "" }  # blank the bounds so awk does not open them as input files
$1 > first && $1 < last { print; }
' 201211150821 201211150824
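Usage might look like this, piping the logs in on standard input since the two command-line arguments are consumed as bounds rather than file names (range.awk is a hypothetical file holding the script above):
cat somepattern*.log | awk -f range.awk 201211150821 201211150824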
A Perl solution:
perl -wne 'print if m/(?<!\d)(20\d{10})(?!\d)/
&& $1 >= 201211150821 && $1 <= 201211150824'
(It finds the first twelve-digit integer that starts with 20, and prints the line if that integer is within your range of interest. If it doesn't find any such integer, it skips the line. You can tweak the regex to be more restrictive about valid months and hours and so on.)
You are looking for the somewhat obscure csplit (context split) command:
csplit file '%201211150821%' '/201211150824/'
will split out all the lines between the first and second regexps from file. It is likely to be the fastest and shortest option if your files are sorted on the dates (you said you were grepping logs).
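For example (%...% suppresses everything up to the first match, while /.../ ends the piece just before the second match, so the in-range lines land in xx00):
csplit logfile '%201211150821%' '/201211150824/'
cat xx00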
Bash + coreutils' expr only:
export cmp=201211150823 ; cat file.txt|while read line; do range=$(expr match "$line" '.*\[\(.*\)\].*'); [ "x$range" = "x" ] && continue; start=${range:0:12}; end=${range:15:12}; [ $start -le $cmp -a $end -ge $cmp ] && echo "match: $line"; done
cmp is your comparison value,
I wrote a specific tool for similar searches - http://code.google.com/p/bsearch/
In your example, the usage will be:
$ bsearch -p '$[YYYYMMDDhhmm]' -t 201211150821 -t 201211150824 logfile

grep using variable and regex

I am trying to grep a log file for entries within the last 24 hours. I came up with the following command:
grep "$(date +%F\ '%k')"\|"$(date +%F --date='yesterday')\ [$(date +%k)-23]" /path/to/log/file
I know regular expressions can be used in grep, but I am not very familiar with regex. You see, I am grepping for anything from today, or anything from yesterday from the current hour onward. This isn't working, and I am guessing it's due to the way I am trying to pass a command as a variable in the regex of grep. I also wouldn't be opposed to using awk; with awk I came up with the following, but it is not checking the variables properly:
t=$(date +%F) | y=$(date +%F --date='yesterday') | hr=$(date +%k) | awk '{ if ($1=$t || $1=$y && $2>=$hr) { print $0 }}' /path/to/log/file
I would assume systime could be used with awk rather than setting variables, but I am not familiar with systime at all. Any suggestions with either command would be greatly appreciated! Oh, and here's the log formatting:
2012-12-26 16:33:16 SMTP connection from [127.0.0.1]:46864 (TCP/IP connection count = 1)
2012-12-26 16:33:16 SMTP connection from (localhost) [127.0.0.1]:46864 closed by QUIT
2012-12-26 16:38:19 SMTP connection from [127.0.0.1]:48451 (TCP/IP connection count = 1)
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48451 closed by QUIT
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48860 (TCP/IP connection count = 1)
Here's one way using GNU awk. Run like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
time = systime()
}
{
spec = $1 " " $2
gsub(/[-:]/, " ", spec)
}
time - mktime(spec) < 86400
Alternatively, here's the one-liner:
awk 'BEGIN { t = systime() } { s = $1 " " $2; gsub(/[-:]/, " ", s) } t - mktime(s) < 86400' file
Also, the correct way to pass shell vars to awk is to use the -v flag. I've made a few adjustments to your awk command to show you what I mean, but I recommend against doing this:
awk -v t="$(date +%F)" -v y="$(date +%F --date='yesterday')" -v hr="$(date +%k)" '$1==t || $1==y && $2>=hr' file
Explanation:
So before awk starts processing the file, the BEGIN block is processed first. In this block we create a variable called time / t, and this is set using the systime() function. systime() simply returns the current time as the number of seconds since the system epoch. Then, for every line in your log file, awk will create another variable called spec / s, and this is set to the first and second fields separated by a single space. Additionally, other characters like - and : need to be globally substituted with spaces for the mktime() function to work correctly, and this is done using gsub(). Then it's just a little mathematics to test if the datetime in the log file is within the last 24 hours (or exactly 86400 seconds). If the test is true, the line will be printed. Maybe a little extra reading would help; see Time Functions and String Manipulation Functions. HTH.
