How to count with grep, with conditions - linux

I have a log file with lines like this:
2022-08-13 19:15:17.170 INFO 550034 --- [ scheduling-3] org.hibernate.SQL_SLOW : SlowQuery: 11387 milliseconds. SQL:
I need a grep command to count SlowQuery entries from the last hour with a time of more than 10000 ms.
I've tried
grep "SQL_SLOW" app.log | wc -l
but I can't add the two conditions:
the timestamp (first 19 characters) must be later than the current time minus one hour
the query time must be more than 10000 ms (in the example it is 11387 ms)

grep is the wrong tool for the job; trying to get the date and time conditions to match via RE is just wrong.
How about awk?
echo '2022-08-13 19:15:17.170 INFO 550034 --- [ scheduling-3] org.hibernate.SQL_SLOW : SlowQuery: 11387 milliseconds. SQL:' \
| awk -v now="$(date "+%s")" '
/SlowQuery:/ {
    # convert the leading timestamp to seconds since the epoch via GNU date
    tmp = "\"" $1 " " $2 "\""
    cmd = "date -d " tmp " +%s"
    cmd | getline ts
    close(cmd)
    # extract the duration in milliseconds
    t = gensub(/.*SlowQuery: ([0-9]+) milliseconds.*/, "\\1", "g", $0)
    if (now - ts < 3600 && t > 10000) {
        print $0
    }
}'
Brief explanation: first we capture the current time in seconds since the epoch.
Then we convert the timestamp of each log line containing /SlowQuery:/ to seconds since the epoch (tmp = "\"" $1 " " $2 "\""; cmd = "date -d " tmp " +%s"; cmd | getline ts) and store it in the variable ts; close(cmd) avoids leaking one pipe per line.
We extract the time taken with t = gensub(/.*SlowQuery: ([0-9]+) milliseconds.*/, "\\1", "g", $0) and store it in t.
The last step is to check that less than an hour has passed and that the query was slower than 10000 ms: if (now - ts < 3600 && t > 10000) { print $0 }. If both are true, print the line.
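Since the goal is a count rather than the matching lines themselves, the same script can keep a counter and print it at the end; here is a sketch under the same assumptions (GNU awk for gensub, GNU date for -d):
grep "SQL_SLOW" app.log \
| awk -v now="$(date "+%s")" '
/SlowQuery:/ {
    tmp = "\"" $1 " " $2 "\""
    cmd = "date -d " tmp " +%s"
    cmd | getline ts
    close(cmd)
    t = gensub(/.*SlowQuery: ([0-9]+) milliseconds.*/, "\\1", "g", $0)
    if (now - ts < 3600 && t > 10000) n++   # count instead of printing
}
END { print n + 0 }'   # n+0 prints 0 when nothing matched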

Someone helped me find this solution, but it has some caveats: the logs will not be for exactly the last hour, but for the current incomplete hour and for the previous hour (that is, depending on the current time, it covers from 1 to 2 hours).
grep "SQL_SLOW" "app.log" \
| grep "^$(date -d '1 hour ago' '+%Y-%m-%d %H')" \
| grep -P "SlowQuery: \d{5,} milliseconds" \
| wc -l
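If that coarse hour-prefix matching is acceptable, a variant that explicitly covers both the previous hour and the current incomplete hour might look like this (a sketch; hour_now and hour_prev are names introduced here, and GNU date is assumed):
hour_now=$(date '+%Y-%m-%d %H')
hour_prev=$(date -d '1 hour ago' '+%Y-%m-%d %H')
grep "SQL_SLOW" app.log \
| grep -e "^$hour_now" -e "^$hour_prev" \
| grep -P "SlowQuery: \d{5,} milliseconds" \
| wc -l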
But the awk solution looks better.

Related

Count occurrences of string in logfile in last 5 minutes in bash

I have a log file containing entries like this:
[Oct 13 09:28:15] WARNING.... Today is good day...
[Oct 13 09:28:15] Info... Tommorow will be...
[Oct 13 09:28:15] WARNING.... Yesterday was...
I need a shell command to count occurrences of a certain string in the last 5 minutes.
I have tried this:
$(awk -v d1="$(date --date="-5 min" "+%b %_d %H:%M:%S")" -v d2="$(date "+%b %_d %H:%M:%S")" '$0 > d1 && $0 < d2 || $0 ~ d2' "$1" |
grep -ci "$2")
and calling the script like this: sh ${script} /var/log/message "day", but it does not work.
Your immediate problem is that you are comparing dates in random string format. To Awk (and your computer generally) a string which starts with "Dec" is "less than" a string which starts with "Oct" (this is what date +%b produces). Generally, you would want both your log files and your programs to use dates in some standard computer-readable format, usually ISO 8601.
Unfortunately, though, sometimes you can't control that, and need to adapt your code accordingly. The solution then is to normalize the dates before comparing them.
awk -v d1="$(date -d "-5 min" +"%F-%T")" -v d2="$(date +"%F-%T")" '
BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":")
        # zero-pad months and days so the string comparison also works before October / on single-digit days
        for (i=1; i<=12; ++i) mon["[" m[i]] = sprintf("%02d", i) }
{ timestamp = substr(d1, 1, 5) mon[$1] "-" sprintf("%02d", $2) "-" $3 }
timestamp > d1 && timestamp <= d2' "$1" | grep -ci "$2"
This will not work across New Year boundaries, but should hopefully at least help get you started in the right direction. (I suppose you could check if the year in d2 is different, and then check if the month in $1 is January, and then add 1 to the year from d1 in timestamp; but I leave this as an exercise for the desperate. This still won't work across longer periods of time, but the OP calls for a maximum period of 5 minutes, so the log can't straddle multiple years. Or if it does, you have a more fundamental problem.)
Perhaps note as well that date -d is a GNU extension which is not portable to POSIX (so this will not work e.g. on macOS without modifications).
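On BSD/macOS the relative-date adjustment flag is -v instead, so d1 could presumably be produced like this (an untested sketch):
d1=$(date -v-5M +"%F-%T")   # BSD/macOS counterpart of: date -d "-5 min" +"%F-%T"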
(Also, for production use, I would refactor the grep -ci into the Awk script; see also useless use of grep.)
Finally, the command substitution $(...) around your entire command line is wrong; this would instruct your shell to use the output from Awk and run it as a command.
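As a sketch of that refactoring (folding the grep -ci into Awk), the case-insensitive count can be done with tolower() and index(); pat below is simply the script's second argument, and the zero-padded lookup from above is reused:
awk -v d1="$(date -d "-5 min" +"%F-%T")" -v d2="$(date +"%F-%T")" -v pat="$2" '
BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":")
        for (i=1; i<=12; ++i) mon["[" m[i]] = sprintf("%02d", i) }
{ timestamp = substr(d1, 1, 5) mon[$1] "-" sprintf("%02d", $2) "-" $3 }
timestamp > d1 && timestamp <= d2 && index(tolower($0), tolower(pat)) { n++ }
END { print n + 0 }' "$1"
Note that index() does a fixed-string match, whereas grep -ci matched a regex; for a plain word like "day" the two behave the same.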

Read datetime from a file and add one second

I'm reading a timestamp from a file and I'd like to copy the value to the START_DATE variable, and add 1 second to that value.
export START_DATE=`cat ${WINDOW_END_FILE}`
Timestamp format
2019-04-03-23.59.59
In the end, I'd like the date to be
2019-04-04-00.00.00
Convert the date to epoch time, then add 1:
fmt='%Y-%m-%d-%H.%M.%S'
date -j -f %s $(( $(date -j -f "$fmt" 2019-04-03-23.59.59 +%s) + 1 )) +"$fmt"
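Note that date -j -f is BSD date syntax (as on macOS). With GNU date on Linux, one rough equivalent is to first rewrite the timestamp into a form that -d accepts (a sketch, assuming GNU coreutils):
fmt='%Y-%m-%d-%H.%M.%S'
ts='2019-04-03-23.59.59'
# turn "2019-04-03-23.59.59" into "2019-04-03 23:59:59" for date -d
norm=$(echo "$ts" | sed 's/^\(....-..-..\)-\(..\)\.\(..\)\.\(..\)$/\1 \2:\3:\4/')
date -d "$norm + 1 second" +"$fmt"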
Ok here goes, this isn't pretty, but fun:
If I understood, let's just say START_DATE=2019-04-03-23.59.59
# Pull the date section
cal=$(echo $START_DATE | cut -d - -f 1-3)
# Pull the time section, converting dots to colons
time=$(echo $START_DATE | cut -d - -f 4 | sed 's/\./:/g')
# Use date for the conversion and the "+ 1 second" arithmetic
date -d "$cal $time + 1 second" +"%Y-%m-%d-%H.%M.%S"
Output:
2019-04-04-00.00.00
Using GNU awk:
$ gawk '{
gsub(/[^0-9]/," ") # change the format mktime friendly
print strftime("%F-%H.%M.%S",mktime($0)+1) # to epoch, add one, reformat back
}' file # a timestamp from a file
2019-04-04-00.00.00 # output this time
mktime turns datespec (YYYY MM DD HH MM SS [DST]) into the number of seconds since the system epoch. strftime formats the timestamp according to the given format.
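Tying this back to the question, the result could be captured straight into START_DATE (a sketch, assuming GNU awk is installed as gawk):
export START_DATE=$(gawk '{ gsub(/[^0-9]/," "); print strftime("%F-%H.%M.%S", mktime($0)+1) }' "${WINDOW_END_FILE}")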

SED to parse Apache logs between timestamps

I am trying to parse a log and get the lines between two timestamps. I tried a sed approach like the one below, but I'm facing an issue with the regex.
Log pattern:
IP - - [20/Apr/2018:14:25:37 +0000] "GET / HTTP/1.1" 301 3936 "-" "
IP - - [20/Apr/2018:14:44:08 +0000]
----------------------------------
IP - - [20/Apr/2018:20:43:46 +0000]
I need to get the lines between 14:25 and 20:43 on 20th April, as the log contains other dates as well.
Tried this:
sed -n '/\[14:25/,/\[20:43/p' *-https_access.log.1
but it's not working.
Since you mentioned you want logs for 20th April, I'd suggest something like:
$ sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' *-https_access.log.1
This is much less likely to produce false matches in case "20:43" occurs elsewhere in a line.
sed is not appropriate here because it's hard to compare elements (like day and hour).
with awk (self-commented):
awk -F '[ []' '
{
    # separate the date from the hour, then force awk to re-split the record into fields
    sub(/:/, " ", $5); $0 = $0 ""
}
# print if it is the right day and the time is between the two bounds (string comparison works in this case)
$5 ~ /20.Apr.2018/ && $6 >= "14:25" && $6 < "20:44"
' YourFile
More generally, we could use variables to pass the day and hours as parameters to awk (not the purpose here); see the sketch below.
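For instance (a sketch; day, from and to are parameter names introduced here):
awk -F '[ []' -v day='20.Apr.2018' -v from='14:25' -v to='20:44' '
{
    sub(/:/, " ", $5); $0 = $0 ""   # same re-split trick as above
}
$5 ~ day && $6 >= from && $6 < to
' YourFile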
To print lines between match1 and match2 with sed or awk you can do:
sed -n '/match1/,/match2/p' inputfile
awk '/match1/,/match2/' inputfile
in your example match1 is 20/Apr/2018:14:25 and match2 is 20/Apr/2018:20:43. So any of these commands should work for you:
sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' inputfile
awk '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/' inputfile
or use | as the sed delimiter to avoid escaping the slashes:
sed -n '\|20/Apr/2018:14:25|,\|20/Apr/2018:20:43|p' inputfile
The best solution is to use awk for this. What you need to do is convert your timestamps to Unix time and then do the comparisons. In awk you can do this using mktime():
mktime(datespec [, utc-flag ]): Turn datespec into a timestamp in the same form as is returned by systime(). It is similar to the function of the same name in ISO C. The argument, datespec, is a string of the form YYYY MM DD HH MM SS [DST]. The string consists of six or seven numbers representing, respectively, the full year including century, the month from 1 to 12, the day of the month from 1 to 31, the hour of the day from 0 to 23, the minute from 0 to 59, the second from 0 to 60, and an optional daylight-savings flag.
In order to convert your time format of the form 20/Apr/2018:14:25:37 +0000 into 2018 04 20 14 25 37, you can do the following:
awk -v tstart="20/Apr/2018:14:25:00" -v tend="20/Apr/2018:20:43:00" \
'function tounix(str) {
    split(str,a,"/|:| ")
    return mktime(a[3]" "month[a[2]]" "a[1]" "a[4]" "a[5]" "a[6])
}
BEGIN{
    month["Jan"]="01";month["Feb"]="02";month["Mar"]="03"
    month["Apr"]="04";month["May"]="05";month["Jun"]="06"
    month["Jul"]="07";month["Aug"]="08";month["Sep"]="09"
    month["Oct"]="10";month["Nov"]="11";month["Dec"]="12"
    FS="\\[|\\]"
    t1=tounix(tstart)
    t2=tounix(tend)
}
{ t=tounix($2) }
(t1<=t && t<=t2)' <file>
This method is robust as it does true time comparisons, independent of leap years and day/month/year cross-overs. In contrast to the other solutions provided, it also does not require the timestamps tstart and tend to actually occur in the file.

How can I check the last 5 min overall cpu usage using SAR

I know this sar example: sar -u 1 3, which gives statistics for the next 3 seconds at 1-second intervals.
However, sar also keeps collecting information in the background (my cron is set to collect stats every minute). Is there any way I can simply query sar for the last 5 minutes' statistics and their average?
Right now I am using the command below
interval=5; sar -f /var/log/sysstat/sa22 | tail -n $interval | head -n -1 | awk '{print $4+$6}'| awk '{s+=$1} END {print s/$interval}'
to check the overall CPU usage over the last 5 minutes.
Is there a better way?
Unfortunately, when using the -f option of sar together with interval and count, it doesn't return the average value for the given interval (as you would expect). Instead it always returns the first recorded value in the sar file.
The only way to work around that is to use the -s option which allows you to specify a time at which to start your sampling period. I've provided a perl script below that finishes with a call to sar that is constructed in a way that will return what you're looking for.
Hope this helps.
Peter Rhodes.
#!/usr/bin/perl
$interval = 300;   # seconds
$epoch = `date +%s`;
$epoch -= $interval;
$time = `date -d \@$epoch +%H:%M:00`;   # GNU date: @N means N seconds since the epoch
$dom = `date +%d`;
chomp($time,$dom);
system("sar -f /var/log/sysstat/sa$dom -B -s $time 300 1");

Process large amount of data using bash

I've got to process a large number of txt files in a folder using bash scripting.
Each file contains millions of rows, formatted like this:
File #1:
en ample_1 200
it example_3 24
ar example_5 500
fr.b example_4 570
fr.c example_2 39
en.n bample_6 10
File #2:
de example_3 4
uk.n example_5 50
de.n example_4 70
uk example_2 9
en ample_1 79
en.n bample_6 1
...
I've got to filter by "en" or "en.n", find duplicate occurrences in the second column, sum the third column, and get a sorted file like this:
en ample_1 279
en.n bample_6 11
Here is my script:
#! /bin/bash
clear
BASEPATH=<base_path>
FILES=<folder_with_files>
TEMP_UNZIPPED="tmp"
FINAL_RES="pg-1"
#iterate each file in folder and apply grep
INDEX=0
DATE=$(date "+DATE: %d/%m/%y - TIME: %H:%M:%S")
echo "$DATE" > log
for i in ${BASEPATH}${FILES}
do
FILENAME="${i%.*}"
if [ $INDEX = 0 ]; then
VAR=$(gunzip $i)
#-e -> multiple condition; -w exact word; -r grep recursively; -h remove file path
FILTER_EN=$(grep -e '^en.n\|^en ' $FILENAME > $FINAL_RES)
INDEX=1
#remove file to free space
rm $FILENAME
else
VAR=$(gunzip $i)
FILTER_EN=$(grep -e '^en.n\|^en ' $FILENAME > $TEMP_UNZIPPED)
cat $TEMP_UNZIPPED >> $FINAL_RES
#AWK BLOCK
#create array a indexed with page title and adding frequency parameter as value.
#eg. a['ciao']=2 -> the second time I find "ciao", I sum previous value 2 with the new. This is why i use "+=" operator
#for each element in array I print i=page_title and array content such as frequency
PARSING=$(awk '{ page_title=$1" "$2;
frequency=$3;
array[page_title]+=frequency
}END{
for (i in array){
print i,array[i] | "sort -k2,2"
}
}' $FINAL_RES)
echo "$PARSING" > $FINAL_RES
#END AWK BLOCK
rm $FILENAME
rm $TEMP_UNZIPPED
fi
done
mv $FINAL_RES $BASEPATH/06/01/
DATE=$(date "+DATE: %d/%m/%y - TIME: %H:%M:%S")
echo "$DATE" >> log
Everything works, but it takes a very long time to execute. Does anyone know how to get the same result in less time and with fewer lines of code?
The UNIX shell is an environment from which to manipulate files and processes and to sequence calls to tools. The UNIX tool the shell calls to manipulate text is awk, so just use it:
$ awk '$1~/^en(\.n)?$/{tot[$1" "$2]+=$3} END{for (key in tot) print key, tot[key]}' file | sort
en ample_1 279
en.n bample_6 11
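Since the files in the question's script are gzipped, they could be fed in without the intermediate gunzip/rm dance, for example (a sketch; the *.gz glob stands in for whatever ${BASEPATH}${FILES} expands to):
zcat *.gz \
| awk '$1~/^en(\.n)?$/{tot[$1" "$2]+=$3} END{for (key in tot) print key, tot[key]}' \
| sort > pg-1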
Your script has too many issues to comment on individually, which indicates you are a beginner at shell programming. Get the books Bash Shell Scripting Recipes by Chris Johnson and Effective Awk Programming, 4th Edition, by Arnold Robbins.
