Can I grep for results compared to a timestamp? - linux

I have a tab-delimited text file with three fields: TIMESTAMP, HOST, and STATUS. I need to find if a host was listed as down less than an hour ago. So far, I have this example:
grep "Down" thetextfile.txt | grep "thehostname"
That gives me a little list of all the times that a host was down in the log. Cool. Now I think I just need to get whether the latest TIMESTAMP is less than an hour ago. I am pretty new to Linux and Bash scripting, but in my other work with actual databases, this would be a relatively simple query.
Any ideas? Or is there a much better approach?
Here's an example of the log file:
TIMESTAMP HOST STATUS
Wed Oct 8 12:16:23 EDT 2014 aserver Alive
Wed Oct 8 12:16:23 EDT 2014 anotherserver Down
Thanks!

You can use this BASH script:
#!/bin/bash
# current date-time in seconds (epoch) value
now=$(date '+%s')
while read -r p; do
# ignore 1st row with headers
[[ "$p" == *TIMESTAMP* ]] && continue
# read 3 values in 3 variables t h s
IFS=$'\t' read -r t h s <<< "$p"
# convert date string to epoch value
ts=$(date -d "$t" '+%s')
# if date from file is less than 1 hour ago and status is Down then print host name
[[ "$s" == "Down" ]] && (( (now-ts) < 3600 )) && echo "$h"
done < file
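For the yes/no check in the original question, the loop above can be wrapped in a function that stops at the first match. This is only a sketch: the name host_down_within and its optional window and "now" arguments are made up for illustration, and it assumes GNU date (for -d) and a tab-delimited log as described in the question.

```shell
#!/bin/bash
# Sketch only: host_down_within is a hypothetical helper, not part of any tool.
# Usage: host_down_within LOGFILE HOST [WINDOW_SECONDS] [NOW_EPOCH]
# Exit status 0 if HOST has a "Down" entry newer than WINDOW_SECONDS, else 1.
host_down_within() {
    local file=$1 host=$2 window=${3:-3600} now=${4:-$(date +%s)}
    local t h s ts
    # Split each line on tabs into timestamp, host and status.
    while IFS=$'\t' read -r t h s; do
        # Only rows for this host with status Down (this also skips the header).
        [[ $h == "$host" && $s == Down ]] || continue
        # Convert the timestamp to epoch seconds (requires GNU date).
        ts=$(date -d "$t" +%s) || continue
        (( now - ts < window )) && return 0
    done < "$file"
    return 1
}
```

Usage could look like: host_down_within thetextfile.txt thehostname && echo "down within the last hour". The NOW_EPOCH argument only exists so the check can be tried against old logs.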

I'd use GNU awk:
gawk -v status=Down -v host=anotherserver '
BEGIN {
mo["Jan"]=1; mo["May"]=5; mo["Sep"]=9
mo["Feb"]=2; mo["Jun"]=6; mo["Oct"]=10
mo["Mar"]=3; mo["Jul"]=7; mo["Nov"]=11
mo["Apr"]=4; mo["Aug"]=8; mo["Dec"]=12
}
function elapsed(month, day, time, year) {
gsub(/:/, " ", time)
return systime() - mktime(sprintf("%d %02d %02d %s", year, mo[month], day, time));
}
$NF == status && $(NF-1) == host && elapsed($2,$3,$4,$6) < 3600
' <<DATA
TIMESTAMP HOST STATUS
Wed Oct 8 12:16:23 EDT 2014 aserver Alive
Wed Oct 8 12:16:23 EDT 2014 anotherserver Down
Wed Oct 16 10:16:23 EDT 2014 aserver Alive
Wed Oct 16 10:16:23 EDT 2014 anotherserver Down
Wed Oct 16 10:16:23 EDT 2014 aserver Down
Wed Oct 16 10:16:23 EDT 2014 anotherserver Up
DATA
Wed Oct 16 10:16:23 EDT 2014 anotherserver Down
Current date is Thu Oct 16 10:53:45 EDT 2014
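Staying closer to the grep pipeline from the question, the age of the latest "Down" entry can also be computed with grep, tail and cut. A minimal sketch, assuming GNU date, a tab-delimited file, and a log written in chronological order; last_down_age is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: prints the age in seconds of the most recent "Down" entry for HOST.
# Usage: last_down_age LOGFILE HOST [NOW_EPOCH]
last_down_age() {
    local file=$1 host=$2 now=${3:-$(date +%s)} stamp
    # Match "<tab>HOST<tab>Down" literally, keep the last such line,
    # and take its first tab-separated field (the timestamp).
    stamp=$(grep -F -- $'\t'"$host"$'\t'Down "$file" | tail -n 1 | cut -f1)
    [[ -n $stamp ]] || return 1      # the host was never listed as Down
    echo $(( now - $(date -d "$stamp" +%s) ))
}
```

For example: age=$(last_down_age thetextfile.txt thehostname) && (( age < 3600 )) && echo "down less than an hour ago".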

Related

Bash - print all sequential lines from file and ignore non-sequential ones

I need to extract all sequential lines from a text file based on the sequence in the 4th column. This sequence is the current time, and there is only one entry for each second (so only one line). Sometimes in the file the sequence will break, because something has slowed down the script that creates it and it has skipped a second or two. As in the below example:
Thu Jun 8 14:17:31 CEST 2017 sync:1
Thu Jun 8 14:17:32 CEST 2017 sync:1
Thu Jun 8 14:17:33 CEST 2017 sync:1
Thu Jun 8 14:17:37 CEST 2017 sync:1 <--
Thu Jun 8 14:17:38 CEST 2017 sync:1
Thu Jun 8 14:17:39 CEST 2017 sync:1
Thu Jun 8 14:17:40 CEST 2017 sync:1
I need bash to ignore this line and continue without printing it, but still print everything before and after it. How should I go about that?
If you only care about the seconds field, this is enough. Note that it compares the seconds only, so a jump such as 14:17:39 -> 15:22:40 still looks sequential; if your data is sufficiently simple this may be fine:
awk 'NR==1 || $6 == (p + 1)%60 ; {p=$6}' FS='[: ]+' input
To check the hour and minute as well, you could convert the time to seconds since midnight, or add logic to compare the hours and minutes. Something like:
awk '{s=$4 * 3600 + $5 * 60 + $6} NR==1 || s == (p + 1)%86400 ; {p=s}' FS='[: ]+' input
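If you want to stay in bash, as the question asks, the seconds-since-midnight comparison can be sketched as a read loop. print_sequential is a made-up name; the sketch assumes the time is the 4th whitespace-separated field in HH:MM:SS form and, like the awk versions, it always compares against the previous input line, whether that line was printed or not.

```shell
#!/bin/bash
# Sketch only: print lines whose time is exactly one second after the previous line.
print_sequential() {
    local line dow mon day time rest h m s now prev=""
    while IFS= read -r line; do
        [[ -n $line ]] || continue                  # ignore empty lines
        read -r dow mon day time rest <<< "$line"   # 4th field is HH:MM:SS
        IFS=: read -r h m s <<< "$time"
        # 10# forces base 10 so that "08" and "09" are not parsed as octal.
        now=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
        # The first line is always printed; % 86400 handles the wrap at midnight.
        if [[ -z $prev ]] || (( now == (prev + 1) % 86400 )); then
            printf '%s\n' "$line"
        fi
        prev=$now
    done
}
```

It reads standard input, so it would be called as print_sequential < input.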

BASH - conditional sum of columns and rows in csv file

I have a CSV file with some database benchmark results. Here is an example:
Date;dbms;type;description;W;D;S;results;time;id
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;500;365;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;530;255;50
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;700;365;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;530;255;99
I need to process all rows with the same id (the value of the last column) and get this:
Date;dbms;type;description;W;D;S;time;results;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
For now I can only compute the sum of column 8, with this awk command:
awk -F";" '{print;sum+=$8 }END{print "sum " sum}' ./file.CSV
Edit:
I need help modifying the script I am already using. Here is the real input data:
Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;id
Mon Jun 15 14:53:41 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;272270;117508;50
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;280080;110063;50
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;144170;31815;60
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;137570;33910;60
Mon Jun 15 15:24:04 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;226660;97734;70
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;210420;95113;70
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;288360;119328;80
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;270360;124328;80
I need to sum the values in the time, TotalTransactions and NOTransactions columns and then add a column with the value (sum of NOTransactions / sum of time).
I am using this script:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum7[$10]+=$7; sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $7=sum7[i];$8=sum8[i]; $9=sum9[i]; $10=sprintf("%.0f", sum9[i]/sum7[i]); print}}' ./logsFinal.csv
gives me this output:
;;;;;;;;;results/time
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;20;552350;227571;11379
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;20;281740;65725;3286
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;20;437080;192847;9642
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;20;558720;243656;12183
Date;dbms;type;description;W;D;0;0;0;-nan
The values look good (except for the header row), but I need to get these results without the id column (I want to delete it).
So I need the same values, but instead of grouping rows by the id column, rows must be grouped by identical values in the dbms AND W AND D columns.
You can use this awk:
awk 'BEGIN{ FS=OFS=";" }
NR>1 && NF {
s=""
for(i=1; i<=7; i++)
s=s $i OFS;
a[$NF]=s;
sum8[$NF]+=$8
sum9[$NF]+=$9
} END{
for (i in a)
print a[i] sum8[i], sum9[i], (sum9[i]?sum8[i]/sum9[i]:"NaN")
}' file
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
This awk program will print the modified header and modify the output to contain the sums and their division:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}' file
which gives:
Date;dbms;type;description;W;D;S;results;time;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
You don't seem to care for the ID in the result, but if you do, just replace $10= with $11=.
Also, if you need to sum things based on the values of more than one column, you can build a temporary key (a in the example below) from those columns, joined with a separator so that different combinations cannot collide, and use it as the index in the arrays, like this:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {a=$5 FS $6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}' file
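For the edited question (group by dbms, W and D, and overwrite the id column with the ratio), one possible sketch follows. The assumptions here are mine, not the asker's: the function name sum_by_group, keeping the Date of the first row of each group, printing groups in first-seen order, and treating the first non-empty line as the header (the stray ";;;;;;;;;results/time" line in the output above is consistent with the real file starting with an empty line).

```shell
#!/bin/bash
# Sketch only: sum columns 7-9 per (dbms, W, D) group and replace the id
# column with sum(NOTransactions) / sum(time).
sum_by_group() {
    awk 'BEGIN { FS = OFS = ";" }
    !NF { next }                                  # skip empty lines
    !seen_header++ { $10 = "results/time"; print; next }
    {
        key = $2 FS $5 FS $6                      # dbms;W;D
        if (!(key in first)) { order[++n] = key; first[key] = $0 }
        t[key] += $7; tot[key] += $8; no[key] += $9
    }
    END {
        for (i = 1; i <= n; i++) {
            key = order[i]
            $0 = first[key]                       # reuse the first row of the group
            $7 = t[key]; $8 = tot[key]; $9 = no[key]
            $10 = (t[key] ? sprintf("%.0f", no[key] / t[key]) : "NaN")
            print
        }
    }' "$@"
}
```

It would be called as sum_by_group ./logsFinal.csv. Because the ratio overwrites column 10, the id column no longer appears in the output.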

Get the next time occurrence with linux date

The Linux date utility understands a lot of strings, for instance:
$ date -d '8:30'
Fri Jan 2 08:30:00 CET 2015
I'm looking for a way to get the next 8:30, thus:
in case it is Fri Jan 2 before 8:30, the result above should be returned;
otherwise it should print Sat Jan 3 08:30:00 CET 2015.
As one can see next 8:30 doesn't result in the correct answer:
$ date -d 'next 8:30'
date: invalid date ‘next 8:30’
Is there a single expression to calculate this?
Handling it in the shell oneself is of course an option, but it makes things more complicated because of daylight saving time rules etc.
If the clock changes for daylight saving time, next 8:30 should resolve to 8:30 according to the settings of the next day.
Testcase:
Given it is Fri Jan 2 12:01:01 CET 2015, the result should be:
$ date -d 'next 8:30'
Sat Jan 3 08:30:00 CET 2015
$ date -d 'next 15:30'
Fri Jan 2 15:30:00 CET 2015
Just use something like:
if [[ $(date -d '8:30 today' +%s) -lt $(date +%s) ]] ; then
next830="$(date -d '8:30 tomorrow')"
else
next830="$(date -d '8:30 today')"
fi
The %s format string gives you seconds since the epoch so the if statement is basically:
if 8:30-today is before now:
use 8:30-tomorrow
else
use 8:30-today
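The same comparison can be wrapped in a small function so that it works for any time of day. next_occurrence is a hypothetical name; the sketch assumes GNU date and passes any extra arguments (such as an output format) straight through to date.

```shell
#!/bin/bash
# Sketch only: print the next occurrence of a wall-clock time such as 8:30.
# Usage: next_occurrence HH:MM [extra date arguments, e.g. '+%F %T']
next_occurrence() {
    local when=$1 now target
    now=$(date +%s)
    target=$(date -d "$when today" +%s) || return 1
    # If today's occurrence is already in the past (or is right now),
    # let date work out tomorrow's occurrence in local time.
    if (( target <= now )); then
        target=$(date -d "$when tomorrow" +%s) || return 1
    fi
    date -d "@$target" "${@:2}"
}
```

For example, next_occurrence 8:30 prints the date in the default format, and next_occurrence 15:30 '+%F %T' prints it in a custom one.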
I researched and it does not seem to be possible to do so.
What you can probably do is to compare the hour and minute with 830 and print accordingly:
[ $(date '+%H%M') -le 830 ] && date -d '8:30' || date -d '8:30 + 1 day'
In case you want to work with this easily, create a function to do these calculations.
Test
$ [ $(date '+%H%M') -le 830 ] && date -d '8:30' || date -d '8:30 + 1 day'
Sat Jan 3 08:30:00 CET 2015

how to print the latest date for every unique values of 1st column in Linux bash

NY2001 May 11 2014
NY2001 May 9 2014
NY2011 Jun 12 2014
NY2019 Jun 19 2014
NY2019 Jun 21 2014
How can I print the latest date for every unique value of the 1st column in Linux bash?
You can use:
while read p q; do echo "$p "$(date -d "$q" '+%s'); done < file| awk '
!($1 in a) || a[$1]<$2{a[$1]=$2} END{for (i in a) {
printf "%s ", i; system("date \"+%d %b %Y\" -d @" a[i])}}'
NY2019 21 Jun 2014
NY2001 11 May 2014
NY2011 12 Jun 2014
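An alternative sketch that avoids calling date from inside awk: convert each date to ISO form (YYYY-MM-DD), which sorts correctly as plain text, then keep the first line per key after sorting the dates in reverse. latest_per_key is a made-up name, GNU date is assumed, and the output date format differs from the one above.

```shell
#!/bin/bash
# Sketch only: reads "KEY <date>" lines on stdin, prints the newest date per key.
latest_per_key() {
    local key rest
    while read -r key rest; do
        [[ -n $key ]] || continue                        # ignore empty lines
        printf '%s %s\n' "$key" "$(date -d "$rest" +%F)" # e.g. NY2001 2014-05-11
    done |
        sort -k1,1 -k2,2r |   # group by key, newest date first within each key
        awk '!seen[$1]++'     # keep only the first (newest) line per key
}
```

It would be called as latest_per_key < file.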

How to add timestamp while redirecting stdout to file in Bash?

I have a program (server) and I am looking for a way (script) that will redirect (or better duplicate) all its stdout to file and add timestamp for each entry.
I've done some research and the furthest I could get was thanks to How to add timestamp to STDERR redirection. It redirects stdout but the timestamp added is of the time when the script finishes:
#!/bin/bash
./server | ./predate.sh > log.txt
code of predate.sh:
#!/bin/bash
while read line ; do
echo "$(date): ${line}"
done
It seems that the server output is only flushed when the program exits (without redirection it works fine). Also, if I try predate.sh on the example given in the mentioned thread, it works perfectly. I am aware it would be easy to add a timestamp in the main program, but I would rather avoid editing its code.
I recently needed exactly that: receive log messages in a serial console (picocom), print them to a terminal and to a file AND prepend the date.
What I now use looks something like this:
picocom -b 115200 /dev/tty.usbserial-1a122C | awk '{ print strftime("%s: "), $0; fflush(); }' | tee serial.txt
the output of picocom is piped to awk
awk prepends the date (the %s option converts the time to the Number of seconds since 1970-01-01 00:00:00 UTC - or use %c for a human-readable format)
fflush() flushes any buffered output in awk
that is piped to tee which diverts it to a file. (you can find some stuff about tee here)
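If awk's strftime is not available (it is not part of POSIX awk), bash itself can stamp each line. The sketch below assumes bash 4.2 or newer for the printf '%(...)T' format; the function name stamp_lines and the timestamp format are illustrative choices.

```shell
#!/bin/bash
# Sketch only: prefix every line read from stdin with the current time.
stamp_lines() {
    local line
    # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # -1 means "now"; bash formats the time itself, so no date
        # process is started for each line.
        printf '%(%Y-%m-%d %H:%M:%S)T: %s\n' -1 "$line"
    done
}
```

It slots into the same kind of pipeline, for example ./server | stamp_lines | tee log.txt. It stamps lines as they arrive, so it cannot help if the producing program itself holds its output back in a buffer.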
moreutils ts
Absolute date and time is the default:
$ sudo apt-get install moreutils
$ (echo a;sleep 1;echo b;sleep 3;echo c;sleep 2;echo d;sleep 1) | ts | tee myfile
$ cat myfile
Apr 13 03:10:44 a
Apr 13 03:10:45 b
Apr 13 03:10:48 c
Apr 13 03:10:50 d
or counting from program start with ts -s:
$ (echo a; sleep 1; echo b; sleep 3; echo c; sleep 2; echo d; sleep 1) | ts -s
00:00:00 a
00:00:01 b
00:00:04 c
00:00:06 d
or deltas for benchmarking with ts -i:
$ (echo a; sleep 1; echo b; sleep 3; echo c; sleep 2; echo d; sleep 1) | ts -i
00:00:00 a
00:00:01 b
00:00:03 c
00:00:02 d
$ (echo a; sleep 1; echo b; sleep 3; echo c; sleep 2; echo d; sleep 1) | ts -i '%.s'
0.000010 a
0.983308 b
3.001129 c
2.001120 d
See also: How to monitor for how much time each line of stdout was the last output line in Bash for benchmarking?
Tested on Ubuntu 18.04, moreutils 0.60.
For me, your code works perfectly fine.
This is what I tried:
test.sh
#!/bin/bash
while true; do
echo "hello"
done
predate.sh
#!/bin/bash
while read line; do
echo "$(date) : $line"
done
then
./test.sh | ./predate.sh
gives me
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
Tue Jan 14 17:49:47 IST 2014 : hello
This can be redirected to some file using ">" or ">>" for append
Again, using ts from moreutils, you can just use exec at the top of your script.
#!/bin/bash
exec > >(ts>>file.log)
echo hello 1
echo hello 2
sleep 5
echo hello 3
If I understand correctly, your problem is to have stderr output included in your log.txt file, right?
If that's what you want, the solution is:
./server 2>&1 | ./predate.sh > log.txt
