Formatting top's batch mode output for plotting/graphing - linux

I've used top in the following manner to check what percentage of CPU a certain process is taking up:
while true; do
    printf "`date` : " >> /var/log/besclient_resourcemonitor.txt
    top -bn1 | awk '/BESClient/ {print $9}' >> /var/log/besclient_resourcemonitor.txt
    sleep 20
done
This results in the following output, truncated:
Mon Oct 16 13:37:08 CDT 2017 : 0.0
Mon Oct 16 13:37:29 CDT 2017 : 0.0
Mon Oct 16 13:38:10 CDT 2017 : 1.3
Mon Oct 16 13:38:30 CDT 2017 : 0.0
Mon Oct 16 13:38:51 CDT 2017 : 0.0
Mon Oct 16 13:39:11 CDT 2017 : 1.9
I'd like to try to graph this in gnuplot or Excel, but I'm having difficulty massaging the data so that I can place the date/time (columns 1-6) on X and the load (the last column) on Y. I've tried cut, sed and awk, but I must be missing something. Since there isn't a single consistent delimiter, I believe that's what's confusing cut.
How would you go about skinning this cat? (BTW, I like cats, so no cats were harmed in the making of this output.)

Not sure how gnuplot works, but if you want this data as-is (no further processing) and delimiting the columns you mentioned is the only concern, then you can run the log through awk as follows, save the output as CSV, and import it into Excel:
awk 'BEGIN{FS=" : "; OFS=","} {print $1, $2}' /var/log/besclient_resourcemonitor.txt
gives:
Mon Oct 16 13:37:08 CDT 2017,0.0
Mon Oct 16 13:37:29 CDT 2017,0.0
Mon Oct 16 13:38:10 CDT 2017,1.3
Mon Oct 16 13:38:30 CDT 2017,0.0
Mon Oct 16 13:38:51 CDT 2017,0.0
Mon Oct 16 13:39:11 CDT 2017,1.9
FS and OFS are the input field separator and the output field separator, respectively.
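If the target is gnuplot, an ISO-8601 timestamp is easier to parse than the default date format (gnuplot's timefmt has no specifier for weekday names like "Mon"). A minimal sketch, assuming GNU date and the hypothetical file names besclient.csv and besclient.png; note that date -d renders the timestamps in the local timezone:
# convert the existing log to ISO-8601 CSV (assumes GNU date)
while IFS= read -r line; do
    ts=${line% : *}     # everything before the " : " separator
    val=${line##* : }   # everything after it
    printf '%s,%s\n' "$(date -d "$ts" +%Y-%m-%dT%H:%M:%S)" "$val"
done < /var/log/besclient_resourcemonitor.txt > besclient.csv
# plot it with a time-parsed x axis
gnuplot <<'EOF'
set datafile separator ","
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%S"
set format x "%H:%M"
set terminal png size 800,400
set output "besclient.png"
plot "besclient.csv" using 1:2 with lines title "BESClient CPU %"
EOF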

Related

I want to find the difference between 2 numbers stored in a file using a shell script

Below is the content of the file. I want to find the difference between consecutive values of the first field.
0.607401 # Tue Mar 27 04:30:01 IST 2018
0.607401 # Tue Mar 27 04:35:02 IST 2018
0.606325 # Tue Mar 27 04:40:02 IST 2018
0.606223 # Tue Mar 27 04:45:01 IST 2018
0.606167 # Tue Mar 27 04:50:02 IST 2018
0.605716 # Tue Mar 27 04:55:01 IST 2018
0.605716 # Tue Mar 27 05:00:01 IST 2018
0.607064 # Tue Mar 27 05:05:01 IST 2018
Expected output:
0
-0.001076
-0.000102
.019944
..
..
.001348
CODE:
awk '{s=$0;getline;print s-$0;next}' a.txt
However, this does not work as expected.
Could you help me, please?
You can use the following awk code:
$ awk 'NR==1{save=$1;next}NR>1{printf "%.6f\n",($1-save);save=$1}' file
0.000000
-0.001076
-0.000102
-0.000056
-0.000451
0.000000
0.001348
and format the output as you want by modifying the printf.
The way you are currently doing it will skip lines: each getline consumes a record, so your script computes differences of pairs (lines 1-2, 3-4, ...) instead of differences between consecutive lines.
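For comparison, the same consecutive differences can be written without the special first-record case; a minimal equivalent sketch over the same file:
awk 'NR>1 {printf "%.6f\n", $1-prev} {prev=$1}' file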

Bash - print all sequential lines from a file and ignore non-sequential ones

I need to extract all sequential lines from a text file, based on the sequence in the 4th column. This sequence is the current time, and there is only one entry per second (so only one line). Sometimes the sequence in the file breaks, because something has slowed down the script that creates it and it has skipped a second or two, as in the example below:
Thu Jun 8 14:17:31 CEST 2017 sync:1
Thu Jun 8 14:17:32 CEST 2017 sync:1
Thu Jun 8 14:17:33 CEST 2017 sync:1
Thu Jun 8 14:17:37 CEST 2017 sync:1 <--
Thu Jun 8 14:17:38 CEST 2017 sync:1
Thu Jun 8 14:17:39 CEST 2017 sync:1
Thu Jun 8 14:17:40 CEST 2017 sync:1
I need bash to ignore this line and continue without printing it, but still print everything before and after it. How should I go about that?
If you only care about the seconds field (e.g., 14:17:39 -> 15:22:40 is clearly not sequential, but this code will think it is; if your data is sufficiently simple, this may be fine):
awk 'NR==1 || $6 == (p + 1)%60 ; {p=$6}' FS='[: ]+' input
To check the hour and minute as well, you could convert to seconds from midnight, or add logic to compare the hours and minutes. Something like:
awk '{s=$4 * 3600 + $5 * 60 + $6} NR==1 || s == (p + 1)%86400 ; {p=s}' FS='[: ]+' input
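For reference, running the first command on the sample input should print everything except the out-of-sequence 14:17:37 line:
$ awk 'NR==1 || $6 == (p + 1)%60 ; {p=$6}' FS='[: ]+' input
Thu Jun 8 14:17:31 CEST 2017 sync:1
Thu Jun 8 14:17:32 CEST 2017 sync:1
Thu Jun 8 14:17:33 CEST 2017 sync:1
Thu Jun 8 14:17:38 CEST 2017 sync:1
Thu Jun 8 14:17:39 CEST 2017 sync:1
Thu Jun 8 14:17:40 CEST 2017 sync:1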

BASH - conditional sum of columns and rows in csv file

I have a CSV file with some database benchmark results; here is an example:
Date;dbms;type;description;W;D;S;results;time;id
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;500;365;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;530;255;50
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;700;365;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;530;255;99
I need to process all rows with the same id (the value of the last column) and get this:
Date;dbms;type;description;W;D;S;time;results;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
For now I can only do the sum of column 8, with this awk command:
awk -F";" '{print;sum+=$8 }END{print "sum " sum}' ./file.CSV
Edit:
I need help with a modification of the script I am already using. Here is the real input data:
Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;id
Mon Jun 15 14:53:41 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;272270;117508;50
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;280080;110063;50
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;144170;31815;60
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;137570;33910;60
Mon Jun 15 15:24:04 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;226660;97734;70
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;210420;95113;70
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;288360;119328;80
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;270360;124328;80
I need to sum the values in the time, TotalTransactions and NOTransactions columns and then add a column with the value (sum NOTransactions / sum time).
I am using this script:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum7[$10]+=$7; sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $7=sum7[i];$8=sum8[i]; $9=sum9[i]; $10=sprintf("%.0f", sum9[i]/sum7[i]); print}}' ./logsFinal.csv
gives me this output:
;;;;;;;;;results/time
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;20;552350;227571;11379
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;20;281740;65725;3286
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;20;437080;192847;9642
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;20;558720;243656;12183
Date;dbms;type;description;W;D;0;0;0;-nan
The values look good (except the header row), but I need these results without the id column (I want to delete it).
So I need the same values, but instead of identifying the rows to process by the same value in the id column, rows must be grouped by the same values in the dbms AND W AND D columns.
You can use this awk:
awk 'BEGIN{ FS=OFS=";" }
NR>1 && NF {
    s=""
    for(i=1; i<=7; i++)
        s=s $i OFS
    a[$NF]=s
    sum8[$NF]+=$8
    sum9[$NF]+=$9
}
END{
    for (i in a)
        print a[i] sum8[i], sum9[i], (sum9[i]?sum8[i]/sum9[i]:"NaN")
}' file
This gives:
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
This awk program will print the modified header and modify the output to contain the sums and their division:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'
which gives:
Date;dbms;type;description;W;D;S;results;time;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
You don't seem to care for the ID in the result, but if you do, just replace $10= with $11=.
Also, if you need to sum things based on values of more than one column, you can create a temporary variable (a in the example below) which is a concatenation of two columns and use it as an index in the arrays, like this:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {a=$5$6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'
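Since the question asks to group on dbms AND W AND D, the key can combine those three fields; inserting FS between the parts avoids collisions (e.g. W=2,D=11 and W=21,D=1 would otherwise both yield the key 211). A sketch along the same lines, using the first sample's column layout:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {a=$2 FS $5 FS $6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'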

Filter between version names and version numbers

When I run the script kit_version.bash, I get the following output:
# ./kit_version.bash
--- USAW Kits ---
RPM Kits Installed Time
------------------------------------ ---------------------------------
APP-IR-LRPS-1.1.0.0-01 Thu 15 Nov 2012 11:10:20 AM IST
APP-V-LRPS-4.3.7.0-01 Mon 15 Oct 2012 04:27:54 PM IST
batter-ic-4.3.0.0-04 Mon 24 Feb 2014 02:10:21 PM IST
CSHRS-Monitoring-5.0.0.0-03 Mon 24 Feb 2014 03:32:43 PM IST
CS-RH-watchdog-conf-5.0.0.0-03 Mon 24 Feb 2014 03:32:42 PM IST
CSe-OSP-Bin-5.0.0.0-01 Mon 24 Feb 2014 03:28:00 PM IST
sca_core_2.5.7.0-7 Sun 29 Mar 2015 02:36:46 PM IDT
sca_data:80.7.0-7 Sun 29 Mar 2015 02:37:04 PM IDT
.
.
.
How can I filter the output so that the first field contains only the package name and the second field
only the version number, like this:
./kit_version.bash | ......
APP-IR-LRPS 1.1.0.0-01
APP-V-LRPS 4.3.7.0-01
batter-ic 4.3.0.0-04
CSHRS-Monitoring 5.0.0.0-03
CS-RH-watchdog-conf 5.0.0.0-03
CSe-OSP-Bin 5.0.0.0-01
sca_core 2.5.7.0-7
sca_data 80.7.0-7
Remark: the separator between the version name and the version number can be a different character (an underscore and a colon in the last two lines above).
With GNU awk, I can imagine
./kit_version.bash | gawk '{ print gensub(/.([0-9.]+-[0-9.]+)$/, "\t\\1", 1, $1) }'
This will replace the character before a string matching a version number at the end of the first field with a tab and print the result of that substitution. To cut off the first three lines, use
gawk 'NR > 3 { print gensub(/.([0-9.]+-[0-9.]+)$/, "\t\\1", 1, $1) }'
that is, add the NR > 3 condition.
Alternatively with sed:
./kit_version.bash | sed '1d;2d;3d;s/[[:space:]].*//;s/.\([0-9.]\+-[0-9.]\+\)$/\t\1/'
That is:
1d # first three lines: delete
2d
3d
s/[[:space:]].*// # remove everything after the first space,
# i.e., everything except the first field
s/.\([0-9.]\+-[0-9.]\+\)$/\t\1/ # then substitute as before.
This depends on no packages ending with a number while also being delimited from the version number by a period. That is to say,
somepackage2.3.4.5-10
#          ^^^^^^^^^^-- if this is supposed to be the version
will not work properly (it will give somepackag 2.3.4.5-10). It seems unlikely that this format is allowed, though.
./kit_version.bash \
| sed 's/^[[:space:]]*\([^[:space:]]*\).*/\1/;T clean;s/[-._]\([0-9][0-9._-]*\)$/\t\1/;t;:clean;s/.*//'
reformat the line (remove leading space and trailing info)
if no substitution was made, jump to cleaning the line
otherwise reformat to separate the version from the name
This works only with GNU sed, due to the T command; POSIX sed needs t jump, b clean and a :jump label instead, with real newlines separating the commands (see the sketch below).
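A sketch of that POSIX-portable variant (\t is also a GNU extension, so the second substitution below uses a literal tab):
./kit_version.bash \
| sed 's/^[[:space:]]*\([^[:space:]]*\).*/\1/
t jump
b clean
:jump
s/[-._]\([0-9][0-9._-]*\)$/	\1/
t
:clean
s/.*//'
As with the T version, lines without a recognizable version number end up empty rather than deleted; a final /^$/d command would drop them.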

unix awk command is not putting comma for the empty values in a csv file

I have two csv files which look like below:
name,Direction,Date
abc,sent,Jan 21 2014 02:06
xyz,sent,Nov 21 2014 01:09
pqr,sent,Oct 21 2014 03:06
and
name,Direction,Date
abc,received,Jan 22 2014 02:06
xyz,received,Nov 22 2014 02:06
I am combining these two files based on the first column and creating a merged file. Two commands that I am using for the required output are:
awk -F, -v OFS="," 'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"} NR==FNR&&NR>1{a[$1]=$0;next} FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv
join -1 1 -2 1 -t, -a 1 1.csv 2.csv | sed "s/Direction,Date/currentDirection,receivedDate/2"
Both these commands give me the output below:
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06
But I want commas to be placed in the empty (unmatched) places, so the output should look like this:
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06,,
Please notice the commas after the date in the third row. They are needed for my Java application to read the new CSV file.
Could anyone please suggest what I am missing here?
Change the awk one-liner (it looks like my code...):
awk ...... FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:",,")} ......
That is, change "" into ",,".
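For the join variant, the standard -o and -e options produce the same padding for unmatched rows; a sketch (note that join expects its inputs sorted on the join field):
join -t, -a 1 -e '' -o '0,1.2,1.3,2.2,2.3' 1.csv 2.csv |
sed 's/Direction,Date/currentDirection,receivedDate/2'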
