Adding time into a .plot file without adding a new line using awk - linux

I am writing a shell script that runs the commands mpstat and iostat to get CPU and disk usage, extracts the relevant information from their output, and puts it into a .plot file to later graph with bargraph.pl. What I'm having trouble with is using awk to get the time from mpstat like this:
mpstat | awk 'FNR == 4 {print $1;}' >> CPU_usage.plot
It prints a newline at the end of the output. I tried using printf, which works in my other lines of code to get the specific information without adding a newline, but I don't know how to format it here. Is there any way to do this with awk, or any other method I can use to accomplish this? Thanks in advance.
When I run the command mpstat, this is what it returns:
Linux 3.4.0+ (DESKTOP-JM295S0) 04/30/2017 _x86_64_ (4 CPU)
03:56:43 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
03:56:43 PM all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
This is what I'm trying to accomplish: take the time, %usr, %sys, and %idle values and put them into a file called CPU_usage.plot. This is what I wanted in the file:
03:56:43 0.00 0.00 100.00
What I got instead is:
03:56:43
0.00 0.00 100.00
This is my code:
mpstat | awk 'FNR == 4 {print $1;}' >> CPU_usage.plot
mpstat | awk 'FNR == 4 {printf " %f", $4;}' >> CPU_usage.plot
mpstat | awk 'FNR == 4 {printf " %f", $6;}' >> CPU_usage.plot
mpstat | awk 'FNR == 4 {printf " %f\n", $13;}' >> CPU_usage.plot

Use the following awk approach:
mpstat | awk 'NR==4{print $1,$4,$6,$13}' OFS="\t" >> CPU_usage.plot
Now, the CPU_usage.plot file should contain:
03:56:43 0.00 0.00 100.00
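This works because a single print statement with comma-separated fields emits them on one line, joined by OFS, with just one trailing newline (the output record separator); your version ran four separate awk processes, and the first one's print ended its output with a newline. If you prefer printf for explicit formatting, here is an equivalent sketch (the %.2f format is an assumption about how you want the numbers rendered):
mpstat | awk 'NR==4 {printf "%s %.2f %.2f %.2f\n", $1, $4, $6, $13}' >> CPU_usage.plot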

Related

compare 2nd column of two or more files and print union of all files

I have four tab-separated files 1.txt, 2.txt, 3.txt, 4.txt, each with the following format:
89 ABI1 0.19
93 ABL1 0.15
94 ABL2 0.07
170 ACSL3 0.21
I want to compare the 2nd column of all files and print the union (based on the 2nd column) into a new file, like the following:
1.txt 2.txt 3.txt 4.txt
ABL2 0.07 0.01 0.11 0.009
AKT1 0.31 0.05 0.05 0.017
AKT2 0.33 0.05 0.01 0.004
How is it possible in awk?
I tried the following, but it only compares the first columns:
awk 'NR==FNR {h[$1] = $0; next} {print $1,h[$1]}' OFS="\t" 2.txt 1.txt
but when I change it to compare the 2nd column, it doesn't work:
awk 'NR==FNR {h[$2] = $0; next} {print $1,h[$2]}' OFS="\t" 2.txt 1.txt
Also, this only works on two files at a time.
Is there any way to do it on four files, comparing the 2nd column, in awk?
Using join on sorted input files, and assuming a shell that understands process substitution with <(...) (I've used a copy of the data you provided for every input file, just adding a line at the top for identification; this is the AAA line):
$ join <( join -1 2 -2 2 -o 0,1.3,2.3 1.txt 2.txt ) \
<( join -1 2 -2 2 -o 0,1.3,2.3 3.txt 4.txt )
AAA 1 2 3 4
ABI1 0.19 0.19 0.19 0.19
ABL1 0.15 0.15 0.15 0.15
ABL2 0.07 0.07 0.07 0.07
ACSL3 0.21 0.21 0.21 0.21
There are three joins here. The first two to be performed are the ones in <(...). The first of these joins the first two files, while the second joins the last two files. The result of one of these joins looks like:
AAA 1 2
ABI1 0.19 0.19
ABL1 0.15 0.15
ABL2 0.07 0.07
ACSL3 0.21 0.21
The option -o 0,1.3,2.3 means "output the join field along with field 3 from both files". -1 2 -2 2 means "use field 2 of each file as the join field (rather than field 1)".
The outermost join takes the two results and performs the final join that produces the output.
If the input files are not sorted on the join field:
$ join <( join -1 2 -2 2 -o 0,1.3,2.3 <(sort -k2,2 1.txt) <(sort -k2,2 2.txt) ) \
<( join -1 2 -2 2 -o 0,1.3,2.3 <(sort -k2,2 3.txt) <(sort -k2,2 4.txt) )
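Since you asked for awk: below is a single-awk sketch of the same union. It assumes every input file is non-empty, prints NA where a gene is missing from a file, and pipes through sort because awk's in-array iteration order is undefined:
awk '
    FNR == 1 { nf++ }                       # nf = index of the current input file
    { val[$2, nf] = $3; seen[$2] = 1 }      # per-file value, keyed by gene and file index
    END {
        for (g in seen) {
            line = g
            for (i = 1; i <= nf; i++)
                line = line OFS ((g, i) in val ? val[g, i] : "NA")
            print line
        }
    }' OFS='\t' 1.txt 2.txt 3.txt 4.txt | sort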

Biggest and smallest of all lines

I have output like this:
3.69
0.25
0.80
1.78
3.04
1.99
0.71
0.50
0.94
I want to find the biggest number and the smallest number in the above output.
I need output like:
smallest is 0.25 and biggest is 3.69
Just sort your input first and print the first and last value (sort -n, so the comparison is numeric rather than lexical). One method:
$ sort -n file | awk 'NR==1{min=$1}END{print "Smallest",min,"Biggest",$0}'
Smallest 0.25 Biggest 3.69
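If the input is large, a single awk pass avoids the sort entirely. A minimal sketch:
$ awk 'NR == 1 { min = max = $1 }
       $1 < min { min = $1 }
       $1 > max { max = $1 }
       END { print "smallest is " min " and biggest is " max }' file
smallest is 0.25 and biggest is 3.69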
Hope this helps.
OUTPUT="3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94"
SORTED=`echo $OUTPUT | tr ' ' '\n' | sort -n`
SMALLEST=`echo "$SORTED" | head -n 1`
BIGGEST=`echo "$SORTED" | tail -n 1`
echo "Smallest is $SMALLEST"
echo "Biggest is $BIGGEST"
Added per the OP's request for an awk one-liner.
I'm not good at awk, but this works anyway. :)
echo "3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94" | awk '{
for (i=1; i<=NF; i++) {
if (length(s) == 0) s = $i;
if (length(b) == 0) b = $i;
if ($i < s) s = $i;
if (b < $i) b = $i;
}
print "Smallest is", s;
print "Biggest is", b;
}'
You want an awk solution?
echo "3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94" | \
awk -v RS=' ' '/.+/ { biggest = ((biggest == "") || ($1 > biggest)) ? $1 : biggest
                      smallest = ((smallest == "") || ($1 < smallest)) ? $1 : smallest }
               END { print biggest, smallest }'
This produces the following output:
3.69 0.25
You can also use this method (the command substitution in echo's arguments inherits the pipe as its stdin, which is how sed still sees the sorted data):
sort -n file | echo -e `sed -nr '1{s/(.*)/smallest is :\1/gp};${s/(.*)/biggest no is :\1/gp}'`
TXR solution:
$ txr -e '(let ((nums [mapcar tofloat (gun (get-line))]))
            (if nums
              (pprinl `smallest is #(find-min nums) and biggest is #(find-max nums)`)
              (pprinl "empty input")))'
0.1
-1.0
3.5
2.4
smallest is -1.0 and biggest is 3.5

Which one is more efficient for float operations, awk or bc?

I am writing a system performance script in bash. I want to compute the CPU usage in terms of percent. I have two implementations, one using awk and another one using bc. I would like to know which of the two versions is more efficient. Is it better to use awk or bc for float computations? Thanks!
Version #1 (Using bc)
CPU=$(mpstat 1 1 | grep "Average" | awk '{print $11}')
CPU=$(echo "scale=2;(100-$CPU)" | bc -l)
echo $CPU
Version #2 (Using awk)
CPU=$(mpstat 1 1 | grep "Average" | awk '{idle = $11} {print 100 - idle}')
echo $CPU
Since the processing time of both is going to be tiny, the version that spawns the fewest processes and subshells is going to be "more efficient".
That's your second example.
But you can make it even simpler by eliminating the grep:
CPU=$(mpstat 1 1 | awk '/Average/{print 100 - $11}')
In version 1, why do you need the 2nd line? Why can't you do it from the 1st line itself? I ask because the 1st version is grep+awk+bc while the 2nd is grep+awk, so the comparison is not valid, I think.
To use only bc, without awk, try this:
CPU=$(mpstat 1 1 | grep Average | { read -a P; echo 100 - ${P[10]}; } | bc )
Thanks all for educating me on awk/bc!
I did the benchmark (in a hopefully more proper way).
tl;dr: awk wins.
Semi-long story: three runs of 1000 iterations each; awk averages 2.081333s on my system, while bc averages 3.460333s.
Full story:
[me#thebox tmp]$ time for i in `seq 1 1000` ; do echo "Average: all 5.05 0.00 6.57 0.51 0.00 0.00 0.00 0.00 87.88" | awk '/Average/ {print 100 - $11}' >/dev/null ; done
real 0m1.922s
user 0m0.320s
sys 0m1.308s
[me#thebox tmp]$ time for i in `seq 1 1000` ; do echo "Average: all 5.05 0.00 6.57 0.51 0.00 0.00 0.00 0.00 87.88" | awk '/Average/{print 100 - $11}' >/dev/null ; done
real 0m2.124s
user 0m0.370s
sys 0m1.368s
[me#thebox tmp]$ time for i in `seq 1 1000` ; do echo "Average: all 5.05 0.00 6.57 0.51 0.00 0.00 0.00 0.00 87.88" | awk '/Average/{print 100 - $11}' >/dev/null ; done
real 0m2.198s
user 0m0.412s
sys 0m1.383s
[me#thebox tmp]$ time for i in `seq 1 1000` ; do echo "Average: all 5.05 0.00 6.57 0.51 0.00 0.00 0.00 0.00 87.88" | grep Average | { read -a P; echo 100 - ${P[10]}; } | bc >/dev/null ; done
real 0m3.799s
user 0m0.691s
sys 0m3.059s
[me#thebox tmp]$ time for i in `seq 1 1000` ; do echo "Average: all 5.05 0.00 6.57 0.51 0.00 0.00 0.00 0.00 87.88" | grep Average | { read -a P; echo 100 - ${P[10]}; } | bc >/dev/null ; done
real 0m3.545s
user 0m0.604s
sys 0m2.801s
[me#thebox tmp]$ time for i in `seq 1 1000` ; do echo "Average: all 5.05 0.00 6.57 0.51 0.00 0.00 0.00 0.00 87.88" | grep Average | { read -a P; echo 100 - ${P[10]}; } | bc >/dev/null ; done
real 0m3.037s
user 0m0.602s
sys 0m2.626s
[me#thebox tmp]$
Without further tracing, I believe this is related to the overhead of forking more processes when using bc.
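One way to check the forking hypothesis is to count process-management syscalls for a single iteration of each pipeline. A sketch, assuming a reasonably recent strace that understands the %process syscall class (the -c summaries go to stderr, so they survive the redirect):
line='Average: all 5.05 0.00 6.57 0.51 0.00 0.00 0.00 0.00 87.88'
# fork/clone/execve counts for the awk pipeline
strace -qcf -e trace=%process \
    bash -c "echo '$line' | awk '/Average/{print 100 - \$11}'" >/dev/null
# fork/clone/execve counts for the grep+read+bc pipeline
strace -qcf -e trace=%process \
    bash -c "echo '$line' | grep Average | { read -a P; echo 100 - \${P[10]}; } | bc" >/dev/null
The second command should report noticeably more clone and execve calls.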
I did the following benchmark:
#!/bin/bash
count=0
tic="$(date +%s)"
while [ $count -lt 50 ]
do
    mpstat 1 1 | awk '/Average/{print 100 - $11}'
    count=$(($count+1))
done
toc="$(date +%s)"
sec="$(expr $toc - $tic)"
count=0
tic="$(date +%s)"
while [ $count -lt 50 ]
do
    CPU=$(mpstat 1 1 | grep "Average" | awk '{print $11}')
    echo "scale=2;(100-$CPU)" | bc -l
    count=$(($count+1))
done
toc="$(date +%s)"
sec1="$(expr $toc - $tic)"
echo "Execution Time awk: "$sec
echo "Execution Time bc: "$sec1
Both execution times were the same: 50 seconds. Apparently it makes no difference here, but that is expected: mpstat 1 1 itself sleeps for one second on every iteration, so the run time is dominated by mpstat and the awk/bc difference disappears into it.

Script for monitoring disk i/o rates on Linux

I need a script for monitoring ALL disk I/O rates on Linux using bash, awk, and sed. The problem is that it must return one row per time interval (so this one row should contain tps, kB_read/s, kB_wrtn/s, kB_read, and kB_wrtn, summarized over all disks).
The natural choice here is of course iostat:
iostat -d -k -p $interval $loops
To limit it to all disks I use:
iostat -d -k -p `parted -l | grep Disk | cut -f1 -d: | cut -f2 -d' '`
Now the nice trick to summarize columns:
iostat -d -k -p `parted -l | grep Disk | cut -f1 -d: | cut -f2 -d' '` > /tmp/jPtafDiskIO.txt
echo `date +"%H:%M:%S"`,`awk 'FNR>2' /tmp/jPtafDiskIO.txt | awk 'BEGIN {FS=OFS=" "}NR == 1 { n1 = $2; n2 = $3; n3 = $4; n4 = $5; n5 = $6; next } { n1 += $2; n2 += $3; n3 += $4; n4 += $5; n5 += $6 } END { print n1","n2","n3","n4","n5 }'` >> diskIO.log
I am almost there. However, running this in a loop invokes iostat from scratch each time, so I don't get statistics from interval to interval, only the averages since boot (each invocation gives pretty much the same output).
I know it sounds complicated, but maybe somebody has an idea? Maybe a totally different approach?
Thx.
EDIT:
Sample input (/tmp/jPtafDiskIO.txt):
Linux 2.6.18-194.el5 (hostname) 08/25/2012

Device:  tps    kB_read/s  kB_wrtn/s  kB_read     kB_wrtn
sda      0.00   0.00       0.00       35655       59
sda2     0.00   0.00       0.00       67          272
sda1     0.00   0.00       0.00       521         274
sdb      52.53  0.56       569.40     20894989    21065384388
sdc      1.90   64.64      10.93      2391333384  404432217
sdd      0.00   0.00       0.04       17880       1343028
Output diskIO.log:
16:53:12,54.43,65.2,580.37,2412282496,21471160238
Why not use iotop (http://guichaz.free.fr/iotop/)?
dstat might be what you're looking for. It has a lot of things it can report on, with some common ones displayed by default.
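If you would rather stay with iostat and awk, one fix for the always-averages problem is to let iostat do the interval looping itself: when given interval and count arguments, every report after the first covers only the last interval. Below is a sketch that sums each per-interval report into one row; it assumes GNU awk (for strftime) and the $interval/$loops variables from the question, and it skips the first report because that one is the since-boot average:
iostat -d -k "$interval" "$loops" | awk -v OFS=',' '
    /^Device/ { indev = 1; t = r = w = rk = wk = 0; next }  # a new report block starts
    indev && NF == 0 {                                      # a blank line ends the block
        indev = 0
        if (++blk > 1)                                      # block 1 is the since-boot report
            print strftime("%H:%M:%S"), t, r, w, rk, wk
        next
    }
    indev { t += $2; r += $3; w += $4; rk += $5; wk += $6 }
    END { if (indev && blk >= 1) print strftime("%H:%M:%S"), t, r, w, rk, wk }
' >> diskIO.log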

Adding increasing numbers in front of a set of results

I have a file and want to add numbers in front of each line. Below is an example.
I have a file with the following:
0.152
0.153
0.158
0.156
0.157
and I want to put increasing numbers in front, separated by a space, like this:
1 0.152
2 0.153
3 0.158
4 0.156
5 0.157
ascendingnumber*space*numberinfile
I would be very grateful if anyone can help. I have a large amount of data, so it would take ages to add the numbers in manually. It's Linux stuff.
Many thanks,
A struggling student :)!
awk '{ print NR " " $1 }' file
Use awk:
awk '{print NR " " $0}' input.txt > output.txt
Several ways:
awk '{print NR,$0}' file
cat -n file
nl file
sed '=' file
ruby -ne 'print "#{$.} #{$_}"' file
Of course, just bash:
c=1; while read -r line; do echo $((c++)) $line; done < file
If your system has the nl command:
$ cat numbers.txt
0.152
0.153
0.158
0.156
0.157
$ nl -w 1 -s ' ' numbers.txt
1 0.152
2 0.153
3 0.158
4 0.156
5 0.157
The -w 1 flag specifies the column width of the ascending number. The -s ' ' flag tells nl to use one space to separate the numbers.
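If you want the numbers right-aligned in a fixed-width column instead, awk's printf gives you explicit width control. A small sketch (%3d right-aligns within three columns):
$ awk '{ printf "%3d %s\n", NR, $0 }' numbers.txt
  1 0.152
  2 0.153
  3 0.158
  4 0.156
  5 0.157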
