Biggest and smallest of all lines - Linux

I have output like this:
3.69
0.25
0.80
1.78
3.04
1.99
0.71
0.50
0.94
I want to find the biggest number and the smallest number in the above output.
I need output like:
smallest is 0.25 and biggest is 3.69

Just sort your input numerically and print the first and last values. One method:
$ sort -n file | awk 'NR==1{min=$1}END{print "Smallest",min,"Biggest",$0}'
Smallest 0.25 Biggest 3.69
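For completeness, the same result can be had in a single pass without sorting; a minimal sketch:
awk 'NR==1{min=max=$1}
     {if ($1 < min) min = $1; if ($1 > max) max = $1}
     END{print "smallest is", min, "and biggest is", max}' file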

Hope this helps.
OUTPUT="3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94"
SORTED=$(echo "$OUTPUT" | tr ' ' '\n' | sort -n)
SMALLEST=$(echo "$SORTED" | head -n 1)
BIGGEST=$(echo "$SORTED" | tail -n 1)
echo "Smallest is $SMALLEST"
echo "Biggest is $BIGGEST"
Added the awk one-liner the OP requested.
I'm not good at awk, but this works anyway. :)
echo "3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94" | awk '{
for (i=1; i<=NF; i++) {
if (length(s) == 0) s = $i;
if (length(b) == 0) b = $i;
if ($i < s) s = $i;
if (b < $i) b = $i;
}
print "Smallest is", s;
print "Biggest is", b;
}'

You want an awk solution?
echo "3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94" | \
awk -v RS=' ' '/.+/ { biggest = ((biggest == "") || ($1 > biggest)) ? $1 : biggest;
smallest = ((smallest == "") || ($1 < smallest)) ? $1:smallest}
END { print biggest, smallest}'
This produces the following output:
3.69 0.25

You can also use this method:
sort -n file | echo -e `sed -nr '1{s/(.*)/smallest is :\1/gp};${s/(.*)/biggest no is :\1/gp}'`
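A somewhat simpler variant of the same idea, assuming GNU sed:
sort -n file | sed -n '1s/^/smallest is /p; $s/^/biggest is /p'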

TXR solution:
$ txr -e '(let ((nums [mapcar tofloat (gun (get-line))]))
            (if nums
              (pprinl `smallest is #(find-min nums) and biggest is #(find-max nums)`)
              (pprinl "empty input")))'
0.1
-1.0
3.5
2.4
smallest is -1.0 and biggest is 3.5

Related

Convert column to matrix format using awk

I have a gridded data file in column format as:
ifile.txt
x y value
20.5 20.5 -4.1
21.5 20.5 -6.2
22.5 20.5 0.0
20.5 21.5 1.2
21.5 21.5 4.3
22.5 21.5 6.0
20.5 22.5 7.0
21.5 22.5 10.4
22.5 22.5 16.7
I would like to convert it to matrix format as:
ofile.txt
20.5 21.5 22.5
20.5 -4.1 1.2 7.0
21.5 -6.2 4.3 10.4
22.5 0.0 6.0 16.7
Here the top row (20.5 21.5 22.5) holds the y values, the side column holds the x values, and the inner cells hold the corresponding grid values.
I found a similar question here, Convert a 3 column file to matrix format, but the script is not working in my case. The script is:
awk '{ h[$1,$2] = h[$2,$1] = $3 }
END {
    for(i=1; i<=$1; i++) {
        for(j=1; j<=$2; j++)
            printf h[i,j] OFS
        printf "\n"
    }
}' ifile
The following awk script handles:
a matrix of any size
row and column indices that need not be related to one another; it keeps track of them separately
missing entries: if a certain row/column index does not appear, the value defaults to zero
This is done in the following way:
awk '
BEGIN{ PROCINFO["sorted_in"] = "#ind_num_asc" }
(NR==1){ next }
{ row[$1]=1; col[$2]=1; val[$1" "$2]=$3 }
END {
    printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
    for (i in row) {
        printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
    }
}' <file>
How does it work:
PROCINFO["sorted_in"] = "#ind_num_asc" states that all arrays are traversed sorted numerically ascending by index.
(NR==1){next}: skip the first line (the header).
{row[$1]=1;col[$2]=1;val[$1" "$2]=$3}: process the line by storing the row index, the column index, and the accompanying value.
The END block does all the printing.
This outputs:
20.500 21.500 22.500
20.500 -4.100 1.200 7.000
21.500 -6.200 4.300 10.400
22.500 0.000 6.000 16.700
Note: PROCINFO["sorted_in"] is a gawk-specific feature.
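If gawk is not available, the ordered traversal can be reproduced in plain POSIX awk by collecting the indices and sorting them with a small insertion sort; a sketch (assuming numeric indices; missing cells print as 0.000):
awk '
# insertion-sort the indices of arr numerically into ord[1..n]; return n
function sortidx(arr, ord,   i, j, n, t) {
    for (i in arr) ord[++n] = i
    for (i = 2; i <= n; i++) {
        t = ord[i]
        for (j = i - 1; j >= 1 && ord[j] + 0 > t + 0; j--) ord[j+1] = ord[j]
        ord[j+1] = t
    }
    return n
}
(NR==1){ next }                          # skip the header line
{ row[$1]; col[$2]; val[$1" "$2] = $3 }
END {
    nr = sortidx(row, r); nc = sortidx(col, c)
    printf "%8s", ""
    for (j = 1; j <= nc; j++) printf "%8.3f", c[j]
    printf "\n"
    for (i = 1; i <= nr; i++) {
        printf "%8.3f", r[i]
        for (j = 1; j <= nc; j++) printf "%8.3f", val[r[i]" "c[j]]
        printf "\n"
    }
}' ifile.txt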
However, if you make a couple of assumptions, you can do it much shorter:
the file contains all possible entries, with no missing values
you do not want the indices of the rows and columns printed out
the entries are sorted in column-major order
Then you can use the following short versions:
sort -g <file> | awk '($1+0!=$1){next}
($1!=o)&&(NR!=1){printf "\n"}
{printf "%8.3f",$3; o=$1 }'
which outputs
-4.100 1.200 7.000
-6.200 4.300 10.400
0.000 6.000 16.700
or, for the transpose:
awk '(NR==1){next}
($2!=o)&&(NR!=2){printf "\n"}
{printf "%8.3f",$3; o=$2 }' <file>
This outputs
-4.100 -6.200 0.000
1.200 4.300 6.000
7.000 10.400 16.700
Adjusted my old GNU awk solution for your current input data:
matrixize.awk script:
#!/bin/awk -f
BEGIN { PROCINFO["sorted_in"]="#ind_num_asc"; OFS="\t" }
NR==1{ next }
{
    b[$1];                                        # accumulate unique indices
    ($1 != $2)? a[$1][$2] = $3 : a[$2][$1] = $3;  # set `diagonal` relation between different indices
}
END {
    h = "";
    for (i in b) {
        h = h OFS i          # form header columns
    }
    print h;                 # print header column values
    for (i in b) {
        row = i;             # index column
        # iterate through the row values (for each intersection point)
        for (j in a[i]) {
            row = row OFS a[i][j]
        }
        print row
    }
}
Usage:
awk -f matrixize.awk yourfile
The output:
20.5 21.5 22.5
20.5 -4.1 1.2 7.0
21.5 -6.2 4.3 10.4
22.5 0.0 6.0 16.7
Perl solution:
#!/usr/bin/perl -an
$h{ $F[0] }{ $F[1] } = $F[2] unless 1 == $.;
END {
    @s = sort { $a <=> $b } keys %h;
    print ' ' x 5;
    printf '%5.1f' x @s, @s;
    print "\n";
    for my $u (@s) {
        print "$u ";
        printf '%5.1f', $h{$u}{$_} for @s;
        print "\n";
    }
}
-n reads the input line by line
-a splits each line on whitespace into the @F array
See sort, print, printf, and keys.
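Usage, assuming the script above is saved as matrix.pl (Perl honors the -an switches on the shebang line when invoked this way):
perl matrix.pl ifile.txt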
awk solution:
sort -n ifile.txt | awk 'BEGIN{header="\t"}NR>1{if((NR-1)%3==1){header=header sprintf("%4.1f\t",$1); matrix=matrix sprintf("%4.1f\t",$1)}matrix= matrix sprintf("%4.1f\t",$3); if((NR-1)%3==0 && NR!=10)matrix=matrix "\n"}END{print header; print matrix}';
20.5 21.5 22.5
20.5 -4.1 1.2 7.0
21.5 -6.2 4.3 10.4
22.5 0.0 6.0 16.7
Explanations:
sort -n ifile.txt sorts the file numerically.
The header variable stores everything needed to build the header line. It is initialized to header="\t" and gets appended to via header=header sprintf("%4.1f\t",$1) on lines satisfying (NR-1)%3==1.
The matrix is built the same way in the matrix variable: matrix=matrix sprintf("%4.1f\t",$1) creates the first column, and matrix=matrix sprintf("%4.1f\t",$3) populates the matrix with the grid values. Finally, if((NR-1)%3==0 && NR!=10) matrix=matrix "\n" adds the end-of-line where needed.
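For readability, here is the same one-liner expanded (same logic; note that the modulus 3 and the NR!=10 guard are hard-coded for this particular 3x3 input):
sort -n ifile.txt | awk '
BEGIN { header = "\t" }
NR > 1 {
    if ((NR-1) % 3 == 1) {                      # first entry of each x-group
        header = header sprintf("%4.1f\t", $1)  # grow the header row
        matrix = matrix sprintf("%4.1f\t", $1)  # start a new matrix row with its index
    }
    matrix = matrix sprintf("%4.1f\t", $3)      # append the grid value
    if ((NR-1) % 3 == 0 && NR != 10)            # end of a group (except the last)
        matrix = matrix "\n"
}
END { print header; print matrix }'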

Find the average of multiple columns for each distinct variable in column 1

Hi, I have a file with 6 columns and I wish to know the average of three of these (columns 2, 3, 4) and the sum of the last two (columns 5 and 6) for each unique variable in column one.
A1234 0.526 0.123 0.456 0.986 1.123
A1234 0.423 0.256 0.397 0.876 0.999
A1234 0.645 0.321 0.402 0.903 1.101
A1234 0.555 0.155 0.406 0.888 1.009
B5678 0.111 0.345 0.285 0.888 0.789
B5678 0.221 0.215 0.305 0.768 0.987
B5678 0.336 0.289 0.320 0.789 0.921
I have come across code that will get the average for column 2 based on column one, but is there any way I can expand this across columns? Thanks.
awk '{a[$1]+=$2; c[$1]++} END{for (i in a) printf "%d%s%.2f\n", i, OFS, a[i]/c[i]}'
I would like the output to be in the following format; each variable in column one will also have a different number of rows:
A1234 0.53725 0.21375 0.41525 3.653 4.232
B5678 0.22233 0.283 0.30333 2.445 2.697
awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5;e[$1]+=$6;f[$1]++} END{for (i in a) print i,a[i]/f[i],b[i]/f[i],c[i]/f[i],d[i],e[i]}' file
Output:
B5678 0.222667 0.283 0.303333 2.445 2.697
A1234 0.53725 0.21375 0.41525 3.653 4.232
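Note that for (i in a) visits keys in unspecified order, which is why B5678 appears before A1234 here; piping through sort restores a deterministic order, e.g.:
awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5;e[$1]+=$6;f[$1]++}
     END{for (i in a) print i,a[i]/f[i],b[i]/f[i],c[i]/f[i],d[i],e[i]}' file | sort -k1,1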
Try the following once and let me know if this helps you.
awk '{A[$1]=A[$1]?A[$1]+$5+$6:$5+$6;C[$1]=C[$1]?C[$1]+$2+$3+$4:$2+$3+$4;B[$1]++} END{for(i in A){print "Avg. for " i" =\t",C[i]/(B[i]*3) RS "Count for " i" =\t",A[i]}}' Input_file
EDIT: Adding a non-one-liner form of the solution too now.
awk '{
    A[$1]=A[$1]?A[$1]+$5+$6:$5+$6;
    C[$1]=C[$1]?C[$1]+$2+$3+$4:$2+$3+$4;
    B[$1]++
}
END{
    for(i in A){
        print "Avg. for " i" =\t",C[i]/(B[i]*3) RS "Count for " i" =\t",A[i]
    }
}
' Input_file
awk solution:
awk '{ a[$1]++; avg[$1]+=$2+$3+$4; sum[$1]+=$5+$6 }
END{ for(i in a) printf "%s%.2f%s%.2f\n",i OFS,avg[i]/(a[i]*3),OFS,sum[i] }' file
The output (the 2nd column - average value, the 3rd column - sum value):
B5678 0.27 5.14
A1234 0.39 7.88
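The snippets above collapse columns 2-4 into a single average; if you want per-column averages plus per-column sums, matching the format the OP asked for, here is a sketch:
awk '{ n[$1]++
       for (i = 2; i <= 4; i++) avg[$1, i] += $i   # running per-column totals
       s5[$1] += $5; s6[$1] += $6 }                # running per-column sums
     END { for (k in n) {
             printf "%s", k
             for (i = 2; i <= 4; i++) printf " %.5f", avg[k, i] / n[k]
             printf " %.3f %.3f\n", s5[k], s6[k] } }' file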
To calculate the average of columns 2, 3, and 4 across the whole file:
awk '{ sum += $2 + $3 + $4 } END { print sum / (NR * 3) }'
To calculate the sum of columns 5 and 6 grouped by column 1:
awk '{ arr[$1] += $5 + $6 } END { for (a in arr) if (a) print a, arr[a] }'
To calculate the sum of columns 5 and 6 of the last row:
tail -1 file | awk '{sum += $5 + $6} END {print sum}'

awk for comparing, selecting and processing columns

I have a list list.txt
1 10691 0.12 54 + 1 10692 0.13 55 -
2 10720 0.23 -1 + 2 10721 0.13 43 -
3 10832 0.43 123 + 3 10833 0.13 88 -
4 11032 0.22 -1 + 4 11033 0.13 -1 -
5 11248 0.12 45 + 5 11249 0.13 -1 -
6 15214 0.88 33 + 6 15215 0.13 45 -
I wish to extract data from columns 3 ($3) and 8 ($8) using a few rules:
Compare columns 4 ($4) and 9 ($9):
i) If both are negative, output "-1".
ii) If $4 > 0 and $9 < 0, output $3; if $4 < 0 and $9 > 0, output $8.
iii) If both $4 and $9 > 0, output $3+$8.
So I tried something like this:
awk '{a[$4]; b[$9]}
END{
    for (x in a) {
        for (y in b) {
            if (x > 0 && y > 0) {
                print $3+$8
            }
            else if (x > 0 && y <= 0) {
                print $3;
            }
            else if (x <= 0 && y > 0) {
                print $8;
            }
            else if (x <= 0 && y <= 0) {
                print "-1";
            }
        }
    }
}' list.txt
Somehow this script gives neither the correct number of lines (it should equal the number of lines in list.txt) nor the right data :(
Using list.txt, one should get:
0.25
0.13
0.56
-1
0.12
1.01
By using the nested for loops, you are comparing all the values of column 4 with all the values of column 9 instead of comparing the values on corresponding rows (and by the time the END block runs, $3 and $8 hold only the fields of the last line read).
Working with each line as it is read is probably more what you want:
awk '{
    x = $4; y = $9;
    if (x > 0 && y > 0) {
        print $3+$8
    }
    else if (x > 0 && y <= 0) {
        print $3;
    }
    else if (x <= 0 && y > 0) {
        print $8;
    }
    else if (x <= 0 && y <= 0) {
        print "-1";
    }
}'
Although there is an accepted answer, I don't think it's idiomatic awk. You can definitely get rid of the if/else blocks, and you should.
$ awk '{x=$4>0;y=$9>0} x&&y{w=$3+$8} x&&!y{w=$3} !x&&y{w=$8} !x&&!y{w=-1} {print w}' xy
0.25
0.13
0.56
-1
0.12
1.01
Better yet (w must be reset on every line, otherwise a value left over from the previous row would leak through when $4 is not positive):
$ awk '{x=$4>0;y=$9>0;w=0} x{w=$3} y{w+=$8} !x&&!y{w=-1} {print w}' xy

Script for monitoring disk i/o rates on Linux

I need a script for monitoring ALL disk I/O rates on Linux using bash, awk, and sed. The problem is that it must return one row per time interval (so this one row should contain tps, kB_read/s, kB_wrtn/s, kB_read, kB_wrtn, summarized over all disks).
The natural choice here is of course iostat:
iostat -d -k -p $interval $loops
To limit it to all disks I use:
iostat -d -k -p `parted -l | grep Disk | cut -f1 -d: | cut -f2 -d' '`
Now the nice trick to summarize columns:
iostat -d -k -p `parted -l | grep Disk | cut -f1 -d: | cut -f2 -d' '` > /tmp/jPtafDiskIO.txt
echo `date +"%H:%M:%S"`,`awk 'FNR>2' /tmp/jPtafDiskIO.txt | awk 'BEGIN {FS=OFS=" "}NR == 1 { n1 = $2; n2 = $3; n3 = $4; n4 = $5; n5 = $6; next } { n1 += $2; n2 += $3; n3 += $4; n4 += $5; n5 += $6 } END { print n1","n2","n3","n4","n5 }'` >> diskIO.log
I am almost there; however, running this in a loop means iostat is invoked from scratch each time, so I don't get statistics from interval to interval but always the averages since boot (each invocation gives me pretty much the same output).
I know it sounds complicated, but maybe somebody has an idea? Maybe totally different approach?
Thx.
EDIT:
Sample input (/tmp/jPtafDiskIO.txt):
Linux 2.6.18-194.el5 (hostname)   08/25/2012

Device:            tps    kB_read/s    kB_wrtn/s      kB_read      kB_wrtn
sda               0.00         0.00         0.00        35655           59
sda2              0.00         0.00         0.00           67          272
sda1              0.00         0.00         0.00          521          274
sdb              52.53         0.56       569.40     20894989  21065384388
sdc               1.90        64.64        10.93   2391333384    404432217
sdd               0.00         0.00         0.04        17880      1343028
Output diskIO.log:
16:53:12,54.43,65.2,580.37,2412282496,21471160238
Why not use iotop (http://guichaz.free.fr/iotop/)?
dstat might be what you're looking for. It has a lot of things it can report on, with some common ones displayed by default.
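One way around the since-boot averages is to let a single iostat invocation do the interval sampling itself: every report after the first covers only the preceding interval, so you can skip the first report and sum the device rows of each subsequent one. A sketch, assuming GNU awk (for strftime) and the classic six-column iostat -d -k layout:
interval=5 loops=10
iostat -d -k $interval $loops | awk '
    /^Device/ { if (n) emit(); rep++; next }   # new report header: flush the previous report
    rep > 1 && NF >= 6 {                       # skip report 1, which holds averages since boot
        tps += $2; rs += $3; ws += $4; r += $5; w += $6; n = 1
    }
    END { if (n) emit() }                      # flush the final report
    function emit() {
        printf "%s,%.2f,%.2f,%.2f,%d,%d\n", strftime("%H:%M:%S"), tps, rs, ws, r, w
        tps = rs = ws = r = w = n = 0
    }'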

Permutation of columns without repetition

Can anybody give me a piece of code, an algorithm, or something else to solve the following problem?
I have several files, each with a different number of columns, like:
$> cat file-1
1 2
$> cat file-2
1 2 3
$> cat file-3
1 2 3 4
I would like to take the absolute difference of each pair of columns and divide it by the sum of the whole row, for each pair of columns only once (combinations without repeated column pairs):
in file-1 case I need to get:
0.3333 # because |1-2|/(1+2)
in file-2 case I need to get:
0.1666 0.1666 0.3333 # because |1-2|/(1+2+3) and |2-3|/(1+2+3) and |1-3|/(1+2+3)
in file-3 case I need to get:
0.1 0.2 0.3 0.1 0.2 0.1 # because |1-2|/(1+2+3+4) and |1-3|/(1+2+3+4) and |1-4|/(1+2+3+4) and |2-3|/(1+2+3+4) and |2-4|/(1+2+3+4) and |3-4|/(1+2+3+4)
This should work, though I am guessing you have made a minor mistake in your expected output. Based on your third pattern, the file-2 expectation should be as follows.
Instead of:
in file-2 case I need to get:
0.1666 0.1666 0.3333 # because |1-2|/(1+2+3) and |2-3|/(1+2+3) and |1-3|/(1+2+3)
It should be:
in file-2 case I need to get:
0.1666 0.3333 0.1666 # because |1-2|/(1+2+3) and |1-3|/(1+2+3) and |2-3|/(1+2+3)
Here is the awk one liner:
awk '
NF{
    a=0;
    for(i=1;i<=NF;i++)
        a+=$i;
    for(j=1;j<=NF;j++)
    {
        for(k=j;k<NF;k++)
            printf("%s ",-($j-$(k+1))/a)
    }
    print "";
    next;
}1' file
Short version:
awk '
NF{for (i=1;i<=NF;i++) a+=$i;
for (j=1;j<=NF;j++){for (k=j;k<NF;k++) printf("%2.4f ",-($j-$(k+1))/a)}
print "";a=0;next;}1' file
Input File:
[jaypal:~/Temp] cat file
1 2
1 2 3
1 2 3 4
Test:
[jaypal:~/Temp] awk '
NF{
    a=0;
    for(i=1;i<=NF;i++)
        a+=$i;
    for(j=1;j<=NF;j++)
    {
        for(k=j;k<NF;k++)
            printf("%s ",-($j-$(k+1))/a)
    }
    print "";
    next;
}1' file
0.333333
0.166667 0.333333 0.166667
0.1 0.2 0.3 0.1 0.2 0.1
Test of the shorter version:
[jaypal:~/Temp] awk '
NF{for (i=1;i<=NF;i++) a+=$i;
for (j=1;j<=NF;j++){for (k=j;k<NF;k++) printf("%2.4f ",-($j-$(k+1))/a)}
print "";a=0;next;}1' file
0.3333
0.1667 0.3333 0.1667
0.1000 0.2000 0.3000 0.1000 0.2000 0.1000
@Jaypal just beat me to it! Here's what I had:
awk '{for (x=1;x<=NF;x++) sum += $x; for (i=1;i<=NF;i++) for (j=2;j<=NF;j++) if (i < j) printf ("%.1f ",-($i-$j)/sum)} END {print ""}' file.txt
Output:
0.1 0.2 0.3 0.1 0.2 0.1
prints to one decimal place.
@Jaypal, is there a quick way to printf an absolute value? Perhaps something like abs(value)?
EDIT:
@Jaypal, yes, I've tried searching too and couldn't find anything simple :-( It seems if ($i < 0) $i = -$i is the way to go. I guess you could use sed to remove any minus signs:
awk '{for (x=1;x<=NF;x++) sum += $x; for (i=1;i<=NF;i++) for (j=2;j<=NF;j++) if (i < j) printf ("%.1f ", ($i-$j)/sum)} {print ""}' file.txt | sed "s%-%%g"
Cheers!
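For the record, awk has no built-in abs(), but it is trivial to define one yourself instead of post-processing with sed; a sketch (resetting sum per line so multi-line files also work):
awk 'function abs(v) { return v < 0 ? -v : v }
     { sum = 0
       for (x = 1; x <= NF; x++) sum += $x    # row total
       for (i = 1; i <= NF; i++)
           for (j = i + 1; j <= NF; j++)      # each column pair once
               printf "%.4f ", abs($i - $j) / sum
       print "" }' file.txt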
As it looks like homework, I will act accordingly.
To find how many numbers are present in the file, you can use:
cat filename | wc -w
Find the first_number by:
cat filename | cut -d " " -f 1
To find the sum in a file:
cat filename | tr " " "+" | bc
Now that you have the total_nos, use something like:
for i in $(seq 1 1 $total_nos)
do
    # Find the numerator by first_number - $i
    # Use the sum you got from above to get the desired value.
done
