Average from different columns in shell script - linux

I have a datafile with 10 columns as given below
ifile.txt
2 4 4 2 1 2 2 4 2 1
3 3 1 5 3 3 4 5 3 3
4 3 3 2 2 1 2 3 4 2
5 3 1 3 1 2 4 5 6 8
I want to add an 11th column showing the average of each row across the 10 columns, i.e. AVE(2 4 4 2 1 2 2 4 2 1) and so on. My script below works, but I would like to make it simpler and shorter. I would appreciate any help or suggestions.
awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i}m=s/NF;$(NF+1)=ss/NF;s=ss=0}1' ifile.txt

This should work:
awk '{for(i=1;i<=NF;i++)x+=$i;$(NF+1)=x/NF;x=0}1' file
For each field you add the value to x in the loop.
Next you set field 11 to the sum in x divided by the number of fields, NF.
Reset x to zero for the next line.
1 evaluates to true and performs the default action in awk, which is to print the line.
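For the sample data above, the one-liner appends 2.4, 3.3, 2.6 and 3.8 as the 11th column. A quick run to see it in action:

```shell
# Recreate the sample input, then append each row's average as an 11th column.
printf '%s\n' '2 4 4 2 1 2 2 4 2 1' \
              '3 3 1 5 3 3 4 5 3 3' \
              '4 3 3 2 2 1 2 3 4 2' \
              '5 3 1 3 1 2 4 5 6 8' > ifile.txt
awk '{for(i=1;i<=NF;i++)x+=$i;$(NF+1)=x/NF;x=0}1' ifile.txt
# 2 4 4 2 1 2 2 4 2 1 2.4
# 3 3 1 5 3 3 4 5 3 3 3.3
# 4 3 3 2 2 1 2 3 4 2 2.6
# 5 3 1 3 1 2 4 5 6 8 3.8
```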

Does this help?
awk '{for(i=1;i<=NF;i++)s+=$i;print $0,s/NF;s=0}' ifile.txt
or
awk '{for(i=1;i<=NF;i++)ss+=$i;$(NF+1)=ss/NF;ss=0}1' ifile.txt

Related

Split one column into "n" columns with one character each

I have a file with one single column and 10 rows. Every row has the same number of characters (5). From this file I would like to get a file with 10 rows and 5 columns, where each column has 1 character only. I have no idea how to do that in Linux. Any help? Would AWK do this?
The real data has many more rows (>4K) and characters (>500K) though. Here is a short version of the real data:
31313
30442
11020
12324
00140
34223
34221
43124
12211
04312
Desired output:
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Thanks!
I think that this does what you want:
$ awk -F '' '{ $1 = $1 }1' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
The input field separator is set to the empty string, so every character is treated as a field. $1 = $1 means that awk "touches" every record, causing it to be reformatted, inserting the output field separator (a space) between every character. 1 is the shortest "true" condition, causing awk to print every record.
Note that the behaviour of setting the field separator to an empty string isn't well-defined, so may not work on your version of awk. You may find that setting the field separator differently e.g. by using -v FS= works for you.
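If neither -F '' nor -v FS= works on your awk, a portable fallback (just a sketch, not part of the original answer) is to loop over the characters with substr, which avoids relying on the empty-FS behaviour entirely:

```shell
# Split each line into space-separated characters using only POSIX awk
# features: take the first character, then append each following character
# preceded by a space.
awk '{
  out = substr($0, 1, 1)
  for (i = 2; i <= length($0); i++) out = out " " substr($0, i, 1)
  print out
}' file
```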
Alternatively, you can do more or less the same thing in Perl:
perl -F -lanE 'say "@F"' file
-a splits each input record into the special array @F. -F followed by nothing sets the input field separator to the empty string. The quotes around @F mean that the list separator (a space by default) is inserted between each element.
You can use this sed as well:
sed 's/./& /g; s/ $//' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Oddly enough, this isn't trivial to do with most standard Unix tools (update: except, apparently, with awk). I would use Python. Note that the original one-liner relied on Python 2's eager map; the version below also works on Python 3, where map is lazy:
python -c 'import sys; [sys.stdout.write(" ".join(line)) for line in sys.stdin]' in.txt > new.txt
(This isn't the greatest idiomatic Python, but it suffices for a simple one-liner.)
another unix toolchain for this task
$ while read line; do echo $line | fold -w1 | xargs; done < file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2

how to calculate standard deviation from different colums in shell script

I have a datafile with 10 columns as given below
ifile.txt
2 4 4 2 1 2 2 4 2 1
3 3 1 5 3 3 4 5 3 3
4 3 3 2 2 1 2 3 4 2
5 3 1 3 1 2 4 5 6 8
I want to add an 11th column which will show the standard deviation of each row across the 10 columns, i.e. STDEV(2 4 4 2 1 2 2 4 2 1) and so on.
I am able to do it by taking the transpose, then using the following command, and then transposing back:
awk '{x[NR]=$0; s+=$1} END{a=s/NR; for (i in x){ss += (x[i]-a)^2} sd = sqrt(ss/NR); print sd}'
Can anybody suggest a simpler way so that I can do it directly along each row?
You can do it in a single pass as well, using the identity that the variance is the mean of the squares minus the square of the mean:
awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i*$i}m=s/NF;$(NF+1)=sqrt(ss/NF-m*m);s=ss=0}1' ifile.txt
Do you mean something like this?
awk '{for(i=1;i<=NF;i++)s+=$i;M=s/NF;
for(i=1;i<=NF;i++)sd+=(($i-M)^2);$(NF+1)=sqrt(sd/NF);M=sd=s=0}1' file
2 4 4 2 1 2 2 4 2 1 1.11355
3 3 1 5 3 3 4 5 3 3 1.1
4 3 3 2 2 1 2 3 4 2 0.916515
5 3 1 3 1 2 4 5 6 8 2.13542
You just use the fields instead of transposing and using the rows.
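The two answers use different but equivalent formulas, so a quick cross-check on the first sample row is reassuring: the mean is 2.4, the mean of squares is 70/10 = 7.0, and sqrt(7.0 - 2.4^2) = sqrt(1.24) ≈ 1.11355.

```shell
# Cross-check: the one-pass formula (mean of squares minus square of the
# mean) and the two-pass formula should append the same 11th column.
printf '2 4 4 2 1 2 2 4 2 1\n' |
  awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i*$i}m=s/NF;$(NF+1)=sqrt(ss/NF-m*m);s=ss=0}1'
printf '2 4 4 2 1 2 2 4 2 1\n' |
  awk '{for(i=1;i<=NF;i++)s+=$i;M=s/NF;
        for(i=1;i<=NF;i++)sd+=(($i-M)^2);$(NF+1)=sqrt(sd/NF);M=sd=s=0}1'
# both print: 2 4 4 2 1 2 2 4 2 1 1.11355
```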

Data fill in specific pattern

I am trying to fill data in MS Excel. I am given following pattern:
1 2
1
1
2 5
2 5
2
3
3 6
3
4
4
5 4
And I want my output in following format:
1 2
1 2
1 2
2 5
2 5
2 5
3 6
3 6
3 6
4
4
5 4
I tried using =IF(B2,B2,C1) in column 3, but that doesn't solve the problem for a=3 and a=4.
Any idea how to do this in Excel?
With the data sorted (the effect of which in this case is merely to move the 6 up one cell) and a blank row above, this formula in C2, copied down, should get the result you ask for from the data sample provided:
=IF(AND(A2<>A1,B2=""),"",IF(B2<>"",B2,C1))

Sorting numeric columns based on another numeric column

I have the following file:
BTA Pos KLD
4 79.7011 5.7711028907
4 79.6231 5.7083918219
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 72.8212 4.9384741047
6 18.3889 7.3631759258
I want to use AWK or bash commands to sort the file numerically by the first column and then by the second, to get the following output:
4 79.6231 5.7083918219
4 79.7011 5.7711028907
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 18.3889 7.3631759258
6 72.8212 4.9384741047
Sort numerically on column one, then on column two:
$ sort -nk1,1 -nk2,2 file
BTA Pos KLD
4 79.6231 5.7083918219
4 79.7011 5.7711028907
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 18.3889 7.3631759258
6 72.8212 4.9384741047

Reverse sort order of a multicolumn file in BASH

I've the following file:
1 2 3
1 4 5
1 6 7
2 3 5
5 2 1
and I want the file sorted on the second column, from the largest number (in this case 6) down to the smallest. I've tried
sort +1 -2 file.dat
but it sorts in ascending order (rather than descending).
The results should be:
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
sort -nrk 2,2
does the trick.
n for numeric sorting, r for reverse order and k 2,2 for the second column.
Have you tried -r ? From the man page:
-r, --reverse
reverse the result of comparisons
As mentioned, most versions of sort have the -r option; if yours doesn't, try tac:
$ sort -nk 2,2 file.dat | tac
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
$ sort -nrk 2,2 file.dat
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
tac - concatenate and print files in reverse
