Average from different columns in shell script - linux

I have a datafile with 10 columns as given below
ifile.txt
2 4 4 2 1 2 2 4 2 1
3 3 1 5 3 3 4 5 3 3
4 3 3 2 2 1 2 3 4 2
5 3 1 3 1 2 4 5 6 8
I want to add an 11th column showing the average of each row across the 10 columns, i.e. AVE(2 4 4 2 1 2 2 4 2 1) and so on. My script below works, but I would like to make it simpler and shorter. I would appreciate any help or suggestions.
awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i}m=s/NF;$(NF+1)=ss/NF;s=ss=0}1' ifile.txt

This should work:
awk '{for(i=1;i<=NF;i++)x+=$i;$(NF+1)=x/NF;x=0}1' file
For each field you add the value to x in the loop.
Next you set field 11 to the sum in x divided by the number of fields, NF.
Reset x to zero for the next line.
1 evaluates to true and performs the default action in awk, which is to print the line.
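For the sample data above, the one-liner appends 2.4, 3.3, 2.6 and 3.8 as the 11th column. A quick run to see it in action:

```shell
# Recreate the sample input, then append each row's average as an 11th column.
printf '%s\n' '2 4 4 2 1 2 2 4 2 1' \
              '3 3 1 5 3 3 4 5 3 3' \
              '4 3 3 2 2 1 2 3 4 2' \
              '5 3 1 3 1 2 4 5 6 8' > ifile.txt
awk '{for(i=1;i<=NF;i++)x+=$i;$(NF+1)=x/NF;x=0}1' ifile.txt
# 2 4 4 2 1 2 2 4 2 1 2.4
# 3 3 1 5 3 3 4 5 3 3 3.3
# 4 3 3 2 2 1 2 3 4 2 2.6
# 5 3 1 3 1 2 4 5 6 8 3.8
```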

Does this help?
awk '{for(i=1;i<=NF;i++)s+=$i;print $0,s/NF;s=0}' ifile.txt
or
awk '{for(i=1;i<=NF;i++)ss+=$i;$(NF+1)=ss/NF;ss=0}1' ifile.txt

Related

Split one column into "n" columns with one character each

I have a file with one single column and 10 rows. Every row has the same number of characters (5). From this file I would like to get a file with 10 rows and 5 columns, where each column has 1 character only. I have no idea how to do that in Linux. Any help? Would AWK do this?
The real data has many more rows (>4K) and characters (>500K) though. Here is a short version of the real data:
31313
30442
11020
12324
00140
34223
34221
43124
12211
04312
Desired output:
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Thanks!
I think that this does what you want:
$ awk -F '' '{ $1 = $1 }1' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
The input field separator is set to the empty string, so every character is treated as a field. $1 = $1 means that awk "touches" every record, causing it to be reformatted, inserting the output field separator (a space) between every character. 1 is the shortest "true" condition, causing awk to print every record.
Note that the behaviour of setting the field separator to an empty string isn't well-defined, so may not work on your version of awk. You may find that setting the field separator differently e.g. by using -v FS= works for you.
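If neither -F '' nor -v FS= works on your awk, a portable fallback (just a sketch, not part of the original answer) is to loop over the characters with substr, which avoids relying on the empty-FS behaviour entirely:

```shell
# Split each line into space-separated characters using only POSIX awk
# features: take the first character, then append each following character
# preceded by a space.
awk '{
  out = substr($0, 1, 1)
  for (i = 2; i <= length($0); i++) out = out " " substr($0, i, 1)
  print out
}' file
```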
Alternatively, you can do more or less the same thing in Perl:
perl -F -lanE 'say "@F"' file
-a splits each input record into the special array @F. -F followed by nothing sets the input field separator to the empty string. The quotes around @F mean that the list separator (a space by default) is inserted between each element.
You can use this sed as well:
sed 's/./& /g; s/ $//' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Oddly enough, this isn't trivial to do with most standard Unix tools (update: except, apparently, with awk). I would use Python. Note that the original one-liner relied on Python 2's eager map; the version below also works on Python 3, where map is lazy:
python -c 'import sys; [sys.stdout.write(" ".join(line)) for line in sys.stdin]' in.txt > new.txt
(This isn't the greatest idiomatic Python, but it suffices for a simple one-liner.)
another unix toolchain for this task
$ while read line; do echo $line | fold -w1 | xargs; done < file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2

how to calculate standard deviation from different colums in shell script

I have a datafile with 10 columns as given below
ifile.txt
2 4 4 2 1 2 2 4 2 1
3 3 1 5 3 3 4 5 3 3
4 3 3 2 2 1 2 3 4 2
5 3 1 3 1 2 4 5 6 8
I want to add an 11th column which will show the standard deviation of each row across the 10 columns, i.e. STDEV(2 4 4 2 1 2 2 4 2 1) and so on.
I am able to do it by taking the transpose, then using the following command, and then transposing back:
awk '{x[NR]=$0; s+=$1} END{a=s/NR; for (i in x){ss += (x[i]-a)^2} sd = sqrt(ss/NR); print sd}'
Can anybody suggest a simpler way so that I can do it directly along each row?
You can do it in a single pass as well, using the identity that the variance is the mean of the squares minus the square of the mean:
awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i*$i}m=s/NF;$(NF+1)=sqrt(ss/NF-m*m);s=ss=0}1' ifile.txt
Do you mean something like this?
awk '{for(i=1;i<=NF;i++)s+=$i;M=s/NF;
for(i=1;i<=NF;i++)sd+=(($i-M)^2);$(NF+1)=sqrt(sd/NF);M=sd=s=0}1' file
2 4 4 2 1 2 2 4 2 1 1.11355
3 3 1 5 3 3 4 5 3 3 1.1
4 3 3 2 2 1 2 3 4 2 0.916515
5 3 1 3 1 2 4 5 6 8 2.13542
You just use the fields instead of transposing and using the rows.
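The two answers use different but equivalent formulas, so a quick cross-check on the first sample row is reassuring: the mean is 2.4, the mean of squares is 70/10 = 7.0, and sqrt(7.0 - 2.4^2) = sqrt(1.24) ≈ 1.11355.

```shell
# Cross-check: the one-pass formula (mean of squares minus square of the
# mean) and the two-pass formula should append the same 11th column.
printf '2 4 4 2 1 2 2 4 2 1\n' |
  awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i*$i}m=s/NF;$(NF+1)=sqrt(ss/NF-m*m);s=ss=0}1'
printf '2 4 4 2 1 2 2 4 2 1\n' |
  awk '{for(i=1;i<=NF;i++)s+=$i;M=s/NF;
        for(i=1;i<=NF;i++)sd+=(($i-M)^2);$(NF+1)=sqrt(sd/NF);M=sd=s=0}1'
# both print: 2 4 4 2 1 2 2 4 2 1 1.11355
```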

Data fill in specific pattern

I am trying to fill data in MS Excel. I am given following pattern:
1 2
1
1
2 5
2 5
2
3
3 6
3
4
4
5 4
And I want my output in following format:
1 2
1 2
1 2
2 5
2 5
2 5
3 6
3 6
3 6
4
4
5 4
I tried using =IF(B2,B2,C1) in column 3, but that doesn't solve the problem for a=3 and a=4.
Any idea how to do this in Excel?
With the data sorted (the effect of which in this case is merely to move the 6 up one cell) and a blank row above, this formula in C2, copied down, should get the result you ask for from the data sample provided:
=IF(AND(A2<>A1,B2=""),"",IF(B2<>"",B2,C1))

Sorting numeric columns based on another numeric column

I have the following file:
BTA Pos KLD
4 79.7011 5.7711028907
4 79.6231 5.7083918219
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 72.8212 4.9384741047
6 18.3889 7.3631759258
I want to use AWK or bash commands to sort the file numerically by the first column and then by the second, to get the following output:
4 79.6231 5.7083918219
4 79.7011 5.7711028907
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 18.3889 7.3631759258
6 72.8212 4.9384741047
Sort numerically on column one, then on column two:
$ sort -nk1,1 -nk2,2 file
BTA Pos KLD
4 79.6231 5.7083918219
4 79.7011 5.7711028907
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 18.3889 7.3631759258
6 72.8212 4.9384741047

Reverse sort order of a multicolumn file in BASH

I've the following file:
1 2 3
1 4 5
1 6 7
2 3 5
5 2 1
and I want the file sorted on the second column, from the largest number (in this case 6) down to the smallest. I've tried
sort +1 -2 file.dat
but it sorts in ascending order (rather than descending).
The results should be:
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
sort -nrk 2,2
does the trick.
n for numeric sorting, r for reverse order and k 2,2 for the second column.
Have you tried -r ? From the man page:
-r, --reverse
reverse the result of comparisons
As mentioned, most versions of sort have the -r option; if yours doesn't, try tac:
$ sort -nk 2,2 file.dat | tac
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
$ sort -nrk 2,2 file.dat
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
tac - concatenate and print files in reverse
