Reverse sort order of a multicolumn file in BASH - linux

I have the following file:
1 2 3
1 4 5
1 6 7
2 3 5
5 2 1
and I want the file to be sorted on the second column, but from the largest number (in this case 6) to the smallest. I've tried
sort +1 -2 file.dat
but it sorts in ascending order (rather than descending).
The results should be:
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3

sort -nrk 2,2
does the trick.
n is for numeric sorting, r for reverse order, and -k 2,2 restricts the sort key to the second column (a bare -k 2 would extend the key to the end of the line).

Have you tried -r? From the man page:
-r, --reverse
reverse the result of comparisons

As mentioned, most versions of sort have the -r option; if yours doesn't, try tac:
$ sort -nk 2,2 file.dat | tac
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
$ sort -nrk 2,2 file.dat
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
tac - concatenate and print files in reverse

Related

File is not sorted after sort

I have a problem with sorting my file. My file looks like this:
geom-10-11.com 1
geom-1-10.com 9
geom-1-11.com 10
geom-1-2.com 1
geom-1-3.com 2
geom-1-4.com 3
geom-1-5.com 4
geom-1-6.com 5
geom-1-7.com 6
geom-1-8.com 7
geom-1-9.com 8
geom-2-10.com 8
geom-2-11.com 9
geom-2-3.com 1
geom-2-4.com 2
geom-2-5.com 3
geom-2-6.com 4
geom-2-7.com 5
geom-2-8.com 6
geom-2-9.com 7
geom-3-10.com 7
geom-3-11.com 8
geom-3-4.com 1
geom-3-5.com 2
geom-3-6.com 3
geom-3-7.com 4
geom-3-8.com 5
geom-3-9.com 6
geom-4-10.com 6
geom-4-11.com 7
geom-4-5.com 1
geom-4-6.com 2
geom-4-7.com 3
geom-4-8.com 4
geom-4-9.com 5
geom-5-10.com 5
geom-5-11.com 6
geom-5-6.com 1
geom-5-7.com 2
geom-5-8.com 3
geom-5-9.com 4
geom-6-10.com 4
geom-6-11.com 5
geom-6-7.com 1
geom-6-8.com 2
geom-6-9.com 3
geom-7-10.com 3
geom-7-11.com 4
geom-7-8.com 1
geom-7-9.com 2
geom-8-10.com 2
geom-8-11.com 3
geom-8-9.com 1
geom-9-10.com 1
geom-9-11.com 2
So I used sort -k1.6 -k2 -n and I got
geom-1-2.com 1
geom-1-3.com 2
geom-1-4.com 3
geom-1-5.com 4
geom-1-6.com 5
geom-1-7.com 6
geom-1-8.com 7
geom-1-9.com 8
geom-1-10.com 9
geom-1-11.com 10
geom-2-3.com 1
geom-2-4.com 2
geom-2-5.com 3
geom-2-6.com 4
geom-2-7.com 5
geom-2-8.com 6
geom-2-9.com 7
geom-2-10.com 8
geom-2-11.com 9
geom-3-4.com 1
geom-3-5.com 2
geom-3-6.com 3
geom-3-7.com 4
geom-3-8.com 5
geom-3-9.com 6
geom-3-10.com 7
geom-3-11.com 8
geom-4-5.com 1
geom-4-6.com 2
geom-4-7.com 3
geom-4-8.com 4
geom-4-9.com 5
geom-4-10.com 6
geom-4-11.com 7
geom-5-6.com 1
geom-5-7.com 2
geom-5-8.com 3
geom-5-9.com 4
geom-5-10.com 5
geom-5-11.com 6
geom-6-7.com 1
geom-6-8.com 2
geom-6-9.com 3
geom-6-10.com 4
geom-6-11.com 5
geom-7-8.com 1
geom-7-9.com 2
geom-7-10.com 3
geom-7-11.com 4
geom-8-9.com 1
geom-8-10.com 2
geom-8-11.com 3
geom-9-10.com 1
geom-9-11.com 2
geom-10-11.com 1
But when I tried to use uniq -f1 or sort -k1.6 -k2 -n -u I got the same long sorted output. So I used
sort -k1.6 -k2 -n -c
and got a message that the file is disordered
(sort: glist2:2: disorder: geom-1-2.com 1).
I tried using just sort -k2 -n -u but got
geom-10-11.com 1
geom-1-3.com 2
geom-1-4.com 3
geom-1-5.com 4
geom-1-6.com 5
geom-1-7.com 6
geom-1-8.com 7
geom-1-9.com 8
geom-1-10.com 9
geom-1-11.com 10
That is not what I need; I need to have
geom-1-2.com 1
geom-1-3.com 2
geom-1-4.com 3
geom-1-5.com 4
geom-1-6.com 5
geom-1-7.com 6
geom-1-8.com 7
geom-1-9.com 8
geom-1-10.com 9
geom-1-11.com 10
So I need to have geom-1-X at the beginning and not geom-10-X. It would be great to use just uniq because I have much bigger files with more geometries (thousands of lines) but with the same structure. Thank you for your answers.
You can use this:
grep -E '^geom-1-' file | sort -k1.8n
grep filters the lines you want; sort sorts numerically on the first field, starting at its 8th character.
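If you want the whole file in that order, rather than just the geom-1 lines, one option is to split on the dashes and sort numerically on the two embedded numbers (a sketch, assuming every name follows the geom-N-M.com pattern):
$ sort -t '-' -k2,2n -k3,3n file
With -t '-' the second and third fields are the two numbers; the third field also carries the .com suffix and the count, but a numeric sort only looks at its leading digits, so geom-1-2.com ends up before geom-10-11.com.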

Split one column by every character into "n" columns with one character each

I have a file with one single column and 10 rows. Every row has the same number of characters (5). From this file I would like to get a file with 10 rows and 5 columns, where each column has 1 character only. I have no idea how to do that in Linux. Any help? Would AWK do this?
The real data has many more rows (>4K) and characters (>500K) though. Here is a short version of the real data:
31313
30442
11020
12324
00140
34223
34221
43124
12211
04312
Desired output:
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Thanks!
I think that this does what you want:
$ awk -F '' '{ $1 = $1 }1' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
The input field separator is set to the empty string, so every character is treated as a field. $1 = $1 means that awk "touches" every record, causing it to be reformatted, inserting the output field separator (a space) between every character. 1 is the shortest "true" condition, causing awk to print every record.
Note that the behaviour of setting the field separator to an empty string isn't well-defined, so may not work on your version of awk. You may find that setting the field separator differently e.g. by using -v FS= works for you.
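For example, with GNU awk the equivalent invocation would look something like this (same output as above, but behaviour with an empty field separator is still implementation-specific):
awk -v FS= '{ $1 = $1 } 1' file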
Alternatively, you can do more or less the same thing in Perl:
perl -F -lanE 'say "@F"' file
-a splits each input record into the special array @F. -F followed by nothing sets the input field separator to the empty string. The quotes around @F mean that the list separator (a space by default) is inserted between each element.
You can use this sed as well:
sed 's/./& /g; s/ $//' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Oddly enough, this isn't trivial to do with most standard Unix tools (update: except, apparently, with awk). I would use Python:
python -c 'import sys; map(sys.stdout.write, map(" ".join, sys.stdin))' in.txt > new.txt
(This isn't the greatest idiomatic Python, but it suffices for a simple one-liner. Note that it relies on Python 2's eager map; in Python 3, map is lazy, so you would need to wrap the outer map in list() for anything to be written.)
Another Unix toolchain for this task:
$ while read line; do echo $line | fold -w1 | xargs; done < file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2

Average from different columns in shell script

I have a datafile with 10 columns as given below
ifile.txt
2 4 4 2 1 2 2 4 2 1
3 3 1 5 3 3 4 5 3 3
4 3 3 2 2 1 2 3 4 2
5 3 1 3 1 2 4 5 6 8
I want to add an 11th column which will show the average of each row across the 10 columns, i.e. AVE(2 4 4 2 1 2 2 4 2 1) and so on. Though the following script works well, I would like to make it simpler and shorter. I appreciate, in advance, any help or suggestions in this regard.
awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i}m=s/NF;$(NF+1)=ss/NF;s=ss=0}1' ifile.txt
This should work
awk '{for(i=1;i<=NF;i++)x+=$i;$(NF+1)=x/NF;x=0}1' file
For each field you add the value to x in the loop.
Next you set field 11 to the sum in x divided by the number of fields NF.
Reset x to zero for the next line.
1 equates to true and performs the default action in awk which is to print the line.
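With the sample file above, that command produces (numbers formatted with awk's default %.6g):
$ awk '{for(i=1;i<=NF;i++)x+=$i;$(NF+1)=x/NF;x=0}1' file
2 4 4 2 1 2 2 4 2 1 2.4
3 3 1 5 3 3 4 5 3 3 3.3
4 3 3 2 2 1 2 3 4 2 2.6
5 3 1 3 1 2 4 5 6 8 3.8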
Is this helping?
awk '{for(i=1;i<=NF;i++)s+=$i;print $0,s/NF;s=0}' ifile.txt
or
awk '{for(i=1;i<=NF;i++)ss+=$i;$(NF+1)=ss/NF;ss=0}1' ifile.txt

How to calculate standard deviation from different columns in shell script

I have a datafile with 10 columns as given below
ifile.txt
2 4 4 2 1 2 2 4 2 1
3 3 1 5 3 3 4 5 3 3
4 3 3 2 2 1 2 3 4 2
5 3 1 3 1 2 4 5 6 8
I want to add an 11th column which will show the standard deviation of each row across the 10 columns, i.e. STDEV(2 4 4 2 1 2 2 4 2 1) and so on.
I am able to do it by taking the transpose, then using the following command, and taking the transpose again:
awk '{x[NR]=$0; s+=$1} END{a=s/NR; for (i in x){ss += (x[i]-a)^2} sd = sqrt(ss/NR); print sd}'
Can anybody suggest a simpler way so that I can do it directly along each row?
You can do the same in a single pass as well, using the fact that the variance is the mean of the squares minus the square of the mean:
awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i*$i}m=s/NF;$(NF+1)=sqrt(ss/NF-m*m);s=ss=0}1' ifile.txt
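On the sample ifile.txt this one-pass version prints the same values as the two-pass answer below:
$ awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i*$i}m=s/NF;$(NF+1)=sqrt(ss/NF-m*m);s=ss=0}1' ifile.txt
2 4 4 2 1 2 2 4 2 1 1.11355
3 3 1 5 3 3 4 5 3 3 1.1
4 3 3 2 2 1 2 3 4 2 0.916515
5 3 1 3 1 2 4 5 6 8 2.13542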
Do you mean something like this?
awk '{for(i=1;i<=NF;i++)s+=$i;M=s/NF;
for(i=1;i<=NF;i++)sd+=(($i-M)^2);$(NF+1)=sqrt(sd/NF);M=sd=s=0}1' file
2 4 4 2 1 2 2 4 2 1 1.11355
3 3 1 5 3 3 4 5 3 3 1.1
4 3 3 2 2 1 2 3 4 2 0.916515
5 3 1 3 1 2 4 5 6 8 2.13542
You just use the fields instead of transposing and using the rows.

Sorting numeric columns based on another numeric column

I have the following file:
BTA Pos KLD
4 79.7011 5.7711028907
4 79.6231 5.7083918219
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 72.8212 4.9384741047
6 18.3889 7.3631759258
I want to use AWK or bash commands to sort on the second column within each value of the first column, to get the output as follows:
4 79.6231 5.7083918219
4 79.7011 5.7711028907
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 18.3889 7.3631759258
6 72.8212 4.9384741047
Sort numerically on column one, then on column two:
$ sort -nk1,1 -nk2,2 file
BTA Pos KLD
4 79.6231 5.7083918219
4 79.7011 5.7711028907
5 20.9112 4.5559494707
5 50.7354 4.2495580809
5 112.645 4.0936819092
6 18.3889 7.3631759258
6 72.8212 4.9384741047
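The header line sorts to the top because a numeric sort treats non-numeric text as 0. If you want it dropped, as in the desired output, one sketch (assuming the header is always the first line) is:
$ tail -n +2 file | sort -nk1,1 -nk2,2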
