I am trying to sort a file in ascending order. The file contains both letters and numeric values.
aae-miR-1
aae-miR-10
aae-miR-100
aae-miR-1000
aae-miR-11-3p
aae-miR-11-5p
aae-miR-1174
aae-miR-1175-3p
aae-miR-1175-5p
aae-miR-12-3p
aae-miR-124
I want the output as
aae-miR-1
aae-miR-10
aae-miR-11-3p
aae-miR-11-5p
aae-miR-12-3p
aae-miR-100
aae-miR-124
aae-miR-1000
aae-miR-1174
aae-miR-1175-3p
aae-miR-1175-5p
I used,
sort -k1,1 -n <file>
to sort in numeric and alphabetical order, but the output is not what I expected. Please suggest how to use sort for this.
You should use sort -t"-" -k3n file.txt for this case.
Output:
aae-miR-1
aae-miR-10
aae-miR-11-3p
aae-miR-11-5p
aae-miR-12-3p
aae-miR-100
aae-miR-124
aae-miR-1000
aae-miR-1174
aae-miR-1175-3p
aae-miR-1175-5p
This form is more explicit. The '-t' option sets the field delimiter for
delimited files. The '-k' option specifies the key on which the sort is
based. The format of '-k' is -km[,n], where m is the starting field and
n is the ending field. n is optional; when it is omitted, the key runs from field m to the end of the line.
Try:
sort -n -t- -k3 <file>
-n will numerically sort.
-t- will use - as field separator.
-k3 will use third field to sort by.
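The options above can be tried on a shuffled subset of the question's data. This is only a sketch: it ends each key explicitly (-k3,3n) and adds a second key on the fourth field as a tie-breaker, which goes slightly beyond the commands in the answers, and LC_ALL=C is set as an assumption to keep the comparison byte-based and reproducible.

```shell
# Shuffled subset of the identifiers from the question.
input='aae-miR-1000
aae-miR-12-3p
aae-miR-1
aae-miR-124
aae-miR-11-5p
aae-miR-10
aae-miR-11-3p'

# -t-     : use "-" as the field separator
# -k3,3n  : compare the third field only, numerically
# -k4,4   : break ties on the fourth field (3p before 5p)
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -t- -k3,3n -k4,4)

printf '%s\n' "$sorted"
# aae-miR-1
# aae-miR-10
# aae-miR-11-3p
# aae-miR-11-5p
# aae-miR-12-3p
# aae-miR-124
# aae-miR-1000
```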
Try this, with separator:
sort -t - -k3n file
Related
I'm trying to sort a text file in the descending order based on the last column. It just doesn't seem to work.
cat 1.txt | sort -r -n -k 4,4
ACHG,89.46,0.08,34200
UUKJL,0.85,-15.00,200
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
JJkLEYE,73.67,0.48,25400
I've tried removing spaces just in case, but that hasn't helped. I also tried sorting by the other fields just to see, but I have the same problem.
I just can't work out what is wrong with the command I've issued. Please could I request help with this one?
Your command is almost right, but it is missing the field separator option -t, which should set the comma as the field separator.
This should work for you:
sort -t, -rnk 4,4 1.txt
ACHG,89.46,0.08,34200
JJkLEYE,73.67,0.48,25400
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
UUKJL,0.85,-15.00,200
Note that there is no need to use cat | sort here.
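As a self-contained check of the fix, the sketch below feeds the question's rows to sort through printf rather than a file. The variable names are only illustrative, and LC_ALL=C is an assumption added for reproducibility. Without -t, none of these lines contains a blank, so each whole line is field 1 and field 4 is empty. With -t, field 4 is the last column.

```shell
# Rows from the question, in their original order.
data='ACHG,89.46,0.08,34200
UUKJL,0.85,-15.00,200
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
JJkLEYE,73.67,0.48,25400'

# -t,      : fields are separated by commas
# -k4,4nr  : compare the fourth field only, numerically, in reverse
with_t=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k4,4nr)

printf '%s\n' "$with_t"
# ACHG,89.46,0.08,34200
# JJkLEYE,73.67,0.48,25400
# NIMJKY,34.35,0.09,17700
# TBBNHW,10.24,0.00,4600
# UUKJL,0.85,-15.00,200
```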
I have a file named a.csv, which contains
100008,3
10000,3
100010,5
100010,4
10001,6
100021,7
After running this command sort -k1 -d -t "," a.csv
The result is
10000,3
100008,3
100010,4
100010,5
10001,6
100021,7
This is unexpected, because 10001 should come before 100010.
I have been trying to understand why this happens for a long time, but couldn't find any answers.
$ sort --version
sort (GNU coreutils) 8.13
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
Some of the other responses have assumed this is a numeric sort vs dictionary sort problem. It isn't, as even sorting alphabetically the output given in the question is incorrect.
The answer
To get the correct sorting, you need to change -k1 to -k1,1:
$ sort -k1,1 -d -t "," a.csv
10000,3
100008,3
10001,6
100010,4
100010,5
100021,7
The reason
The -k option takes two numbers, the start and end fields to sort (i.e. -ks,e where s is the start and e is the end). By default, the end field is the end of the line. Hence, -k1 is the same as not giving the -k option at all. To show this, compare:
$ printf "1,a,1\n2,aa,2\n" | sort -k2 -t,
1,a,1
2,aa,2
with:
$ printf "1~a~1\n2~aa~2\n" | sort -k2 -t~
2~aa~2
1~a~1
The first sorts a,1 before aa,2, while the second sorts aa~2 before a~1 since, in ASCII, , < a < ~.
To get the desired behaviour, therefore, we need to sort only one field. In your case, that means using 1 as both the start and end field, so you specify -k1,1. If you try the two examples above with -k2,2 instead of -k2, you'll find you get the same (correct) ordering in both cases.
Many thanks to Eric and Assaf from the coreutils mailing list for pointing this out.
You have not found a bug in sort. Your usage bug is that you used '-k1' ("set the key to the first field through the end of the line") instead of '-k1,1' ("set the key to use only the first field"). If you use GNU sort, the --debug option will show you the difference. The delimiter is included in the key as long as the key extends beyond a single field.
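The difference between -k1 and -k1,1 can be reproduced on the question's data without a file. This is a sketch under the assumption of the C locale (LC_ALL=C), so that only -d decides which characters are compared. The variable names are illustrative.

```shell
# Contents of a.csv from the question.
data='100008,3
10000,3
100010,5
100010,4
10001,6
100021,7'

# Key runs from field 1 to the end of the line. -d ignores the comma,
# so "10001,6" is compared as "100016" and lands after "100010,5".
whole_line=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -d -k1)

# Key is the first field only, so "10001" sorts before "100010".
first_field=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -d -k1,1)

printf '%s\n' "$whole_line" | sed -n 3p    # 100010,4
printf '%s\n' "$first_field" | sed -n 3p   # 10001,6
```

With GNU sort, adding --debug to either command marks the part of each line that is actually used as the key.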
It sorts alphabetically, not numerically, so "," is before "0", i.e. more like a dictionary
The -d option is for --dictionary-order:
-d, --dictionary-order
consider only blanks and alphanumeric characters
But I think you want to use -n (--numeric-sort) instead:
-n, --numeric-sort
compare according to string numerical value
So, change your command to look like this:
sort -k1 -n -t "," a.csv
http://man7.org/linux/man-pages/man1/sort.1.html
The sort is alphabetical, not numerical. Replace -d by -n in your option list to sort numerically.
The file looks like
5.1,3.5,1.4,0.2,Banana
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
and I need to have it be
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
5.1,3.5,1.4,0.2,Banana
I have been trying to use
sort -t, -k5 file.csv > sorted.csv
All it does is make it
5.1,3.5,1.4,0.2,Banana
4.8,2.8,1.3,1.2,Apple
4.9,3.0,1.4,0.6,Apple
How do I make it like this? It does not seem to be sorting it at all.
GNU sort is locale sensitive, which can cause weirdness. Try the following and see if it makes a difference:
LC_ALL=C sort -t, -k5 file.csv > sorted.csv
Is this what you need?
# sort -t . -nrk2 sorted.csv
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
5.1,3.5,1.4,0.2,Banana
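Another way to get the requested order, sketched here as an assumption rather than taken from the answers above, is to sort on the fifth field only and ask for a stable sort, so that rows with the same name keep their input order. The -s flag is available in GNU and BSD sort but is not required by POSIX.

```shell
# Rows from the question.
data='5.1,3.5,1.4,0.2,Banana
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple'

# -t,    : comma-separated fields
# -k5,5  : compare the fifth field only
# -s     : stable sort, equal keys stay in input order
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t, -k5,5)

printf '%s\n' "$sorted"
# 4.9,3.0,1.4,0.6,Apple
# 4.8,2.8,1.3,1.2,Apple
# 5.1,3.5,1.4,0.2,Banana
```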
I can't seem to sort the following data as I would like;
find output/ -type f -name '*.raw' | sort
output/rtp.0.0.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.1.raw
output/rtp.0.20.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
In the above example I haven't passed any arguments to the sort command. No matter what options I used I can't get closer to my desired results. I would like the following output;
find output/ -type f -name '*.raw' | sort
output/rtp.0.0.raw
output/rtp.0.1.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.20.raw
I have tried with -t . option to set a field separator to the full stop. Also I have experimented with the -k option to specify the field, and -g, -h, -n, but none of the options are helping. I can't see anything else in the man pages that would do as I require, unless I haven't understood the man pages correctly and overlooked my answer.
Can I produce the results I require with sort, and if so, how?
Additionally, it's very rare, but sometimes the 2nd column, which shows as '0' all the way down, may increment. Can that be factored into the sort?
This does it:
$ sort -t'.' -n -k3 a
output/rtp.0.0.raw
output/rtp.0.1.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.20.raw
As you see we need different options:
-t'.' to set the dot . as the field separator.
-n to make it numeric sort.
-k3 to check the 3rd column.
Update
This also does it:
$ sort -t'.' -V -k2 a
output/rtp.0.0.raw
output/rtp.0.1.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.20.raw
As you see we need different options:
-t'.' to set the dot . as the field separator.
-V to make it sort based on version.
-k2 to check the 2nd column.
Fedorqui's solution is good (+1), but not all versions of sort support -V. For those versions that do not, you need to do a little more work than fedorqui's original solution. This should suffice:
sort -t. -k2,2 -k3,3 -n
You get a slightly different sort (eg, '05' sorts before '1' instead of after) if you use:
sort -t. -k2g
(Note that -g is also non-standard, and not available in all versions of sort).
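To check that the second column is taken into account when it does increment, here is a small sketch with made-up file names in the same pattern (the rtp.1.* entries are hypothetical, not from the question). It attaches n to each key instead of using a global -n, which should be equivalent here.

```shell
# Hypothetical file names, including some where the 2nd field is 1.
files='output/rtp.0.10.raw
output/rtp.1.2.raw
output/rtp.0.2.raw
output/rtp.0.1.raw
output/rtp.1.0.raw'

# -t.     : split on dots, so field 2 and field 3 are the two counters
# -k2,2n  : primary key, second field, numeric
# -k3,3n  : secondary key, third field, numeric
sorted=$(printf '%s\n' "$files" | LC_ALL=C sort -t. -k2,2n -k3,3n)

printf '%s\n' "$sorted"
# output/rtp.0.1.raw
# output/rtp.0.2.raw
# output/rtp.0.10.raw
# output/rtp.1.0.raw
# output/rtp.1.2.raw
```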
sort -t'.' -n -k3
sorts by the 3rd column from smallest to largest.
If you want to sort from largest to smallest instead,
you can use the '-r' option:
sort -t'.' -n -r -k3
I have a list like this -
2009-96 2010-100 2010-101 2010-97 2010-98 2010-99 2009-99a 2011-102
How do I sort the numbers in the right order, so that it's sorted by first 4 digits (year) if the year is different, otherwise it is sorted by the digit after -?
The right output which I want is -
2009-96 2009-99a 2010-97 2010-98 2010-99 2010-100 2010-101 2011-102
It depends on your version of sort, because the command line options may be different, but on my system, sort -t - -k 1,1n -k 2,2n <filename> works.
With GNU sort (standard on Linux):
sort -t'-' -n
sort sorts lines, so convert your space delimiters to \n and back using tr as shown in @dimba's answer.
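Putting the pieces together for the space-separated list, a minimal sketch might look like this. It uses the two-key form from the earlier answer, and for the trip back to one line it swaps in paste rather than tr, purely to avoid a trailing space. The variable names are illustrative and LC_ALL=C is an assumption.

```shell
# The list from the question, on a single line.
list='2009-96 2010-100 2010-101 2010-97 2010-98 2010-99 2009-99a 2011-102'

# tr         : one item per line, since sort works on lines
# -t-        : split each item on "-"
# -k1,1n     : year, numeric
# -k2,2n     : number after the dash, numeric ("99a" is read as 99)
# paste -sd  : join the sorted lines back with single spaces
sorted=$(printf '%s\n' "$list" | tr ' ' '\n' \
  | LC_ALL=C sort -t- -k1,1n -k2,2n \
  | paste -sd ' ' -)

printf '%s\n' "$sorted"
# 2009-96 2009-99a 2010-97 2010-98 2010-99 2010-100 2010-101 2011-102
```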