I have a little issue right now.
I have a file with 4 columns
test0000002,10030010330,c_,218
test0000002,10030010330,d_,202
test0000002,10030010330,b_,193
test0000002,10030010020,c_,178
test0000002,10030010020,b_,170
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030010020,d_,150
test0000002,10030070050,c_,119
test0000002,10030070050,b_,99
test0000002,10030070050,d_,79
test0000002,10030070050,a_,56
test0000002,10030010390,c_,55
test0000002,10030010390,b_,44
test0000002,10030010380,d_,41
test0000002,10030010380,a_,37
test0000002,10030010390,d_,35
test0000002,10030010380,c_,33
test0000002,10030010390,a_,31
test0000002,10030010320,c_,30
test0000002,10030010320,b_,27
test0000002,10030010380,b_,26
test0000002,10030010320,a_,23
test0000002,10030010320,d_,22
test0000002,10030010010,a_,6
and I want the highest value from 4th column sorted from 2nd column.
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010390,a_,31
test0000002,10030010380,c_,33
test0000002,10030010390,d_,35
test0000002,10030010320,a_,23
test0000002,10030010380,b_,26
test0000002,10030010010,a_,6
It appears that your file is already sorted in descending order on the 4th column, so you just need to print lines where the 2nd column appears for the first time:
awk -F, '!seen[$2]++' file
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010010,a_,6
If your input file is not sorted on column 4, then
sort -t, -k4nr file | awk -F, '!seen[$2]++'
You can use two sorts:
sort -u -t, -k2,2 file | sort -t, -rnk4
The first one removes duplicates in the second column, the second one sorts the first one on the 4th column.
I'm trying to numerically sort a long list of csv file based on the number in the first column, using below command:
-> head -1 file.csv ; tail -n +2 file.csv | sort -t , -k1n
(I'm piping head/tail command to skip the first line of the file, as it's a header and contains string)
However, it doesn't return a fully sorted list. Half of it is sorted, the other half is like this:
9838,2361,8,947,2284
9842,2135,2,261,2511
9846,2710,1,176,2171
986,2689,32,123,2177
9888,2183,15,30,2790
989,2470,33,887,2345
Can somebody tell me what I'm doing wrong? I've also tried below with same result:
-> sort -k1n -t"," file.csv
tail -n +2 file.csv | sort -k1,2 -n -t"," should do the trick.
To perform a numeric sort by the first column use the following approach:
tail -n +2 /file.csv | sort -n -t, -k1,1
The output:
986,2689,32,123,2177
989,2470,33,887,2345
9838,2361,8,947,2284
9842,2135,2,261,2511
9846,2710,1,176,2171
9888,2183,15,30,2790
-k pos1[,pos2]
Specify a sort field that consists of the part of the line between pos1 and pos2
(or the end of the line, if pos2 is omitted), inclusive.
In its simplest form pos specifies a field number (starting with 1) ...
I have a tab-delimited file with three columns like this:
joe W 4
bob A 1
ana F 1
roy J 3
sam S 0
don R 2
tim L 0
cyb M 0
I want to sort this file by decreasing values in the third column, but to break ties I do not want to use some other column to do so (i.e. not use the first column to sort rows with the same entry in the third column).
Instead, I want rows with the same third column entries to either preserve the original order, or be sorted randomly.
Is there a way to do this using the sort command in unix?
sort -k3 -r -s file
This should give you the required output.
-k3 denotes the 3rd column and -r will sort in decreasing order and -s will disable the breaking of ties using other options.
I have a file having some columns. I would like to do sort for column 2 by grouping column 1 values.
See below example.
Input File like:
NEW,RED,1
OLD,BLUE,2
NEW,BLUE,3
OLD,GREEN,4
Expected output file:
NEW,BLUE,3
NEW,RED,1
OLD,BLUE,2
OLD,GREEN,4
How can i achieve this,please help. Thanks in advance!
$ sort -t, -k1,2 inputfile
NEW,BLUE,3
NEW,RED,1
OLD,BLUE,2
OLD,GREEN,4
-t is used to specify the field separator, and -k1 to specify the starting/ending key positions.
I have a file like this:
FirstName, FamilyName, Address, PhoneNumber
How can I sort it by FamilyName?
If this is UNIX:
sort -k 2 file.txt
You can use multiple -k flags to sort on more than one column. For example, to sort by family name then first name as a tie breaker:
sort -k 2,2 -k 1,1 file.txt
Relevant options from "man sort":
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
POS is F[.C][OPTS], where F is the field number and C the character position in the field. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
sort -nk2 file.txt
Accordingly you can change column number.
To sort by second field only (thus where second fields match, those lines with matches remain in the order they are in the original without sorting on other fields) :
sort -k 2,2 -s orig_file > sorted_file
FWIW, here is a sort method for showing which processes are using the most virt memory.
memstat | sort -k 1 -t':' -g -r | less
Sort options are set to first column, using : as column seperator, numeric sort and sort in reverse.