Linux shell: sort a file by the second column?

I have a file like this:
FirstName, FamilyName, Address, PhoneNumber
How can I sort it by FamilyName?

If this is UNIX:
sort -k 2 file.txt
You can use multiple -k flags to sort on more than one column. For example, to sort by family name, with first name as a tie-breaker:
sort -k 2,2 -k 1,1 file.txt
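A minimal sketch of the two-key sort, using invented sample rows in the question's column order. It assumes the columns are comma-separated, so it adds -t ',' (listed in the man excerpt below) to split fields at the commas. It also sets LC_ALL=C, an assumption added here so the comparison is plain byte order regardless of locale:

```shell
#!/bin/sh
# Hypothetical rows: FirstName, FamilyName, Address, PhoneNumber
result=$(printf '%s\n' \
  'John, Smith, 1 Main St, 555-0100' \
  'Anna, Brown, 2 Oak Ave, 555-0101' \
  'Bob, Smith, 3 Elm Rd, 555-0102' \
  'Carl, Adams, 4 Pine Ln, 555-0103' |
  LC_ALL=C sort -t ',' -k 2,2 -k 1,1)   # family name first, then first name

printf '%s\n' "$result"
```

The two Smith rows tie on the first key, so the second key (first name) puts Bob before John.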
Relevant options from "man sort":
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
POS is F[.C][OPTS], where F is the field number and C the character position in the field. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition

sort -nk2 file.txt
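Change the number after -k to sort on a different column. The -n flag makes the comparison numeric, which only matters when that column holds numbers.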
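To show what -n changes, here is a small sketch on hypothetical whitespace-separated data whose second column is numeric. It compares the numeric and plain text orderings (LC_ALL=C is added for a locale-independent result):

```shell
#!/bin/sh
# Hypothetical data: column 2 holds numbers of different widths.
data=$(printf '%s\n' 'b 10' 'a 9' 'c 100')

numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -nk2)  # numeric order: 9, 10, 100
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)   # text order: "10" < "100" < "9"

printf 'numeric:\n%s\nlexical:\n%s\n' "$numeric" "$lexical"
```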

To sort by the second field only, combine -k 2,2 with -s (stable sort). Lines whose second fields match then stay in their original order instead of being sorted further on the other fields:
sort -k 2,2 -s orig_file > sorted_file
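A sketch of the difference -s makes, on hypothetical input where two lines share the same second field (LC_ALL=C added so the fallback comparison is byte order):

```shell
#!/bin/sh
# Hypothetical input: "c apple" and "b apple" tie on field 2.
data=$(printf '%s\n' 'c apple' 'a pear' 'b apple')

stable=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2 -s)  # ties keep input order
default=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2)    # ties fall back to a whole-line comparison

printf 'stable:\n%s\ndefault:\n%s\n' "$stable" "$default"
```

Without -s, sort breaks the tie by comparing the entire lines, so "b apple" moves ahead of "c apple".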

FWIW, here is a sort command that shows which processes are using the most virtual memory:
memstat | sort -k 1 -t':' -g -r | less
The options start the key at the first column (-k 1), use : as the column separator (-t':'), compare the values as general numbers (-g) and reverse the order (-r), so the largest values come first.
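memstat may not be installed everywhere and its exact output format is not shown here, so this sketch feeds the same sort options a hypothetical stand-in of "size:name" lines:

```shell
#!/bin/sh
# Hypothetical stand-in for memstat output: "<size>:<name>" per line.
# The real memstat format may differ; only the sort options matter here.
result=$(printf '%s\n' '120:proc_a' '4096:proc_b' '15:proc_c' |
  LC_ALL=C sort -k 1 -t':' -g -r)   # largest size first

printf '%s\n' "$result"
```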

Related

Shell | Sort Date and Month in Ascending order

I want to sort the file records in ascending order of month and day (DD.MM). Where those values are equal, the records should be ordered by the next column, also in ascending order.
Date and month values to sort, in their current order:
03.02
19.01
02.02
File content:
ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd/ver>
ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd/ver>
ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk/ver>
How can I get the following result?
ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd/ver>
ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk/ver>
ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd/ver>
I tried plain sort -n, but it does not work:
sort -n sortfile.txt
ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk/ver>
ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd/ver>
ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd/ver>
You can use sort, but you need to set the field separator with -t '-' so that fields are split at '-'. Then give a key definition that sorts numerically on the 5th field starting at its 4th character (the month), another that sorts numerically on the 5th field starting at its 1st character (the day), and finally a version sort on field 6 for when everything else is equal. That would be:
sort -t '-' -k5.4n -k5.1n -k6V contents
To give each key definition explicit start and stop characters:
sort -t '-' -k5.4n,5.5 -k5.1n,5.2 -k6V contents
(for this data, the output is the same)
Example Use/Output
$ sort -t '-' -k5.4n -k5.1n -k6V contents
ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd/ver>
ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk/ver>
ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd/ver>
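A sketch of the bounded-key form on the question's three lines plus one hypothetical line (QQQ) that shares the date 03.02 with ABC, so the version key on field 6 has to break the tie. It assumes GNU sort, since the V (version sort) modifier is a GNU extension, and adds LC_ALL=C for reproducibility:

```shell
#!/bin/sh
# The question's lines plus a hypothetical QQQ line with the same date as ABC.
# Field 5 is DD.MM: characters 4-5 are the month, characters 1-2 the day.
result=$(printf '%s\n' \
  'ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd/ver>' \
  'ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd/ver>' \
  'ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk/ver>' \
  'ver>0.1.1-QQQ-RRR-BR-03.02-v0.9-2-aaaaaaa/ver>' |
  LC_ALL=C sort -t '-' -k5.4n,5.5 -k5.1n,5.2 -k6V)

printf '%s\n' "$result"
```

QQQ and ABC tie on month and day, and the version sort then places v0.9 before v1.0.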

Linux sort numerically based on first column

I'm trying to sort a long CSV file numerically on the number in the first column, using the command below:
-> head -1 file.csv ; tail -n +2 file.csv | sort -t , -k1n
(I use head to print the first line separately and tail to skip it in the sort, because it is a header that contains text.)
However, it doesn't return a fully sorted list. Half of it is sorted, the other half is like this:
9838,2361,8,947,2284
9842,2135,2,261,2511
9846,2710,1,176,2171
986,2689,32,123,2177
9888,2183,15,30,2790
989,2470,33,887,2345
Can somebody tell me what I'm doing wrong? I've also tried the following, with the same result:
-> sort -k1n -t"," file.csv
tail -n +2 file.csv | sort -k1,2 -n -t"," should do the trick.
To perform a numeric sort by the first column use the following approach:
tail -n +2 file.csv | sort -n -t, -k1,1
The output:
986,2689,32,123,2177
989,2470,33,887,2345
9838,2361,8,947,2284
9842,2135,2,261,2511
9846,2710,1,176,2171
9888,2183,15,30,2790
-k pos1[,pos2]
Specify a sort field that consists of the part of the line between pos1 and pos2
(or the end of the line, if pos2 is omitted), inclusive.
In its simplest form pos specifies a field number (starting with 1) ...
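One plausible cause of the half-sorted output, assuming GNU sort in a locale such as en_US.UTF-8: GNU sort's -n accepts the locale's thousands separator inside a number, and in that locale the separator is a comma. With -k1n the key runs to the end of the line, so a row like 986,2689,... can be read as one long number spanning several fields. Ending the key at field 1 avoids this, and LC_ALL=C (added here as an assumption) removes the thousands separator entirely. The sketch keeps the question's head/tail approach; the header line is hypothetical:

```shell
#!/bin/sh
# Small copy of the question's data; the header line is hypothetical.
tmp=$(mktemp)
printf '%s\n' \
  'id,col2,col3,col4,col5' \
  '9838,2361,8,947,2284' \
  '9842,2135,2,261,2511' \
  '9846,2710,1,176,2171' \
  '986,2689,32,123,2177' \
  '9888,2183,15,30,2790' \
  '989,2470,33,887,2345' > "$tmp"

# Print the header unchanged, then sort the body numerically on field 1 only.
sorted=$( { head -n 1 "$tmp"; tail -n +2 "$tmp" | LC_ALL=C sort -t, -k1,1n; } )
rm -f "$tmp"

printf '%s\n' "$sorted"
```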

Alphanumeric sorting of a string with variable width size

I am stuck on a small sorting step. I have a huge file with more than 300K entries, and it has to be sorted on a column containing alphanumeric identifiers such as:
Rpl12-8
Lrsam1-1
Rpl12-9
Lrsam1-2
Rpl12-10
Lrsam1-5
Rpl12-11
Lrsam1-101
Lrsam2-1
Act-1
Act-100
Act-101
Act-11
The problem is the variable width, so I can't specify the second key as a fixed character position (sort -k 1.8n). The sort should be first on the letters, then on the number next to them, and then on the number after "-". Can I use "-" as a field delimiter, so that the width of the string doesn't matter?
Desired output would be :
Act-1
Act-11
Act-100
Act-101
Lrsam1-1
Lrsam1-2
Lrsam1-5
Lrsam1-101
Lrsam2-1
Rpl12-8
Rpl12-9
Rpl12-10
Rpl12-11
With the above data in input.txt:
sort -t- -k1,1 -k2n input.txt
You can change the field delimiter to - with -t, then sort on the first field only (as a string) with -k1,1, and finally the 2nd field (as a number) with -k2n.
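A sketch that runs the answer's command on the identifiers from the question, with LC_ALL=C added so the text comparison of field 1 is plain byte order:

```shell
#!/bin/sh
# The identifiers from the question, one per line.
result=$(printf '%s\n' \
  Rpl12-8 Lrsam1-1 Rpl12-9 Lrsam1-2 Rpl12-10 Lrsam1-5 Rpl12-11 \
  Lrsam1-101 Lrsam2-1 Act-1 Act-100 Act-101 Act-11 |
  LC_ALL=C sort -t- -k1,1 -k2n)   # field 1 as text, then field 2 as a number

printf '%s\n' "$result"
```

Field 1 is still compared as text, so the number inside it (the 1 in Lrsam1) is ordered character by character rather than numerically.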

I have a file with several columns. I would like to sort on column 2, grouped by column 1 values

I have a file with several columns. I would like to sort on column 2 within groups of column 1 values. See the example below.
Input file:
NEW,RED,1
OLD,BLUE,2
NEW,BLUE,3
OLD,GREEN,4
Expected output file:
NEW,BLUE,3
NEW,RED,1
OLD,BLUE,2
OLD,GREEN,4
How can I achieve this?
$ sort -t, -k1,2 inputfile
NEW,BLUE,3
NEW,RED,1
OLD,BLUE,2
OLD,GREEN,4
-t sets the field separator, and -k1,2 sets the key's start and end fields: the key runs from field 1 through field 2.
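An alternative to a single key spanning two fields is one key per column, which compares column 1 on its own before looking at column 2. This sketch on the question's input (with LC_ALL=C added for reproducibility) gives the same expected output:

```shell
#!/bin/sh
# The question's input: group by column 1, then order by column 2.
result=$(printf '%s\n' 'NEW,RED,1' 'OLD,BLUE,2' 'NEW,BLUE,3' 'OLD,GREEN,4' |
  LC_ALL=C sort -t, -k1,1 -k2,2)

printf '%s\n' "$result"
```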

Why linux sort is not giving me desired results?

I have a file a.csv with contents similar to the following:
a,b,c
a ,aa, a
a b, c, f
a , b, c
a b a b a,a,a
a,a,a
a aa ,a , t
I am trying to sort it with sort -k1 -t, a.csv
but it gives the following result:
a,a,a
a ,aa, a
a aa ,a , t
a b a b a,a,a
a , b, c
a,b,c
a b, c, f
This is not sorted on the first column. What am I doing wrong?
You have to specify the end position to be 1, too:
sort -k1,1 -t, a.csv
Give this a try: sort -t, -k1,1 a.csv
The manual explains that if you omit the end field, sort uses all characters from field n to the end of the line:
-k POS1[,POS2]
The recommended, POSIX, option for specifying a sort field. The field consists of the part of the line between POS1 and POS2 (or the end of the line, if POS2 is omitted), _inclusive_. Fields and character positions are numbered starting with 1. So to sort on the second field, you'd use '-k 2,2'. See below for more examples.
Try this instead:
sort -k 1,1 -t , a.csv
sort reads -k 1 as "sort from the first field onwards", which defeats the point of passing the argument in the first place.
This is documented in the sort man page and warned about in the Examples section:
Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use ':' as the field delimiter:
$ sort -t : -k 2,2n -k 5.3,5.4
Note that if you had written -k 2 instead of -k 2,2, sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
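A small sketch that makes the difference between -k1 and -k1,1 visible, using hypothetical two-line input whose first fields are equal. The stable flag -s (an addition here) stops sort from breaking ties on the whole line, so the output reflects only what each key covers; LC_ALL=C is added for byte-order comparison:

```shell
#!/bin/sh
# Hypothetical input: both lines have "a" in field 1.
data=$(printf '%s\n' 'a,z' 'a,b')

to_eol=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t, -k1)    # key = field 1 to end of line
field1=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t, -k1,1)  # key = field 1 only

printf 'k1:\n%s\nk1,1:\n%s\n' "$to_eol" "$field1"
```

With -k1 the rest of the line takes part in the key, so a,b comes first; with -k1,1 the lines tie and keep their input order. The order of lines containing spaces, as in a.csv, can also depend on the locale's collation rules, which LC_ALL=C replaces with plain byte order.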
