sort alphanumerically with priority for numbers in linux [duplicate] - linux

This question already has answers here:
How to combine ascending and descending sorting?
(3 answers)
How to sort strings that contain a common prefix and suffix numerically from Bash?
(5 answers)
Closed 4 years ago.
I want to sort a file alphanumerically, but with priority for the numbers in each entry. For example, the file is:
22 FAN
14 FTR
16 HHK
19 KOT
25 LMC
22 LOW
22 MOK
22 RAC
22 SHS
18 SHT
20 TAP
19 TAW
23 TWO
15 UNI
I want to sort it as:
25 LMC
23 TWO
22 FAN
22 LOW
22 MOK
22 RAC
22 SHS
20 TAP
19 KOT
19 TAW
18 SHT
16 HHK
15 UNI
14 FTR

So, basically, you're asking to sort the first field numerically in descending order, but if the numeric keys are the same, you want the second field to be ordered in natural, or ascending, order.
I tried a few things, but here's the way I managed to make it work:
sort -k2 file.txt | sort -snrk1
Explanation:
The first command sorts the whole file by the second, alphanumeric field in natural order. The second command then sorts that output numerically by the first field, in reverse order, and requests a "stable" sort, so lines with the same number keep the alphabetical order produced by the first command.
-n is for numeric sort, as opposed to alphanumeric sort, in which 10 would come before 9.
-r is for reversed order, so from highest to lowest. If unspecified, it will assume natural, or ascending, order.
-k selects which key, or field, to use for the sort order.
-s for stable ordering. This option maintains the original record order of records that have an equal key.
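The two-pass idea can be checked on a few lines of the sample data. This is a sketch that assumes the C locale for reproducible comparisons and uses explicit key ranges (-k2,2 and -k1,1nr) rather than the shorter flags above; two_pass_sort is just an illustrative function name.

```shell
#!/usr/bin/env bash
# Assumption: byte-order comparison, so the result does not depend on the locale.
export LC_ALL=C

# Pass 1 orders the lines by name (field 2 only).
# Pass 2 is a stable (-s), numeric, descending sort on field 1 only,
# so lines with equal numbers keep the name order from pass 1.
two_pass_sort() {
  sort -k2,2 "$@" | sort -s -k1,1nr
}

printf '%s\n' '22 FAN' '14 FTR' '25 LMC' '22 LOW' '19 TAW' '19 KOT' | two_pass_sort
```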

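There is no need for a pipe, or the additional process it spawns. A single key definition for fields 1 and 2, with the n and r options attached to it, will do: sort compares the leading number in descending order, and lines whose numbers tie fall back to sort's last-resort comparison of the whole line, which puts the names in ascending order.
Example/Output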
$ sort -k1nr,2 file
25 LMC
23 TWO
22 FAN
22 LOW
22 MOK
22 RAC
22 SHS
20 TAP
19 KOT
19 TAW
18 SHT
16 HHK
15 UNI
14 FTR
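The same ordering can also be spelled out with two explicit keys, which makes the tie-break on the name visible instead of relying on the last-resort comparison. A sketch, assuming the C locale; one_pass_sort is an illustrative name:

```shell
#!/usr/bin/env bash
export LC_ALL=C   # assumption: byte-order comparison for reproducible ties

# Key 1: field 1 only, numeric, descending (the n and r apply to this key only).
# Key 2: field 2 only, plain text, ascending; consulted only when key 1 ties.
one_pass_sort() {
  sort -k1,1nr -k2,2 "$@"
}

printf '%s\n' '22 SHS' '18 SHT' '22 MOK' '23 TWO' '22 RAC' | one_pass_sort
```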

Related

Illogical number priority in file names in BASH [duplicate]

This question already has answers here:
How to loop over files in natural order in Bash?
(7 answers)
Closed 1 year ago.
It so happens that I wrote a script in BASH, part of which is supposed to take files from a specified directory in numerical order. The files in that directory are named 1, 2, 3, 4, 5, etc. While running this script with 10 files in the directory, I discovered something that seems quite illogical to me: the script takes the files in a strange order: 10, 1, 2, 3, etc.
How do I make it process the files from the lowest numeric name to the highest?
Also, I am using the following line of code to define the loop and path:
for file in /dir/*
I don't know if it matters, but I'm using Fedora 33 as my OS.
Directory listings and glob expansions are sorted alphabetically, so "10" comes before "2".
If I list 20 files whose names correspond to the first 20 integers, I get:
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
You can pipe the list through sort -n so that it is sorted numerically rather than alphabetically. The following command:
for i in $(ls | sort -n) ; do echo $i ; done
produces the following output:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
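i.e. your command:
for file in /dir/*
should be rewritten as:
for name in $(ls /dir | sort -n); do
    file="/dir/$name"   # full path, as /dir/* would have given
    # rest of your loop body
done
(Like the ls loop above, this relies on word splitting, so it assumes the file names contain no whitespace.)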
If you have GNU sort then use the -V flag.
for file in /dir/* ; do echo "$file" ; done | sort -V
Or store the data in an array.
files=(/dir/*); printf '%s\n' "${files[@]}" | sort -V
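The -V commands above only print the sorted names. To actually loop over the files in that order, one option is to read the sorted list into an array first. This is a sketch that assumes GNU sort (for -z and -V) and bash 4.4 or newer (for mapfile -d); the scratch directory stands in for /dir.

```shell
#!/usr/bin/env bash
# Assumptions: GNU sort (for -z and -V) and bash 4.4+ (for mapfile -d).
dir=$(mktemp -d)             # scratch directory standing in for /dir
touch "$dir"/{1..12}

# Read the names NUL-separated (safe for any file name), version-sorted.
mapfile -td '' files < <(cd "$dir" && printf '%s\0' * | sort -zV)

for file in "${files[@]}"; do
  printf 'processing %s\n' "$dir/$file"   # replace with the real work
done
rm -r "$dir"
```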
As an aside, if you have the option, and doing the work once ahead of time is preferable to sorting every time, you could name your directories with leading zeros instead. This is frequently the better design when possible.
I created both versions (with and without leading zeros) for comparison.
$: echo [0-9][0-9]/ # perfect list based on default string sort
00/ 01/ 02/ 03/ 04/ 05/ 06/ 07/ 08/ 09/ 10/ 11/ 12/ 13/ 14/ 15/ 16/ 17/ 18/ 19/ 20/
That also filters out any non-numeric names, and any non-directories.
$: for d in [0-9][0-9]/; do echo "${d%/}"; done
00
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
If I show both the single- and double-digit versions:
$: shopt -s extglob
$: echo @(?|??)
0 00 01 02 03 04 05 06 07 08 09 1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
Only the single-digit versions without leading zeroes get out of order.
The shell sorts the names by the locale order (not necessarily the byte value) of each individual character. Anything that starts with 1 will go before anything that starts with 2, and so on.
There are two main ways to tackle your problem:
sort -n (numeric sort) the file list, and iterate over that.
Rename or recreate the target files (if you can) so that all numbers are the same length (in bytes/characters). Left-pad shorter numbers with 0 (e.g. 01). Then they'll expand the way you want.
Using sort (properly):
mapfile -td '' myfiles < <(printf '%s\0' * | sort -zn)
for file in "${myfiles[@]}"; do
    # what you were going to do
done
sort -z for zero/null-terminated lines is common but not POSIX. It makes processing paths/data that contain newlines safe. Without -z:
mapfile -t myfiles < <(printf '%s\n' * | sort -n)
# Rest is the same.
Rename the target files:
#!/bin/bash
cd /path/to/the/number/files || exit 1
# Gets length of the highest number. Or you can just hardcode it.
length=$(printf '%s\n' * | sort -n | tail -n 1)
length=${#length}
for i in *; do
mv -n "$i" "$(printf "%.${length}d" "$i")"
done
Examples for making new files with zero padded numbers for names:
touch {000..100} # Or
for i in {000..100}; do
> "$i"
done
If it's your script that made the target files, something like $(printf %.Nd [file]) can be used to left pad the names before you write to them. But you need to know the length in characters of the highest number first (N).
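For files a script creates itself, the padding width N can be derived from the highest number before anything is written. A minimal sketch, assuming the highest number (the hypothetical max below) is known in advance and using a scratch directory:

```shell
#!/usr/bin/env bash
max=120                      # hypothetical highest number the script will write
n=${#max}                    # N: number of digits in the highest number (3 here)
dir=$(mktemp -d)             # scratch directory for the demo

for i in 1 2 10 120; do
  name=$(printf "%.${n}d" "$i")           # 1 -> 001, 10 -> 010, 120 -> 120
  printf 'data for %s\n' "$i" > "$dir/$name"
done

# With equal-length names, a plain glob already expands in numeric order.
names=("$dir"/*)
printf '%s\n' "${names[@]##*/}"
rm -r "$dir"
```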

Sum of diagonal products

I am looking to get the sumproduct, but only for specific diagonals in an array. My setup is like the one below, and the yellow highlighting should give an idea of how the formula should calculate.
As text:
Years Rates 0 1 2 3
25 0.16 25 24 23 22
26 0.11 26 25 24 23
27 0.12 27 26 25 24
28 0.13 28 27 26 25
29 0.17 29 28 27 26
30 0.16 30 29 28 27
Years Sum of products
25
26
27
28
29
30
Note that the table on the right dictates how many years to include, so if the table were extended to include 4 years, then 0.17*4 would need to be included in the sum product for 25.
What is the best way to do this? Ideally not a CSE formula/ VBA. The actual table is much bigger, so I might need to be conscious of speed too.
I intend to edit this with what I came up with but I hope to see some different ways of doing this so I hope it's okay that I hold off for now.
Simply:
=MMULT(G4:J4,B7:B10)
You could give this CSE formula a try; maybe it's not as bad as you think (even though you don't want one):
=SUMPRODUCT(B7:B10,TRANSPOSE(G4:J4))
I think a 'CSE' formula will be best even though you'd prefer not to.
With the first formula in B11 and the setup as in your image (with 0, 1, 2, 3 in D1:G1, the word "Rates" in B1, the array in D2:G7, etc.):
{=SUM(IF($D$2:$G$7=A11, $D$1:$G$1*$B$2:$B$7, 0))}
and drag down.
This is the best way I can find without using a CSE formula:
=SUMPRODUCT(--($C$2:$F$7=$A11),$B$2:$B$7*$C$1:$F$1)
The first array is n x m in size, and the second array is the product of an n x 1 array and a 1 x m array, which is expanded to an n x m array. This provides SUMPRODUCT with two identically sized arrays, as required.
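Outside Excel, the same masked sum of products can be sketched in a few lines of awk, which may help check the formula's logic: for each year Y, every grid cell equal to Y contributes its row's rate times its column's offset, just as the --($C$2:$F$7=$A11) mask is multiplied by $B$2:$B$7*$C$1:$F$1. This assumes the table is available as whitespace-separated text laid out as in the question; diag_sumproduct is a hypothetical name. Extra offset columns in the header and grid are picked up automatically, matching the note that the table dictates how many years to include.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the table as whitespace-separated text,
# header row first ("Years Rates 0 1 2 3"), then year, rate and grid values per row.
diag_sumproduct() {
  LC_ALL=C awk '
    NR == 1 {                       # header: remember the column offsets 0, 1, 2, 3, ...
      for (j = 3; j <= NF; j++) off[j] = $j
      next
    }
    {
      year[NR] = $1
      for (j = 3; j <= NF; j++)     # each grid cell adds rate * offset to the bucket for its value
        sum[$j] += $2 * off[j]
    }
    END {                           # one line per year: the masked sum of products
      for (i = 2; i <= NR; i++) printf "%s %.2f\n", year[i], sum[year[i]]
    }
  ' "$@"
}

table='Years Rates 0 1 2 3
25 0.16 25 24 23 22
26 0.11 26 25 24 23
27 0.12 27 26 25 24
28 0.13 28 27 26 25
29 0.17 29 28 27 26
30 0.16 30 29 28 27'

diag_sumproduct <<< "$table"
```

For 25, this adds 0.11*1 + 0.12*2 + 0.13*3, giving 0.74.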

Sorting of data in descending order

Allow me to clarify my query:
I have a database with thousands of character strings, each followed by a value (based on a scoring matrix):
GKCHGYEGRGFQGRHYEGRSDGPNGQL 25
WGCGGYESRGFQGRHYEGGGDCPNGQG 56
GLCCGYEGRGFQCRHYEGGGDGPNDQL 43
GKGCGYEGRGFQGRHYEHGIDKDHFFR 24
PYGSGGNRARRSGCSWMLYEQVNYSGD 4
DFTEDLRCLQDVFAFNEIVSLNVLERL 3
REDYRRQSIYELSNYRCRQYLTDPSDY 18
Some of the values are equal. I am trying to sort the data in descending order using:
sort -n -r file.txt
But the data is still out of order. I also tried adding the -k argument.
Is it possible to get the following result:
GKCHGYEGRGFQGRHYEGRSDGPNGQL 56
WGCGGYESRGFQGRHYEGGGDCPNGQG 56
GLCCGYEGRGFQCRHYEGGGDGPNDQL 56
GKGCGYEGRGFQGRHYEHGIDKDHFFR 43
PYGSGGNRARRSGCSWMLYEQVNYSGD 25
DFTEDLRCLQDVFAFNEIVSLNVLERL 25
REDYRRQSIYELSNYRCRQYLTDPSDY 24
and so on.
I am new to Linux. Any help will be appreciated.
sort -k 2 -nr file.txt
This sorts numerically on the 2nd field, in reverse (descending) order, and prints the result.
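A quick check on the question's sample lines; -k2,2nr limits the key to field 2, which is equivalent here because the score is the last field. This sketch assumes the C locale, and sort_by_score_desc is an illustrative name:

```shell
#!/usr/bin/env bash
export LC_ALL=C   # assumption: byte-order comparison for reproducible ties

# Numeric, descending sort on field 2 only.
sort_by_score_desc() {
  sort -k2,2nr "$@"
}

printf '%s\n' \
  'GKCHGYEGRGFQGRHYEGRSDGPNGQL 25' \
  'WGCGGYESRGFQGRHYEGGGDCPNGQG 56' \
  'GLCCGYEGRGFQCRHYEGGGDGPNDQL 43' \
  'GKGCGYEGRGFQGRHYEHGIDKDHFFR 24' \
  'PYGSGGNRARRSGCSWMLYEQVNYSGD 4' \
  'DFTEDLRCLQDVFAFNEIVSLNVLERL 3' \
  'REDYRRQSIYELSNYRCRQYLTDPSDY 18' | sort_by_score_desc
```

Lines with equal scores are then ordered by sort's last-resort whole-line comparison; adding -s keeps them in their input order instead.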

Sort range Linux

I have some questions about sorting in bash. I am working with Ubuntu 14.04.
The first question is: why, if I have a file some.txt with this content:
b 8
b 9
a 8
a 9
And when I type this:
sort -n -k 2 some.txt
the result will be:
a 8
b 8
a 9
b 9
which means that the file is sorted first by the second field and then by the first field. But I thought that it would stay stable, i.e.:
b 8
a 8
...
...
Maybe when two rows are equal, a lexicographic sort is applied, or what?
The second question is: why doesn't the following work?
sort -n -k 1,2 try.txt
The file try.txt is like this:
8 2
8 11
8 0
8 5
9 2
9 0
The third question is not actually about sorting, but it comes up when I try to do this:
sort blank.txt > blank.txt
After this, the blank.txt file is empty. Why is that?
Apparently GNU sort is not stable by default, so add the -s option. From the manual:
Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified. The --stable (-s) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order.
(https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html)
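The difference is easy to see on the question's some.txt data. A sketch in the C locale (stable_by_field2 is an illustrative name); -k2,2 limits the key to the second field:

```shell
#!/usr/bin/env bash
export LC_ALL=C   # assumption: byte-order comparison for reproducible ties

# Numeric sort on field 2 only; -s disables the last-resort whole-line comparison.
stable_by_field2() {
  sort -s -n -k2,2 "$@"
}

printf '%s\n' 'b 8' 'b 9' 'a 8' 'a 9' | sort -n -k2,2     # a 8, b 8, a 9, b 9 (ties broken by the whole line)
printf '%s\n' 'b 8' 'b 9' 'a 8' 'a 9' | stable_by_field2  # b 8, a 8, b 9, a 9 (ties keep input order)
```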
There's no way to answer your question if you don't show the text file
Redirections are handled by the shell before it hands control to the program. The > redirection truncates the file if it exists, so by the time sort runs, you are giving it an empty file.
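To sort a file into itself, let sort write the output with -o instead of a shell redirection; POSIX allows the -o file to be the same as one of the input files. A sketch using a temporary file as a stand-in for blank.txt:

```shell
#!/usr/bin/env bash
f=$(mktemp)                      # stand-in for blank.txt (scratch file)
printf '%s\n' c a b > "$f"

# No shell redirection: sort itself writes the result back to the same file.
sort -o "$f" "$f"

result=$(cat "$f")
rm -f "$f"
printf '%s\n' "$result"
```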
For #2, you don't actually explain what's not working. Expanding your sample data, this is what happens:
$ cat try.txt
8 2
8 11
9 2
9 0
11 11
11 2
$ sort -n -k 1,2 try.txt
8 11
8 2
9 0
9 2
11 11
11 2
I assume you want to know why the 2nd column is not sorted numerically. Let's go back to the sort manual:
‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of ...
It looks like -n only reads the number at the start of the key, so with -k 1,2 only the first column is compared numerically. After some trial and error, I found this combination, which sorts each column numerically:
$ sort -k1,1n -k2,2n try.txt
8 2
8 11
9 0
9 2
11 2
11 11

sort multiple column file

I have a file a.dat as follows:
1 0.246102 21 1 0.0408359 0.00357267
2 0.234548 21 2 0.0401056 0.00264361
3 0.295771 21 3 0.0388905 0.00305116
4 0.190543 21 4 0.0371858 0.00427217
5 0.160047 21 5 0.0349674 0.00713894
I want to sort the file according to the values in the second column, i.e. the output should look like:
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
How can I do this on the command line? I read that the sort command can be used for this purpose, but I could not figure out how to use it for this.
Use sort -k to indicate the column you want to use:
$ sort -k2 file
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
This works in this case.
For future reference, note (as indicated by 1_CR) that you can also specify the range of columns to use, with sort -k2,2 (just column 2) or sort -k2,5 (columns 2 to 5), etc.
Note that you need to specify the start and end fields for sorting (2 and 2 in this case), and if you need numeric sorting, add n.
sort -k2,2n file.txt
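Plain sort -k2 works on this data only because every value in column 2 has the same 0.xxxxxx shape, so text order happens to match numeric order. A sketch of the numeric version on the question's a.dat rows, assuming the C locale so that "." is the decimal point -n recognizes (sort_by_col2 is an illustrative name):

```shell
#!/usr/bin/env bash
export LC_ALL=C   # assumption: "." is the locale's decimal point for -n

# Numeric, ascending sort on column 2 only.
sort_by_col2() {
  sort -k2,2n "$@"
}

data='1 0.246102 21 1 0.0408359 0.00357267
2 0.234548 21 2 0.0401056 0.00264361
3 0.295771 21 3 0.0388905 0.00305116
4 0.190543 21 4 0.0371858 0.00427217
5 0.160047 21 5 0.0349674 0.00713894'

sort_by_col2 <<< "$data"
```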
