sort multiple column file - linux

I have a file a.dat as following.
1 0.246102 21 1 0.0408359 0.00357267
2 0.234548 21 2 0.0401056 0.00264361
3 0.295771 21 3 0.0388905 0.00305116
4 0.190543 21 4 0.0371858 0.00427217
5 0.160047 21 5 0.0349674 0.00713894
I want to sort the file according to the values in the second column, i.e. the output should look like:
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
How can I do this from the command line? I read that the sort command can be used for this purpose, but I could not figure out how to use it.

Use sort -k to indicate the column you want to use:
$ sort -k2 file
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
This works in this case.
For future reference, note (as indicated by 1_CR) that you can also indicate the range of columns to be used with sort -k2,2 (use only column 2) or sort -k2,5 (from column 2 to 5), etc.

Note that you need to specify the start and end fields for sorting (2 and 2 in this case), and if you need numeric sorting, append n:
sort -k2,2n file.txt
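A quick sketch of why the n flag matters, using made-up data (10 sorts before 9 lexicographically but after it numerically):

```shell
# Invented sample: compare lexicographic vs numeric sort on field 2.
printf '%s\n' 'a 10' 'b 9' 'c 2' > demo.txt
sort -k2,2 demo.txt    # lexicographic: "10" < "2" < "9"
sort -k2,2n demo.txt   # numeric: 2 < 9 < 10
rm -f demo.txt
```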


How to extract the number after specific word using awk?

I have several lines of text. I want to extract the number after specific word using awk.
I tried the following code but it does not work.
First, create the test file with vi test.text. There are 3 columns (the 3 fields are generated by some other pipeline commands using awk).
Index AllocTres CPUTotal
1 cpu=1,mem=256G 18
2 cpu=2,mem=1024M 16
3 4
4 cpu=12,gres/gpu=3 12
5 8
6 9
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21
Please note there are several empty fields in this file.
What I want to achieve is to extract the number after the first gres/gpu= in each line (if gres/gpu= does not occur in the line, the default number is 0), using a pipeline like cat test.text | awk '{some_commands}', to output 4 columns:
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3
Firstly: awk does not need cat; it can read files on its own. Combining cat and awk is generally discouraged as a useless use of cat.
For this task I would use GNU AWK the following way. Let file.txt content be (note the empty lines, matching the empty AllocTres fields above):
cpu=1,mem=256G
cpu=2,mem=1024M

cpu=12,gres/gpu=3


cpu=13,gres/gpu=4,gres/gpu:ret6000=2
mem=12G,gres/gpu=3,gres/gpu:1080ti=1
then
awk 'BEGIN{FS="gres/gpu="}{print $2+0}' file.txt
output
0
0
0
3
0
0
4
3
Explanation: I inform GNU AWK that the field separator (FS) is gres/gpu=, then for each line I print the 2nd field increased by zero. For lines without gres/gpu=, $2 is an empty string; used in arithmetic context this is the same as zero, so zero plus zero gives zero. For lines with at least one gres/gpu=, adding zero makes GNU AWK take the longest prefix that is a legal number; thus 3 (4th line) becomes 3, 4 (7th line) becomes 4, and 3 (8th line) becomes 3.
(tested in GNU Awk 5.0.1)
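The "+0" coercion can be checked in isolation with a couple of throwaway lines (the sample strings here are taken from the question's data):

```shell
# awk arithmetic takes the longest numeric prefix of a string;
# an empty field coerces to 0.
printf '%s\n' 'cpu=12,gres/gpu=3' 'cpu=1,mem=256G' |
    awk 'BEGIN{FS="gres/gpu="} {print $2+0}'
# first line: $2 is "3", printed as 3; second line: $2 is empty, printed as 0
```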
With your shown samples, in GNU awk you can try the following code (written and tested in GNU awk). A simple explanation: use awk's match function with the regex gres\/gpu=([0-9]+) (escaping the / here), creating a single capturing group to capture all digits coming after =. Once a match is found, print the current line followed by the 1st element of the array arr, +0 (so that zero is printed for lines with no match).
awk '
FNR==1{
print $0,"GPUAllocated"
next
}
{
match($0,/gres\/gpu=([0-9]+)/,arr)
print $0,arr[1]+0
}
' Input_file
Using sed
$ sed '1s/$/\tGPUAllocated/;s~.*gres/gpu=\([0-9][0-9]*\).*~& \t\1~;1!{\~gres/gpu=[0-9]~!s/$/ \t0/}' input_file
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3
awk '
BEGIN{FS="\t"}
NR==1{
$(NF+1)="GPUAllocated"
}
NR>1{
$(NF+1)=FS 0
}
/gres\/gpu=/{
split($0, a, "=")
gp=a[3]; gsub(/[ ,].*/, "", gp)
$NF=FS gp
}1' test.text
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3

Sort alphanumerically with priority for numbers in linux [duplicate]

This question already has answers here:
How to combine ascending and descending sorting?
(3 answers)
How to sort strings that contain a common prefix and suffix numerically from Bash?
(5 answers)
Closed 4 years ago.
I want to sort a file alphanumerically but with priority for the numbers in each file entry. Example: File is:
22 FAN
14 FTR
16 HHK
19 KOT
25 LMC
22 LOW
22 MOK
22 RAC
22 SHS
18 SHT
20 TAP
19 TAW
23 TWO
15 UNI
I want to sort it as:
25 LMC
23 TWO
22 FAN
22 LOW
22 MOK
22 RAC
22 SHS
20 TAP
19 KOT
19 TAW
18 SHT
16 HHK
15 UNI
14 FTR
So, basically, you're asking to sort the first field numerically in descending order, but if the numeric keys are the same, you want the second field to be ordered in natural, or ascending, order.
I tried a few things, but here's the way I managed to make it work:
sort -nk2 file.txt | sort -snrk1
Explanation:
The first command sorts the whole file using the second, alphanumeric field in natural order, while the second command sorts the output using the first numeric field, shows it in reverse order, and requests that it be a "stable" sort.
-n is for numeric sort, versus lexicographic, in which 60 would come before 7.
-r is for reversed order, so from highest to lowest. If unspecified, it will assume natural, or ascending, order.
-k which key, or field, to use for sorting order.
-s for stable ordering. This option maintains the original record order of records that have an equal key.
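A small sketch of why -s matters, with invented rows: the first pass orders the second field, and the stable second pass keeps that order within equal numeric keys.

```shell
# Invented sample rows with duplicate numeric keys.
printf '%s\n' '22 MOK' '22 FAN' '25 LMC' > demo.txt
sort -k2 demo.txt | sort -s -n -r -k1,1
# 25 LMC comes first; 22 FAN stays before 22 MOK from the first pass
rm -f demo.txt
```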
There is no need for a pipe, or the additional subshell it spawns. Simply use of keydef for both fields 1 and 2 will do:
$ sort -k1nr,2 file
Example/Output
$ sort -k1nr,2 file
25 LMC
23 TWO
22 FAN
22 LOW
22 MOK
22 RAC
22 SHS
20 TAP
19 KOT
19 TAW
18 SHT
16 HHK
15 UNI
14 FTR

Shifting column titles to right

I have a file which I want to process in bash or python.
It has 4 columns but only 3 column titles:
input.txt
1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
I am trying to shift the title columns to right and add a title of "INPUTS" to the first column (input column).
Desired output: Adding the column title
Desired-output-step1.csv
INPUTS 1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
I tried with sed:
sed -i '1iINPUTS, 1STCOLUMN, 2NDCOLUMN, THIRDCOLUMN' input.txt
But I would prefer not to type out the column names.
How do I just insert the new title for the first column and shift the other column titles to the right?
You can specify which line to modify using line numbers:
$ sed '1s/^/INPUTS /' ip.txt
INPUTS 1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
here, 1 indicates that you want to apply the s command only to the 1st line
s/^/INPUTS / inserts text at the start of the line; you'll have to adjust the spacing as needed
Instead of counting and testing the spaces, you can let column -t do the padding and formatting job:
sed '1s/^/INPUTS /' ip.txt|column -t
This will give you:
INPUTS 1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
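An equivalent sketch with awk instead of sed (assuming the same whitespace-separated layout as ip.txt; the sample rows here are abbreviated):

```shell
# Prepend "INPUTS " to the header line only, pass everything else through.
printf '%s\n' '1STCOLUMN 2NDCOLUMN' 'input1 12 33' |
    awk 'NR==1{$0="INPUTS " $0} 1' | column -t
```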

Count occurrence of numbers in linux

I have a .txt file with 25,000 lines. Each line contains a number from 1 to 20. I want to compute the total occurrences of each number in the file. I don't know whether I should use grep or awk, or how to use them. And I'm worried about confusing 1 and 11, which both contain the digit 1. Thank you very much for helping!
I was trying the following, but it double-counts my numbers:
grep -o '1' degreeDistirbution.txt | wc -l
With grep you can match the beginning and end of a line with '^' and '$' respectively. For the whole thing I'll use an array, but to illustrate this point I'll just use one variable:
one="$(grep -c "^1$" ./$inputfile)"
then we put that together with the magic of bash loops and loop through all the numbers with a while like so:
i=1
while [[ $i -le 20 ]]
do
arr[i]="$(grep -c "^$i$" ./$inputfile)"
i=$((i+1))
done
If you like, you can of course use a for loop as well.
An easier method is:
sort -n file | uniq -c
Which will count the occurrences of each number in the sorted file and display the results like:
$ sort -n dat/twenty.txt | uniq -c
3 1
3 2
3 3
4 4
4 5
4 6
4 7
4 8
4 9
4 10
4 11
3 12
2 13
2 14
4 15
4 16
4 17
2 18
2 19
2 20
Showing I have 3 ones, 3 twos, etc.. in the sample file.
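The same count can also be sketched in a single awk pass, with no need to sort the 25,000 input lines (the input below is invented):

```shell
# count[n] accumulates how often each value n appears; the final sort
# only orders the 20 summary lines, not the input.
printf '%s\n' 1 2 1 20 |
    awk '{count[$1]++} END{for (n in count) print count[n], n}' | sort -k2,2n
# -> "2 1", "1 2", "1 20": two ones, one two, one twenty
```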

Sort range Linux

Hello everyone. I have some questions about sorting in bash. I am working with Ubuntu 14.04.
The first question is: why if I have file some.txt with this content:
b 8
b 9
a 8
a 9
And when I type this :
sort -n -k 2 some.txt
the result will be:
a 8
b 8
a 9
b 9
which means that the file is sorted first by the second field and after that by the first field, but I thought it would stay stable, i.e.
b 8
a 8
...
...
Maybe if two rows have equal keys a lexicographical sort is applied, or what?
The second question is: why doesn't the following work:
sort -n -k 1,2 try.txt
The file try.txt is like this:
8 2
8 11
8 0
8 5
9 2
9 0
The third question is not actually about sorting, but it appears when I try to do this:
sort blank.txt > blank.txt
After this the blank.txt file is empty. Why is that ?
Apparently GNU sort is not stable by default: add the -s option
Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified. The --stable (-s) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order.
(https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html)
There's no way to answer your question if you don't show the text file.
Redirections are handled by the shell before control is handed to the program. The > redirection truncates the file if it exists, so by the time sort runs, you are giving it an empty file.
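One safe alternative, as a sketch: sort's own -o option, which reads all of its input before opening the output file, so sorting a file onto itself works:

```shell
printf '%s\n' b a > demo.txt
sort -o demo.txt demo.txt   # in-place: the file is not truncated first
cat demo.txt                # a, then b
rm -f demo.txt
```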
For #2, you don't actually explain what's not working. Expanding your sample data, this happens:
$ cat try.txt
8 2
8 11
9 2
9 0
11 11
11 2
$ sort -n -k1,2 try.txt
8 11
8 2
9 0
9 2
11 11
11 2
I assume you want to know why the 2nd column is not sorted numerically. Let's go back to the sort manual:
‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of ...
Looks like using -n only sorts the first column numerically. After some trial and error, I found this combination, which sorts each column numerically:
$ sort -k1,1n -k2,2n try.txt
8 2
8 11
9 0
9 2
11 2
11 11