Shell | Sort Date and Month in Ascending Order - Linux

I want to display/sort the file records in ascending order of date and month; if any values are equal, they should be ordered by the next field in ascending order.
Date & Month to sort: (current scenario)
<ver>.....03.02..</ver>
<ver>.....19.01..</ver>
<ver>.....02.02..</ver>
File content:
<ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd</ver>
<ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd</ver>
<ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk</ver>
How can I achieve the following result?
<ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd</ver>
<ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk</ver>
<ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd</ver>
I tried using sort, which is not working:
sort -n sortfile.txt
<ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk</ver>
<ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd</ver>
<ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd</ver>

You can use sort, but you will need to specify the field separator with -t '-' so that fields are split on '-'. Then specify a keydef to sort numerically on the 5th field beginning with the 4th character (the month), then again beginning with the 1st character (the day), and finally a version sort on field 6 if all else is equal. That would be:
sort -t '-' -k5.4n -k5.1n -k6V contents
Providing full start and stop characters within each keydef can be done as:
sort -t '-' -k5.4n,5.5 -k5.1n,5.2 -k6V contents
(though for this data the output isn't changed)
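To see how the keydefs map onto the data, one line split on '-' looks like:
field 1: <ver>0.1.1
field 2: ABC
field 3: XYA
field 4: BR
field 5: 03.02 (chars 1-2 = day, chars 4-5 = month)
field 6: v1.0
field 7: 1
field 8: 4d4f3dd</ver>
So -k5.4n sorts by the month first, -k5.1n breaks ties on the day, and -k6V version-sorts field 6 as a final tie-breaker.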
Example Use/Output
$ sort -t '-' -k5.4n -k5.1n -k6V contents
<ver>0.1.1-XYZ-LOK-BR-19.01-v1.0-5-8a8d7dd</ver>
<ver>0.1.1-DXD-UIJ-BR-02.02-v1.0-4-9o2k4wk</ver>
<ver>0.1.1-ABC-XYA-BR-03.02-v1.0-1-4d4f3dd</ver>

Related

Using awk and sort last column in descending order in Linux

I have a file that contains both names and numbers, like:
data.csv
2016,Bmw,M2,2 Total score:24
1998,Subaru,Legacy,23 Total score:62
2012,Volkswagen,Golf,59 Total score:28
2001,Dodge,Viper,42 Total score:8
2014,Honda,Accord,83 Total score:112
2015,Chevy,Camaro,0 Total score:0
2008,Honda,Accord,88 Total score:48
Total score is the last column. I did:
awk -F"," 'NR>1{{for(i=4;i<=6;++i)printf $i""FS }
{sum=0; for(g=8;g<=NF;g++)
sum+=$g
print $i,"Total score:"sum ; print ""}}' data.csv
When I try:
awk -F"," 'NR>1{{for(i=4;i<=6;++i)printf $i""FS }
{sum=0; for(g=8;g<=NF;g++)
sum+=$g
print $i,"Total score:"sum ; print "" | "sort -k1,2n"}}' data.csv
It gave me an error. I only want to sort the total score column; is there anything I did wrong? Any help is appreciated.
First, assuming there are not really blank lines between each line of data in data.csv, all you need is sort; you don't need awk at all. For example, since the only ':' on each line comes just before the total score you want to sort descending by, you can use:
sort -t: -k2,2rn data.csv
Here -t: tells sort to use ':' as the field separator, and the KEYDEF -k2,2rn tells sort to sort by the 2nd field (what follows the ':'), with rn requesting a reverse numeric sort on that field.
Example Use/Output
With your data (without blank lines) in data.csv, you would have:
$ sort -t: -k2,2rn data.csv
2014,Honda,Accord,83 Total score:112
1998,Subaru,Legacy,23 Total score:62
2008,Honda,Accord,88 Total score:48
2012,Volkswagen,Golf,59 Total score:28
2016,Bmw,M2,2 Total score:24
2001,Dodge,Viper,42 Total score:8
2015,Chevy,Camaro,0 Total score:0
Which is your sort by Total score in descending order. If you want ascending order, just remove the r from -k2,2rn.
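For example, the ascending variant and its output would be:
$ sort -t: -k2,2n data.csv
2015,Chevy,Camaro,0 Total score:0
2001,Dodge,Viper,42 Total score:8
2016,Bmw,M2,2 Total score:24
2012,Volkswagen,Golf,59 Total score:28
2008,Honda,Accord,88 Total score:48
1998,Subaru,Legacy,23 Total score:62
2014,Honda,Accord,83 Total score:112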
If you do have blank lines, you can remove them before the sort with sed -i '/^$/d' data.csv.
Sorting by the Number Before "Total score"
If you want to sort by the number that begins the "XX Total score:yy" field (e.g. XX), you can use sort with ',' as the field separator; your KEYDEF is then -k4.1,4.3rn, which says to sort on the 4th field, character 1 through character 3, with the same reverse numeric flags, e.g.
sort -t, -k4.1,4.3rn data.csv
Example Use/Output
In this case, sorting by the number before Total score in descending order would result in:
$ sort -t, -k4.1,4.3rn data.csv
2008,Honda,Accord,88 Total score:48
2014,Honda,Accord,83 Total score:112
2012,Volkswagen,Golf,59 Total score:28
2001,Dodge,Viper,42 Total score:8
1998,Subaru,Legacy,23 Total score:62
2016,Bmw,M2,2 Total score:24
2015,Chevy,Camaro,0 Total score:0
After posting the original solution I noticed it was ambiguous as to which of the numbers in the 4th field you intended to sort on, so here are both solutions. Let me know if you have further questions.

Print whole line with highest value from one column

I have a little issue right now.
I have a file with 4 columns:
test0000002,10030010330,c_,218
test0000002,10030010330,d_,202
test0000002,10030010330,b_,193
test0000002,10030010020,c_,178
test0000002,10030010020,b_,170
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030010020,d_,150
test0000002,10030070050,c_,119
test0000002,10030070050,b_,99
test0000002,10030070050,d_,79
test0000002,10030070050,a_,56
test0000002,10030010390,c_,55
test0000002,10030010390,b_,44
test0000002,10030010380,d_,41
test0000002,10030010380,a_,37
test0000002,10030010390,d_,35
test0000002,10030010380,c_,33
test0000002,10030010390,a_,31
test0000002,10030010320,c_,30
test0000002,10030010320,b_,27
test0000002,10030010380,b_,26
test0000002,10030010320,a_,23
test0000002,10030010320,d_,22
test0000002,10030010010,a_,6
and I want the highest value from the 4th column, sorted by the 2nd column:
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010390,a_,31
test0000002,10030010380,c_,33
test0000002,10030010390,d_,35
test0000002,10030010320,a_,23
test0000002,10030010380,b_,26
test0000002,10030010010,a_,6
It appears that your file is already sorted in descending order on the 4th column, so you just need to print lines where the 2nd column appears for the first time:
awk -F, '!seen[$2]++' file
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010010,a_,6
If your input file is not sorted on column 4, then
sort -t, -k4nr file | awk -F, '!seen[$2]++'
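For reference, the !seen[$2]++ idiom is shorthand for the longer form below (a sketch of the same logic):
awk -F, '{
    if (seen[$2] == 0)   # first time this 2nd-column value appears
        print            # print the whole line
    seen[$2]++           # count the occurrence
}' file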
You can use two sorts:
sort -u -t, -k2,2 file | sort -t, -rnk4
The first sort removes duplicates in the second column; the second sorts the result numerically, in reverse, on the 4th column.

Linux filtering a file by two columns and printing the output

I have a table that has 9 columns as shown below.
How would I first sort by the strand column so that only rows with a "+" are selected, and then, of those, select the ones that have 3 exons (in the exon count column)?
I have been trying to use grep for this, as I understand it can pick out a word from a column, but I only get the particular column or just the total number of matches.
Using awk:
awk -F "," ' $4=="+" && $9=="3" ' file.csv
If it's not CSV, then remove -F "," from this command.
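As a quick sketch on hypothetical comma-separated rows (assuming column 4 is the strand and column 9 the exon count), it works like this:
$ cat file.csv
chr1,100,200,+,geneA,0,0,0,3
chr1,300,400,-,geneB,0,0,0,3
chr2,500,600,+,geneC,0,0,0,5
$ awk -F "," ' $4=="+" && $9=="3" ' file.csv
chr1,100,200,+,geneA,0,0,0,3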

Uniqing a delimited file based on a subset of fields

I have data such as below:
1493992429103289,207.55,207.5
1493992429103559,207.55,207.5
1493992429104353,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
Due to the nature of the last two columns, their values change throughout the day and are repeated regularly. By grouping the way outlined in my desired output (below), I am able to view each time there was a change in their values (with the epoch time in the first column). Is there a way to achieve the desired output shown below:
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
So I consolidate the data by the second two columns. However, the consolidation is not completely unique (as can be seen by 207.55,207.5 being repeated).
I have tried:
uniq -f 1
However, the output gives only the first line and does not go on through the list.
The awk solution below does not allow a combination that occurred previously to be output again, and so gives this output (below the awk code):
awk '!x[$2 $3]++'
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
I do not wish to sort the data by the second two columns. However, since the first is epoch time, it may be sorted by the first column.
You can't set delimiters with uniq; fields have to be whitespace-separated. (That is also why uniq -f1 gave only the first line: with no blanks in the data, each whole line is a single field, so skipping one field skipped everything and all lines compared equal.) With the help of tr you can do:
tr ',' ' ' <file | uniq -f1 | tr ' ' ','
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
You can use an awk statement as below,
awk 'BEGIN{FS=OFS=","} s != $2 || t != $3 {print} {s=$2;t=$3}' file
which produces the output as you need.
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
The idea is to store the second and third column values in the variables s and t respectively, and print a line only when either value differs from the previous line's (hence the ||; requiring both to differ would wrongly suppress lines where only one column changed).
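A slightly more compact equivalent (a sketch) builds one combined key from the two columns:
awk -F, '{k = $2 FS $3} k != prev {print} {prev = k}' file
This prints a line whenever the pair of values differs from the previous line's pair.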
I found an answer which is not as elegant as Inian's, but it satisfies my purpose.
Since my first column is always epoch time in microseconds and never changes in width, I can use the following uniq command:
uniq -s 17
You can try to manually (with a loop) compare the current line with the previous line.
prev_line=""
# start at the first line
i=1
# strip the first column, which we don't need to compare
sed 's#^[0-9][0-9]*,##' ./data_file > ./transform_data_file
# for every line of the file without its first column
for current_line in $(cat ./transform_data_file)
do
    # if the previous line is the same as the current line
    if [ "x$prev_line" = "x$current_line" ]
    then
        # record the line number to suppress later
        echo $i >> ./line_to_be_suppress
    fi
    # record the current line as the previous line
    prev_line=$current_line
    # increment the current line number
    i=$(( i + 1 ))
done
# suppress the recorded lines, last first so earlier line numbers stay valid
for line_to_suppress in $(tac ./line_to_be_suppress) ; do sed -i "${line_to_suppress}d" ./data_file ; done
rm ./line_to_be_suppress
rm ./transform_data_file
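The same comparison can also be done in a single pass without temporary files, e.g. with a while read loop (a sketch):
prev=""
while IFS= read -r line; do
    rest=${line#*,}        # drop the first (epoch) column
    [ "$rest" != "$prev" ] && printf '%s\n' "$line"   # print only on change
    prev=$rest
done < ./data_file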
Since your first field seems to have a fixed length of 17 characters (the 16-digit timestamp plus the , delimiter), you could use the -s option of uniq, which would be more optimal for larger files:
uniq -s 17 file
Gives this output:
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
From man uniq:
-f num
Ignore the first num fields in each input line when doing comparisons.
A field is a string of non-blank characters separated from adjacent fields by blanks.
Field numbers are one based, i.e., the first field is field one.
-s chars
Ignore the first chars characters in each input line when doing comparisons.
If specified in conjunction with the -f option, the first chars characters after
the first num fields will be ignored. Character numbers are one based,
i.e., the first character is character one.

Alphanumeric sorting of a string with variable width size

I am stuck on a small sorting step. I have a huge file with >300K entries, and the file has to be sorted on a specific column containing alphanumeric identifiers such as:
Rpl12-8
Lrsam1-1
Rpl12-9
Lrsam1-2
Rpl12-10
Lrsam1-5
Rpl12-11
Lrsam1-101
Lrsam2-1
Act-1
Act-100
Act-101
Act-11
The problem is the variable width, so I am unable to specify the second key by character position (sort -k 1.8n). The first sort is on the leading letters, then on the number next to them, and then on the number after the "-". Can I sort on the part after the "-" using a field delimiter, so that I don't have to care about the width of the string?
Desired output would be :
Act-1
Act-11
Act-100
Act-101
Lrsam1-1
Lrsam1-2
Lrsam1-5
Lrsam1-101
Lrsam2-1
Rpl12-8
Rpl12-9
Rpl12-10
Rpl12-11
With the above data in input.txt:
sort -t- -k1,1 -k2n input.txt
This changes the field delimiter to - with -t-, then sorts on the first field only (as a string) with -k1,1, and finally on the 2nd field (as a number) with -k2n.
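Example Use/Output
With the sample data in input.txt, this gives the desired order:
$ sort -t- -k1,1 -k2n input.txt
Act-1
Act-11
Act-100
Act-101
Lrsam1-1
Lrsam1-2
Lrsam1-5
Lrsam1-101
Lrsam2-1
Rpl12-8
Rpl12-9
Rpl12-10
Rpl12-11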
