Linux bash sorting in reverse order based on a column isn't working as expected - linux

I'm trying to sort a text file in descending order based on the last column, but it just doesn't seem to work.
cat 1.txt | sort -r -n -k 4,4
ACHG,89.46,0.08,34200
UUKJL,0.85,-15.00,200
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
JJkLEYE,73.67,0.48,25400
I've tried removing spaces just in case, but that hasn't helped. I also tried sorting by the other fields just to see, but I have the same problem.
I just can't work out what is wrong with the command I've issued. Could I please request help with this one?

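Your command is almost right, but it is missing the field separator option -t, which is needed to make the comma the field separator. Without it, sort splits fields on blanks, so each of your lines is a single field and key 4 is empty.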
This should work for you:
sort -t, -rnk 4,4 1.txt
ACHG,89.46,0.08,34200
JJkLEYE,73.67,0.48,25400
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
UUKJL,0.85,-15.00,200
Note that there is no need to use cat | sort here.
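As a quick check, here is a minimal sketch that runs the corrected command on the sample rows from the question. The variable names rows and with_sep are only for illustration. If you drop -t, from this command, every key compares equal and sort falls back to comparing whole lines, which is why the original attempt looked unsorted.

```shell
# Sample rows from the question, held in a variable for the demo.
rows='ACHG,89.46,0.08,34200
UUKJL,0.85,-15.00,200
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
JJkLEYE,73.67,0.48,25400'

# With -t, the comma splits each line into four fields,
# so -k 4,4 really is the last column; -r and -n sort it numerically, descending.
with_sep=$(printf '%s\n' "$rows" | sort -t, -rnk 4,4)
printf '%s\n' "$with_sep"
```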

Related

Creating a short shell script to print out a table using cut, sort, and head to arrange values

I need help with this homework. I thought I had basically solved it, but two results do not match. I have "psychology" at line 5 where it's supposed to be at line 1, and I have "finance" as the last row instead of "Political science". The output (autograder) is attached below for clarity.
Can anyone figure out what I'm doing wrong? Any help would be greatly appreciated.
Question is:
write a short shell script to first download the data set with wget from the included url
Next, print out "Major,Total" (the column names of interest) to the screen
Then using cut, sort, and head, print out the n most popular majors (largest Total) in descending order
You will want to use the -k and -t arguments of sort (on top of ones you should already know) to achieve this.
The value of n will be passed into the script as a command-line argument
I attached output differences between mine and the autograder below. My code goes like this:
number=$1
if [ ! -f recent-grads.csv ]; then
wget https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv
fi
echo Major,Total
cat recent-grads.csv | sed -i | sort -k4 -n -r -t, recent-grads.csv | cut -d ',' -f 3-4 | head -n ${number}

Linux sort -Help Wanted

I've been stuck on a problem for a few days. Here it is; maybe you've got bigger brains than me!
I've got a bunch of CSV files and I want them concatenated into a single .csv file, sorted numerically. The first problem I ran into is with the ID name (I want to sort only by ID).
e.g.
sort -f *.csv > output.csv
This would work if I had zero-padded IDs like id001, id002, id010, id100,
but my IDs are like id1, id2, id10, id100, and this makes my sort inaccurate.
sort -t, -V *.csv > output.csv
This works perfectly on my test machine (sort --version reports GNU coreutils 8.5.0), but my live machine at work has sort version 5.3.0, which doesn't implement -V, and I cannot update it!
I feel like such a noob, and unlucky too.
If you have a better idea, please bring it on.
my csv file looks like
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
This is actually copied and pasted from a CSV. Let's say this is my first CSV, and the other one looks like
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
Looking forward to reading your replies!
Regards
You can use -kX.Y to start the sort key at character Y of column X, together with -n for a numeric sort:
sort -t, -k2.3 -n *csv
Given your sample file, it produces:
$ sort -t, -k2.3 -n file
,id1,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id2,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id10,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id40,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id101,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id201,aaaaaaaaa,bbbbbbbbbb,ccccccccccc,ddddddd
Update
For your given input, I would do:
$ cat *csv | sort -k1.3 -n
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
If your CSV format is fixed, you can use the shell equivalent of the decorate-sort-undecorate pattern:
cat *.csv | sed 's/^,id//' | sort -n | sed 's/^/,id/' >output.csv
The -n option is present even in ancient versions of sort.
UPDATE: the updated input contains a number with a different prefix, and at a different position in the line. Here is a version that handles both kinds of input, as well as other inputs that have a number somewhere in the line, sorting by the first number:
cat *.csv | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' \
| sort -n \
| sed 's/^[^ ]* //' > output.csv
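To make the three stages concrete, here is a small sketch that runs the same pipeline on a few shortened sample lines (two in the cn style, one in the ,id style). The variable names lines and sorted are only for illustration. The first sed copies the first number to the front of the line, sort -n orders by that copy, and the last sed removes it again.

```shell
# Shortened sample lines mixing both input styles from the question.
lines='cn201 AQ34070YTW CDEAQ34070YTW
,id10,aaaaaa,bbbbbbbbbb
cn41 AQ34070YTW CDEAQ34070YTW'

sorted=$(printf '%s\n' "$lines" \
  | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' \
  | sort -n \
  | sed 's/^[^ ]* //')    # decorate, sort numerically, undecorate
printf '%s\n' "$sorted"
```

The lines come out in the order id10, cn41, cn201, each with its original text intact.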
You could try the -g option:
sort -t, -k 2.3 -g fileName
-t sets the field separator
-k selects the key (column)
-g does a general numeric sort

rsync verbose with final stats but no file list

I see that when I use rsync with the -v option, it prints the list of changed files and some useful information at the end, like the total transfer size.
Is it somehow possible to cut out the first (long) part and just print the stats? I am using it in a script, and the log shouldn't be so long. Only the stats are useful.
Thank you.
I came across this question while looking for an answer myself:
rsync also supports the --stats option.
The best solution for now, I think:
rsync --info=progress0,name0,flist0,stats2 ...
progress0 hides progress
progress2 displays progress
name0 hides file names
stats2 displays stats at the end of transfer
You can use the --out-format option. This solution is more of a "hack" than the right way to do it, because the output is still generated and only filtered afterwards.
rsync ... --out-format="" ... | grep -v -E "^sending|^created" | tr -s "\n"
The grep filter will probably need to be updated with whatever unwanted lines you see in your output. The tr is there to squeeze the long runs of newlines that are left behind.
grep -E for extended regexes
grep -v to invert the match. "Selected lines are those not matching any of the specified patterns."
tr -s to squeeze repeated newlines into a single one
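The filter stage can be tried without rsync at all. The sketch below feeds it a hand-written stand-in for rsync output (only the "sending incremental file list" line mirrors what rsync prints; the two stats lines are placeholders, not real output). The variable names are only for illustration.

```shell
# Hand-written stand-in text, NOT captured from a real rsync run.
fake_output='sending incremental file list
first stats line


second stats line'

filtered=$(printf '%s\n' "$fake_output" \
  | grep -v -E "^sending|^created" \
  | tr -s "\n")    # drop unwanted lines, then collapse blank runs
printf '%s\n' "$filtered"
```

Only the two placeholder stats lines remain, with the blank lines between them collapsed.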

Sorting csv file by 5th column using bash

The file looks like
5.1,3.5,1.4,0.2,Banana
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
and I need to have it be
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
5.1,3.5,1.4,0.2,Banana
I have been trying to use
sort -t, -k5 file.csv > sorted.csv
All it does is make it
5.1,3.5,1.4,0.2,Banana
4.8,2.8,1.3,1.2,Apple
4.9,3.0,1.4,0.6,Apple
How do I make it like this? It does not seem to be sorting it at all.
GNU sort is locale sensitive, which can cause weirdness. Try the following and see if it makes a difference:
LC_ALL=C sort -t, -k5 file.csv > sorted.csv
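If the goal is to group rows by the fifth column while keeping the original order inside each group, a stable sort on that one field may get closer to the expected output. This is only a sketch, assuming a sort that supports -s (GNU sort does). The variable names data and grouped are for illustration.

```shell
# The three rows from the question.
data='5.1,3.5,1.4,0.2,Banana
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple'

# -k5,5 limits the key to field 5 only;
# -s keeps the input order for rows whose keys are equal.
grouped=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t, -k5,5)
printf '%s\n' "$grouped"
```

Without -s, sort breaks ties by comparing whole lines, so the 4.8 row would come before the 4.9 row.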
Is this the output you need?
# sort -t . -nrk2 sorted.csv
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
5.1,3.5,1.4,0.2,Banana

Bash/Linux Sort by 3rd column using custom field separator

I can't seem to sort the following data as I would like;
find output/ -type f -name '*.raw' | sort
output/rtp.0.0.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.1.raw
output/rtp.0.20.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
In the above example I haven't passed any arguments to the sort command. No matter what options I used I can't get closer to my desired results. I would like the following output;
find output/ -type f -name '*.raw' | sort
output/rtp.0.0.raw
output/rtp.0.1.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.20.raw
I have tried with -t . option to set a field separator to the full stop. Also I have experimented with the -k option to specify the field, and -g, -h, -n, but none of the options are helping. I can't see anything else in the man pages that would do as I require, unless I haven't understood the man pages correctly and overlooked my answer.
Can I produce the results I require with sort, and if so, how?
Additionally, it's very rare, but sometimes the 2nd column, which shows as '0' all the way down, may increment. Can that be factored into the sort?
This does it:
$ sort -t'.' -n -k3 a
output/rtp.0.0.raw
output/rtp.0.1.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.20.raw
As you can see, we need several options:
-t'.' to set the dot . as the field separator.
-n to make it numeric sort.
-k3 to check the 3rd column.
Update
This also does it:
$ sort -t'.' -V -k2 a
output/rtp.0.0.raw
output/rtp.0.1.raw
output/rtp.0.2.raw
output/rtp.0.3.raw
output/rtp.0.4.raw
output/rtp.0.5.raw
output/rtp.0.6.raw
output/rtp.0.7.raw
output/rtp.0.8.raw
output/rtp.0.9.raw
output/rtp.0.10.raw
output/rtp.0.11.raw
output/rtp.0.12.raw
output/rtp.0.13.raw
output/rtp.0.14.raw
output/rtp.0.15.raw
output/rtp.0.16.raw
output/rtp.0.17.raw
output/rtp.0.18.raw
output/rtp.0.19.raw
output/rtp.0.20.raw
As you can see, we need several options:
-t'.' to set the dot . as the field separator.
-V to make it sort based on version.
-k2 to check the 2nd column.
Fedorqui's solution is good (+1), but not all versions of sort support -V. For those versions that do not, you need to do a little more work than fedorqui's original solution. This should suffice:
sort -t. -k2,2 -k3,3 -n
You get a slightly different sort (e.g., '05' sorts before '1' instead of after) if you use:
sort -t. -k2g
(Note that -g is also non-standard, and not available in all versions of sort).
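Here is a small sketch of the two-key approach on a sample where the 2nd column does change. The rtp.1.* names are made up for illustration, as are the variable names. It puts the n modifier on each key rather than using a global -n, which should behave the same way here.

```shell
# Sample paths; the rtp.1.* entries are hypothetical.
names='output/rtp.1.2.raw
output/rtp.0.10.raw
output/rtp.0.9.raw
output/rtp.1.0.raw'

# Split on dots, then compare field 2 numerically and,
# for equal values, field 3 numerically.
ordered=$(printf '%s\n' "$names" | sort -t. -k2,2n -k3,3n)
printf '%s\n' "$ordered"
```

The result lists rtp.0.9, rtp.0.10, rtp.1.0, rtp.1.2, so the 2nd column takes priority and the 3rd breaks ties.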
sort -t'.' -n -k3
sorts by the 3rd column from smallest to largest.
If you want to sort from largest to smallest, add the -r option:
sort -t'.' -n -r -k3
