Linux sort - Help Wanted [linux]

I've been stuck on a problem for a few days. Here it is; maybe you have a bigger brain than me!
I have a bunch of CSV files and I want them concatenated into a single .csv file, sorted numerically. The first problem I ran into is the ID names (I want to sort only by ID).
For example:
sort -f *.csv > output.csv
This would work if I had standard IDs like id001, id002, id010, id100,
but my IDs are like id1, id2, id10, id100, which makes the sort inaccurate.
OK, next attempt:
sort -t, -V *.csv > output.csv
This works perfectly on my test machine (sort --version: GNU coreutils 8.5.0), but the live machine at work has sort version 5.3.0, which does not implement -V, and I cannot update it!
I feel so noob and unlucky.
If you have a better idea, please bring it on.
My CSV file looks like:
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
This is an actual copy/paste from a CSV. So let's say this is my first CSV, and the other one looks like:
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
Looking forward to reading your answers!
Regards

You can use -kX.Y to start the sort key at character Y of column X, together with -n for numeric sorting:
sort -t, -k2.3 -n *csv
Given your sample file, it produces:
$ sort -t, -k2.3 -n file
,id1,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id2,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id10,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id40,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id101,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id201,aaaaaaaaa,bbbbbbbbbb,ccccccccccc,ddddddd
Update
For your given input, I would do:
$ cat *csv | sort -k1.3 -n
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20

If your CSV format is fixed, you can use the shell equivalent of the decorate-sort-undecorate pattern:
cat *.csv | sed 's/^,id//' | sort -n | sed 's/^/,id/' >output.csv
The -n option is present even in ancient versions of sort.
UPDATE: the updated input contains a number with a different prefix, at a different position in the line. Here is a version that handles both kinds of input, as well as any other input that has a number somewhere in the line, sorting by the first number:
cat *.csv | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' \
| sort -n \
| sed 's/^[^ ]* //' > output.csv
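To see what the decorate step does, here is the same pipeline run on a few made-up lines (the payloads after the cn IDs are placeholders): the first sed copies the leading number to the front of the line, sort -n orders on it, and the last sed strips it off again.

```shell
printf 'cn41 x\ncn201 y\ncn9 z\n' \
    | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' \
    | sort -n \
    | sed 's/^[^ ]* //'
```

This prints cn9 z, then cn41 x, then cn201 y, i.e. numeric order despite the unpadded IDs.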

You could try the -g option:
sort -t, -k 2.3 -g fileName
-t separator
-k key/column
-g general numeric sort
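Note that -g is slower than -n, but it understands values -n cannot, such as scientific notation. A quick illustration (GNU sort assumed; the sample numbers are made up):

```shell
# -n stops parsing "1e3" at the "1"; -g reads it as 1000
printf '1e3\n5\n200\n' | sort -g
```

This prints 5, 200, 1e3, whereas sort -n would put 1e3 first.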

Related

Linux bash sorting in reverse order based on a column isn't working as expected

I'm trying to sort a text file in the descending order based on the last column. It just doesn't seem to work.
cat 1.txt | sort -r -n -k 4,4
ACHG,89.46,0.08,34200
UUKJL,0.85,-15.00,200
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
JJkLEYE,73.67,0.48,25400
I've tried removing spaces just in case, but that hasn't helped. I also tried sorting by the other fields just to see, but I have the same problem.
I just can't work out what is wrong with the command I've issued. Could someone please help me with this one?
Your command is almost right, but it is missing the field separator option -t, which should set comma as the field separator.
This should work for you:
sort -t, -rnk 4,4 1.txt
ACHG,89.46,0.08,34200
JJkLEYE,73.67,0.48,25400
NIMJKY,34.35,0.09,17700
TBBNHW,10.24,0.00,4600
UUKJL,0.85,-15.00,200
Note that there is no need to use cat | sort here.

Creating a short shell script to print out a table using cut, sort, and head to arrange values

I need help with this homework. I thought I had basically solved it, but two results do not match: I have "psychology" at line 5 where it's supposed to be line 1, and "finance" as the last row instead of "Political science". The autograder output is attached below for clarity.
Can anyone figure out what I'm doing wrong? Any help would be greatly appreciated.
Question is:
write a short shell script to first download the data set with wget from the included url
Next, print out "Major,Total" (the column names of interest) to the screen
Then using cut, sort, and head, print out the n most popular majors (largest Total) in descending order
You will want to use the -k and -t arguments of sort (on top of ones you should already know) to achieve this.
The value of n will be passed into the script as a command-line argument
I attached output differences between mine and the autograder below. My code goes like this:
number=$1
if [ ! -f recent-grads.csv ]; then
wget https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv
fi
echo Major,Total
cat recent-grads.csv | sed -i | sort -k4 -n -r -t, recent-grads.csv | cut -d ',' -f 3-4 | head -n ${number}
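For what it's worth, here is one way the pipeline could be repaired: the cat and the stray sed -i do nothing useful (sort reads the file directly), the header row should be skipped before sorting, and -k4,4 limits the key to the Total field. This is only a sketch; the column numbers are taken from the original command, and it does not handle commas inside quoted fields, which may or may not be the cause of the misplaced rows.

```shell
#!/bin/sh
number=$1

# Download the data set only if it is not already present
if [ ! -f recent-grads.csv ]; then
    wget https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv
fi

echo "Major,Total"
# Skip the header, sort descending on the numeric 4th field,
# then keep fields 3-4 (Major,Total)
tail -n +2 recent-grads.csv | sort -t, -k4,4 -n -r | cut -d, -f3-4 | head -n "$number"
```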

cat | sort csv file by name in bash

I have a bunch of CSV files that I want to save into one file, ordered by name.
I use:
cat *.csv | sort -t\ -k2 -n *.csv > output.csv
It works well for names like a001, a002, a010, a100,
but in my files the names are messed up a bit, so they are like a1, a2, a10, a100,
and the command I wrote arranges things like this:
cn201
cn202
cn202
cn203
cn204
cn99
cn98
cn97
cn96
..
cn9
Can anyone please help me?
Thanks
If I understand correctly, you want to use the -V (version-sort) flag instead of -n. This is only available on GNU sort, but that's probably the one you are using.
However, it depends how you want the prefixes to be sorted.
If you don't have the -V option, sort allows you to be more precise about what characters constitute a sort key.
sort -t\ -k2.3n *.csv > output.csv
The .3 tells sort that the key to sort on starts with the 3rd character of the second field, effectively skipping the cn prefix. You can put the n directly in the field specifier, which saves you two whole characters, but more importantly for more complex sorts, allows you to treat just that key as a number, rather than applying -n globally (which is only an issue if you specify multiple keys with several uses of -k).
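Applied to the single-column cnNN lines from the question, the same character-offset trick looks like this (field 1 instead of field 2, since there is only one column):

```shell
printf 'cn201\ncn9\ncn99\ncn10\n' | sort -k1.3n
```

This prints cn9, cn10, cn99, cn201: the numeric part is compared as a number once the two-character prefix is skipped.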
The sort version on the live server is 5.97 from 2006,
so a few things did not work correctly.
However, the code below is my solution:
#!/bin/bash
echo "This script reads all CSVs into a single file (clusters.csv) in this directory"
for filers in *.csv
do
echo "" >> clusters.csv
echo "--------------------------------" >> clusters.csv
echo "$filers" >> clusters.csv
echo "--------------------------------" >> clusters.csv
cat "$filers" >> clusters.csv
done
or, if you want to keep it simple, in one command:
awk 'FNR > 1' *.csv > clusters.csv
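FNR is awk's per-file line counter, so FNR > 1 drops the first line (the header) of every input file while keeping the rest. A tiny demonstration with two throwaway files:

```shell
printf 'id,name\n1,a\n' > f1.csv
printf 'id,name\n2,b\n' > f2.csv
awk 'FNR > 1' f1.csv f2.csv    # data rows only, both headers gone
rm f1.csv f2.csv
```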

Sorting csv file by 5th column using bash

The file looks like
5.1,3.5,1.4,0.2,Banana
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
and I need it to be
4.9,3.0,1.4,0.2,Apple
4.8,2.8,1.3,1.2,Apple
5.1,3.5,1.4,0.2,Banana
I have been trying to use
sort -t, -k5 file.csv > sorted.csv
All it does is make it
5.1,3.5,1.4,0.2,Banana
4.8,2.8,1.3,1.2,Apple
4.9,3.0,1.4,0.6,Apple
How do I get it like this? It does not seem to be sorting at all.
GNU sort is locale sensitive, which can cause weirdness. Try the following and see if it makes a difference:
LC_ALL=C sort -t, -k5 file.csv > sorted.csv
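A quick way to see the locale effect: under many UTF-8 locales, sort case-folds and may ignore punctuation, while LC_ALL=C forces plain byte order (made-up single-letter lines for illustration):

```shell
# Byte order: uppercase (A=65, B=66) sorts before lowercase (a=97, b=98)
printf 'b\nA\na\nB\n' | LC_ALL=C sort
```

This prints A, B, a, b; a typical en_US.UTF-8 locale would instead interleave the cases.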
Is this what you need it to be?
# sort -t . -nrk2 sorted.csv
4.9,3.0,1.4,0.6,Apple
4.8,2.8,1.3,1.2,Apple
5.1,3.5,1.4,0.2,Banana

How to find the particular text stored in the file "data.txt" and it occurs only once

The line I seek is stored in the file data.txt and is the only line of text that occurs only once.
How do I go about finding that particular line using Linux?
This is a little bit old, but I think you are looking for this:
cat data.txt | sort | uniq -u
This will show the values that occur only once in the file. I assume you are playing OverTheWire, if you are asking? If so, this is what you are looking for.
To provide some context (I need more rep to comment): this question features in an online "wargame" called Bandit, which involves using the command line to discover passwords on an online Linux server to advance up the levels.
For those who would like to see data.txt in full, I've Pastebin'd it here; however, it looks like this:
NN4e37KW2tkIb3dC9ZHyOPdq1FqZwq9h
jpEYciZvDIs6MLPhYoOGWQHNIoQZzE5q
3rpovhi1CyT7RUTunW30goGek5Q5Fu66
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
9WV67QT4uZZK7JHwmOH0jnhurJMwoGZU
a2GjmWtTe3tTM0ARl7TQwraPGXgfkH4f
7yJ8imXc7NNiovDuAl1ZC6xb0O0mMBx1
UsvVyFSfZZWbi6wgC7dAFyFuR6jQQUhR
FcOJhZkHlnwqcD8QbvjRyn886rCrnWZ7
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
ME7nnzbId4W3dajsl6Xtviyl5uhmMenv
J5lN3Qe4s7ktiwvcCj9ZHWrAJcUWEhUq
aouHvjzagN8QT2BCMB6e9rlN4ffqZ0Qq
ZRF5dlSuwuVV9TLhHKvPvRDrQ2L5ODfD
9ZjR3NTHue4YR6n4DgG5e0qMQcJjTaiM
QT8Bw9ofH4x3MeRvYAVbYvV1e1zq3Xim
i6A6TL6nqvjCAPvOdXZWjlYgyvqxmB7k
tx7tQ6kgeJnC446CHbiJY7fyRwrwuhrs
One way to do it is to use:
sort data.txt | uniq -u
The sort command is like cat in that it displays the contents of the file; however, it sorts the file lexicographically by lines (it reorders them alphabetically so that matching ones end up together).
The | is a pipe that redirects the output from one command into another.
The uniq command reports or omits repeated lines and by passing it the -u argument we tell it to report only unique lines.
Used together like this, the commands sort data.txt lexicographically, find the unique line, and print it back to the terminal for you.
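On a toy file, the effect of the pipeline looks like this (made-up values standing in for the password lines):

```shell
# Only "bb" appears exactly once, so only "bb" survives uniq -u
printf 'aa\nbb\naa\ncc\ncc\n' | sort | uniq -u
```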
sort -u data.txt | while read -r line; do if [ "$(grep -c "$line" data.txt)" -eq 1 ]; then echo "$line"; fi; done
was my solution, until I saw the easier one here:
sort data.txt | uniq -u
Add more information to your post.
What does data.txt look like?
Like this:
11111111
11111111
pass1111
11111111
Or like this
afawfdgd
password
somethin
gelse...
And do you know that the password is in the file, or are you searching for a non-repeated string?
If you know the password, use something like:
cat data.txt | grep 'password'
If you don't know the password, and the password is the only unique line in the file, you must create a script.
For example, in Python:
with open("data.txt") as f:
    for line in f:
        if 'pass' in line:
            print(line)
Of course, replace 'pass' with something else, for example a slice of the line.
And here is one using only a single tool, awk:
awk '{a[$1]++}END{for(i in a){if(a[i] == 1){print i} }}' data.txt
sort data.txt | uniq -c | grep '^ *1 '
and it will print only the text that occurs exactly once.
uniq -c prefixes each line with its count; the anchored pattern keeps only the line whose count is exactly 1.
sort data.txt | uniq -c | grep '^ *1 '
and you will find the one that occurs only one time.
