Issue with unix sort - linux

This is more of a doubt than a question.
So I have an input file like this:
$ cat test
class||sw sw-explr bot|results|id,23,0a522b36-556f-4116-b485-adcf132b6cad,20130325,/html/body/div/div[3]/div[2]/div[2]/div[3]/div/div/div/div/div/div[2]/div/div/ul/li[4]/div/img
class||sw sw-explr bot|results|id,40,30cefa2c-6ebf-485e-b49c-3a612fe3fd73,20130323,/html/body/div/div[3]/div[2]/div[3]/div[3]/div/div/div/div/div[3]/div/div/ul/li[8]/div/img
class||sw sw-explr bot|results|id,3,72805487-72c3-4173-947f-e5abed6ea1e4,20130324,/html/body/div/div[3]/div[2]/div[2]/div[2]/div/div/div/div/div/div[3]/div/div/div[2]/ul/li[20]/div/img
Each line kind of defines an element in an HTML page.
It can be treated as 5 comma-separated columns.
I want to sort this file by the second column, i.e. the column holding 23, 40, 3.
I am not sure why unix sort isn't working.
These are the commands I tried; surprisingly, none gave me the desired result.
cat test | sort -nt',' -k2
cat test | sort -n -t, -k2
cat test | sort -n -t$',' -k2
cat test | sort -t"," -k2
cat test | sort -n -k2
Is there something about sort that I don't know?
This didn't cause me a problem, as I separated the columns, sorted, and then joined them again. But why didn't sort work?
NB: If I remove the third column ($3) from this file and then sort, it works fine!

This line should work for you:
sort -t, -n -k2,2 test
You don't need cat test | sort; just pass the file to sort directly.
The default end position of -k is the end of the line, so -k2 means the key runs from the 2nd field to the end of the line. You need to sort on exactly the 2nd field, which is what -k2,2 does. This also explains why your sort worked after you removed the 3rd column.
Testing with your example:
kent$ sort -t, -n -k2,2 file
class||sw sw-explr bot|results|id,3,72805487-72c3-4173-947f-e5abed6ea1e4,20130324,/html/body/div/div[3]/div[2]/div[2]/div[2]/div/div/div/div/div/div[3]/div/div/div[2]/ul/li[20]/div/img
class||sw sw-explr bot|results|id,23,0a522b36-556f-4116-b485-adcf132b6cad,20130325,/html/body/div/div[3]/div[2]/div[2]/div[3]/div/div/div/div/div/div[2]/div/div/ul/li[4]/div/img
class||sw sw-explr bot|results|id,40,30cefa2c-6ebf-485e-b49c-3a612fe3fd73,20130323,/html/body/div/div[3]/div[2]/div[3]/div[3]/div/div/div/div/div[3]/div/div/ul/li[8]/div/img
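A minimal sketch of the difference, using shortened stand-in lines rather than the full question data. It shows that the key must end at field 2 and that -n is still needed, because without it "23" sorts before "3" as text. As a hedged side note: in locales that use ',' as the thousands separator (en_US.UTF-8, for example), GNU sort -n may also read a comma inside an open-ended key as part of the number, so the sketch pins LC_ALL=C (an added assumption) to keep the result locale-independent.

```shell
#!/bin/sh
# Shortened stand-ins for the question's lines (hypothetical data).
# LC_ALL=C is an assumption so ',' is never read as a thousands separator.
tmp=$(mktemp -d)
printf '%s\n' \
  'class|id,23,0a522b36,20130325,/html/a' \
  'class|id,40,30cefa2c,20130323,/html/b' \
  'class|id,3,72805487,20130324,/html/c' > "$tmp/sample.csv"

# Key is exactly field 2, compared as a number: 3, 23, 40
by_field2_numeric=$(LC_ALL=C sort -t, -n -k2,2 "$tmp/sample.csv" | cut -d, -f2 | tr '\n' ' ')

# Same key compared as text: "23" < "3" < "40"
by_field2_text=$(LC_ALL=C sort -t, -k2,2 "$tmp/sample.csv" | cut -d, -f2 | tr '\n' ' ')

echo "numeric: $by_field2_numeric"
echo "text:    $by_field2_text"
rm -r "$tmp"
```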

Here comes a working solution:
cat test.file | sort -t, -k2n,2
Explanation:
-t, # Set field separator to ','
-k2n,2 # sort by the second column only, numerically

Related

Sorting csv file based on two columns in unix

I'm a beginner in unix shell scripting. I'm trying to sort a csv file based on two columns.
My file looks like this:
sh-4.4$ cat test.csv
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131115
603,02,9876542,2222,201806131215
603,20,9876542,2222,201806131215
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131117
I want to group by the 3rd column, with the 2nd column also ordered, as shown below:
603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131117
603,02,0123456,1111,201806131117
603,20,9876542,2222,201806131215
603,02,9876542,2222,201806131215
I tried sort -t',' -k3 -k2 test.csv. This groups by column 3, but it does not sort column 2. Its output looks like this:
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131117
603,02,9876542,2222,201806131215
603,20,9876542,2222,201806131215
I also tried sort -t',' -k3 -rk2 test.csv. This sorts column 2 as I want, but column 3 is not sorted as I expected. Its output looks like this:
603,20,9876542,2222,201806131215
603,02,9876542,2222,201806131215
603,20,0123456,1111,201806131117
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131115
Any help on this is much appreciated. Suggestions for sorting with awk are also welcome.
Restrict the sorting fields:
$ sort -t, -k3,3 -k2,2 file
should do it.
Note, however, that the output you want doesn't match the spec you describe. You'll get
603,02,0123456,1111,201806131115
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131115
603,20,0123456,1111,201806131117
603,02,9876542,2222,201806131215
603,20,9876542,2222,201806131215
This is grouped by the third field only and sorted by the second field within each group.
Perhaps this is what you wanted?
$ sort -t, -k3 -k2,2r file
603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131117
603,02,0123456,1111,201806131117
603,20,9876542,2222,201806131215
603,02,9876542,2222,201806131215
Note that -k3 means the key runs from the 3rd field to the end of the line, which seems to be what you want, judging by the order of the last fields. The second key, -k2,2r, then reorders rows that tie on that key by the 2nd field in reverse order.
NB: If your numerical fields are not zero-padded, you may want to add the -n option to request numerical ordering instead of lexical ordering. Here it makes no difference.
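A sketch that runs both commands from this answer on the question's test.csv rows, so the two orderings can be compared side by side. The temporary directory is an assumption added to keep the example self-contained, and LC_ALL=C is added only for reproducible collation.

```shell
#!/bin/sh
# The question's rows; the temp directory is an added assumption.
tmp=$(mktemp -d)
cat > "$tmp/test.csv" <<'EOF'
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131115
603,02,9876542,2222,201806131215
603,20,9876542,2222,201806131215
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131117
EOF

# Key 1 is field 3 alone, key 2 is field 2 alone (ascending)
grouped=$(LC_ALL=C sort -t, -k3,3 -k2,2 "$tmp/test.csv")

# Key 1 runs from field 3 to end of line, so the timestamp also orders rows;
# ties are broken by field 2 alone, reversed, which puts 20 before 02
wanted=$(LC_ALL=C sort -t, -k3 -k2,2r "$tmp/test.csv")

printf '%s\n' "$grouped" "---" "$wanted"
rm -r "$tmp"
```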
sort works on CSV and plain-text files alike and prints the result to the console.
-t '|' says the columns are delimited by '|', and -k1,1 -k2,2 sorts the data by column 1 and then by column 2 (a bare -k1 would make the first key run to the end of the line, so the second key would rarely matter):
$ sort -t '|' -k1,1 -k2,2 <INPUT_FILE>
To store the result in an output file, use the following command:
$ sort -t '|' -k1,1 -k2,2 -o <OUTPUTFILE> <INPUT_FILE>
If you want to do it while ignoring a header line, use the following command (bash, because of the <(...) process substitution):
(head -n1 INPUT_FILE && sort <(tail -n +2 INPUT_FILE)) > OUTPUT_FILE
head -n1 INPUT_FILE prints only the first line of your file, i.e. the header, and tail -n +2 INPUT_FILE prints the file from the second line up to EOF.
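A POSIX sh variant of the header-preserving command, offered as a sketch: it swaps bash's <(...) for a plain pipe inside a { ...; } group, which sends the header and the sorted body into one output file. The file names and the '|'-delimited sample rows are hypothetical.

```shell
#!/bin/sh
# Hypothetical input: a header line followed by '|'-delimited rows.
tmp=$(mktemp -d)
printf '%s\n' 'name|rank' 'bob|2' 'alice|3' 'alice|1' > "$tmp/input.txt"

# head prints the header; tail -n +2 prints line 2 to EOF, which is sorted
# by field 1, then by field 2 numerically. The group shares one redirect.
{
  head -n 1 "$tmp/input.txt"
  tail -n +2 "$tmp/input.txt" | LC_ALL=C sort -t'|' -k1,1 -k2,2n
} > "$tmp/output.txt"

result=$(cat "$tmp/output.txt")
printf '%s\n' "$result"
rm -r "$tmp"
```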

How to filter multiple files and eliminate duplicate entries to select a single entry while using linux shell

I have a folder that contains several files. These files all have the same columns.
Let us say file1 and file2 have the following contents (there can be more than two files):
$ cat file1.txt
9999999999|1200
8888888888|1400
7777777777|1255
6666666666|1788
7777777777|1289
9999999999|1300
$ cat file2.txt
9999999999|2500
8888888888|2450
6666666666|2788
9999999999|3000
2222222222|3001
In my files, the 1st column is a mobile number and the 2nd is a count. The same mobile number can appear in multiple files. I want to write the records to a file with unique mobile numbers, keeping for each number the line with the highest count.
The output should be as follows:
$ cat output.txt
7777777777|1289
8888888888|2450
6666666666|2788
9999999999|3000
2222222222|3001
Any help would be appreciated.
That's probably not very efficient, but it does the job.
Put this into phones.sh and run bash phones.sh:
#!/bin/bash
files="
file1.txt
file2.txt
"
# all distinct phone numbers across the files
phones=$(cat $files | cut -d'|' -f1 | sort -u)
# for each number, keep its line with the highest count, then order by count
for phone in $phones; do grep -h "^$phone|" $files | sort -t'|' -k2,2nr | head -n1; done | sort -t'|' -k2,2n
What it does is extract all the phone numbers in the files, iterate over them, grep each one in all files, and select the line with the highest count. The grep pattern is anchored (^$phone|) so a number cannot match inside a count. Then I also sorted the final result by count, which is what your expected result suggests. sort -t'|' -k2,2nr means sort on the second column only, with | as the delimiter, in decreasing numerical order; head -n1 selects the first line. You can add other files to the files variable.
Another way of doing this is to use the power of sort and awk:
cat file1.txt file2.txt | sort -t '|' -k1,1 -k2,2nr | awk -F"|" '!_[$1]++' | sort -t '|' -k2,2n
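I think the one-liner is pretty self-explanatory, except for the awk. That part does a uniq on the first column: it prints a line only the first time its number is seen, and because the previous sort put each number's highest count first, that is the line kept. The last sort just produces the final order you wanted.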
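A sketch that runs the sort + awk pipeline on the question's two files so the result can be checked against the expected output. The only change is cosmetic: the awk array is named seen instead of _. The temporary directory and LC_ALL=C are added assumptions for a self-contained, reproducible run.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cat > "$tmp/file1.txt" <<'EOF'
9999999999|1200
8888888888|1400
7777777777|1255
6666666666|1788
7777777777|1289
9999999999|1300
EOF
cat > "$tmp/file2.txt" <<'EOF'
9999999999|2500
8888888888|2450
6666666666|2788
9999999999|3000
2222222222|3001
EOF

# 1. group by number, highest count first within each number
# 2. awk keeps only the first line seen for each number
# 3. order the survivors by count, ascending
result=$(cat "$tmp/file1.txt" "$tmp/file2.txt" \
  | LC_ALL=C sort -t'|' -k1,1 -k2,2nr \
  | awk -F'|' '!seen[$1]++' \
  | LC_ALL=C sort -t'|' -k2,2n)

printf '%s\n' "$result"
rm -r "$tmp"
```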

How to sort a text file numerically and then store the results in the same text file?

I have tried sort -n test.txt > test.txt. However, this leaves me with an empty text file. What is going on here, and what can I do to solve this problem?
sort does not sort the file in place. It writes a sorted copy to standard output instead, so redirect it to a different file:
sort -n -k 4 out.txt > sorted-out.txt
Edit: To get the order you want you have to sort the file with the numbers read in reverse. This does it:
cut -d' ' -f4 out.txt | rev | paste - out.txt | sort -k1 -n | cut -f2- > sorted-out.txt
For reference:
sort -nk4 file
-n for a numerical sort
-k to specify the key field (here field 4)
Add the -r option to reverse the order:
sort -nrk4 file
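A small sketch of -nk4 versus -nrk4 on hypothetical space-separated data whose 4th field is a number. It shows that the ordering is numeric (9 before 10 before 100) and that -r simply reverses it.

```shell
#!/bin/sh
# Hypothetical space-separated rows; field 4 holds the number.
tmp=$(mktemp -d)
printf '%s\n' 'a b c 10' 'd e f 9' 'g h i 100' > "$tmp/file"

# Numeric ascending, then numeric descending, keeping only field 4
asc=$(sort -nk4 "$tmp/file" | cut -d' ' -f4 | tr '\n' ' ')
desc=$(sort -nrk4 "$tmp/file" | cut -d' ' -f4 | tr '\n' ' ')

echo "ascending:  $asc"
echo "descending: $desc"
rm -r "$tmp"
```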
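It is because you are reading from and writing to the same file: the shell truncates test.txt for the > redirection before sort even starts, so sort reads an empty file. You can't do that. You can use a temporary file instead (for example, one created with mktemp), or simply:
sort -n test.txt > test1.txt
mv test1.txt test.txt
For sort, you can also use the -o option, whose output file may be one of the input files:
sort -n test.txt -o test.txt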
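A sketch that reproduces both behaviours from this thread: redirecting sort's output onto its own input leaves an empty file, while -o naming the same file works. The file contents are hypothetical sample numbers.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 10 2 33 1 > "$tmp/test.txt"
cp "$tmp/test.txt" "$tmp/broken.txt"

# Wrong: the shell truncates broken.txt before sort runs, so sort sees no input
sort -n "$tmp/broken.txt" > "$tmp/broken.txt"
broken=$(cat "$tmp/broken.txt")

# Right: -o may name one of the input files
sort -n -o "$tmp/test.txt" "$tmp/test.txt"
fixed=$(tr '\n' ' ' < "$tmp/test.txt")

echo "redirect: '$broken'"
echo "-o:       '$fixed'"
rm -r "$tmp"
```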

get the first word as result of ls -l

I need to use ls -l, and I would like the result to be just the first word of the file name. For instance, for a result like this
-rw-r--r-- 1 root root 9 Sep 21 23:11 best file 1.txt
I would like to have only
best
as the result, because I need to put this value into a variable. It is fine as well if there is another way instead of using ls -l.
Sorry to bother you again: if the file is in a subdirectory, how can I hide the directory from the result? Thanks
You don't need to use ls -l (L).
Instead, use ls -1 (number one), which just outputs the names of the files, and then filter out the first column with cut:
ls -1 | cut -d' ' -f1
(that is the number one, not the letter L)
To store the value into a variable, do:
var=$(ls -1 | cut -d' ' -f1)
Note that parsing ls is not a good idea: the number of columns may vary, etc. You can read more about the topic in Why you shouldn't parse the output of ls
Update
Note there is not even a need for -1 (one); ls alone suffices:
ls | cut -d' ' -f1
As BroSlow comments below, this works "because they are EOL (end of line) separated across a pipe": when its output is not a terminal, ls prints one name per line.
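Since parsing ls is discouraged above, here is a hedged sketch of one alternative: let the shell's glob produce the names and cut everything from the first space with ${name%% *}. Stripping the directory part with ${f##*/} may also cover the follow-up question about files in a subdirectory. The temporary directory and the file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical directory holding one file named "best file 1.txt".
tmp=$(mktemp -d)
: > "$tmp/best file 1.txt"

for f in "$tmp"/*; do
  [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
  name=${f##*/}             # drop the directory part of the path
  var=${name%% *}           # keep the text before the first space
  printf '%s\n' "$var"
done
rm -r "$tmp"
```

With several files, var ends up holding the first word of the last name in glob order, so a real script would act on each value inside the loop.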
If you have only one file to list, this will work fine:
var=$(ls -l | awk '{ print $9 }')
echo ${var}
Otherwise, you need to use grep to filter the output down to the correct file.
set -- $(ls -l)
echo ${11} # Assumes the file is the FIRST one listed.
Should do the trick. But I'm not sure if that's really what you want. For one thing, ls -l also prints an extra header line (total ...), which is why the name is ${11} rather than ${9}. Why do you say that you need to use ls -l? If you could state the actual problem, maybe we can find a much better solution together...
awk can pick the first word for you:
ls | awk '{print $1}'
Try:
ls -al|awk 'NR==4{ print $9 }'
Row 4 holds the first file entry (after the total line and the . and .. entries, assuming there are no other hidden files). $9 is column 9, which holds the desired word.

Why this command adds \n at the last line

I'm using this command to sort and remove duplicate lines from a file.
sort file2.txt | uniq > file2_uniq.txt
After running the command, I find that the last line is just \n, which causes me problems. What can I do to avoid it?
You could also let sort take care of uniquing the output. An empty line normally sorts before every other line, so omitting the first line removes it (this assumes the input really contains an empty line):
sort -u file2.txt | tail -n +2
Edit
If you also want to remove all empty lines, I would suggest using:
grep -v '^$' file2.txt | sort -u
Just filter out what you don't want:
sort file2.txt | grep -v '^$' | uniq > file2_uniq.txt
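A sketch of the filtering approach from this thread: grep -v '^$' reads the file itself and drops empty lines, then sort -u sorts and removes duplicates in one step. The sample contents of file2.txt are hypothetical, and grep -c is used only to confirm that no empty line survives.

```shell
#!/bin/sh
# Hypothetical input with duplicates and two empty lines.
tmp=$(mktemp -d)
printf 'b\na\n\nb\n\nc\n' > "$tmp/file2.txt"

# Drop empty lines, then sort and de-duplicate
grep -v '^$' "$tmp/file2.txt" | sort -u > "$tmp/file2_uniq.txt"

result=$(tr '\n' ' ' < "$tmp/file2_uniq.txt")
# grep -c prints 0 but exits 1 when nothing matches, hence || true
empty_lines=$(grep -c '^$' "$tmp/file2_uniq.txt" || true)

echo "lines: $result"
echo "empty lines left: $empty_lines"
rm -r "$tmp"
```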
The problem was solved by removing the last line with:
sed '$d' infile > outfile
