How to sort a text file numerically and then store the results in the same text file? - linux

I have tried sort -n test.text > test.text. However, this leaves me with an empty text file. What is going on here, and what can I do to solve this problem?

Sort does not sort the file in-place. It outputs a sorted copy instead.
You need sort -n -k 4 out.txt > sorted-out.txt.
Edit: To get the order you want you have to sort the file with the numbers read in reverse. This does it:
cut -d' ' -f4 out.txt | rev | paste - out.txt | sort -k1 -n | cut -f2- > sorted-out.txt
For further reference:
sort -nk4 file
-n for numeric sort
-k for specifying the sort key (here, field 4)
or add the -r option for reverse sorting:
sort -nrk4 file
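For example, given a hypothetical whitespace-separated file data.txt whose 4th column is numeric (an illustration, not from the question):
# data.txt (hypothetical):
#   alice  US  x  42
#   bob    DE  y  7
sort -nk4 data.txt     # bob's line comes first (7 < 42)
sort -nrk4 data.txt    # alice's line comes first (reverse numeric)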

It is because you are reading and writing to the same file; the shell truncates the output file before sort gets a chance to read it, so you can't do that. You can use a temporary file (for example one created with mktemp), or even something as simple as:
sort -n test.text > test1.txt
mv test1.txt test.text
For sort, you can also do the following:
sort -n test.text -o test.text
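If the moreutils package is available, sponge gives a general way to do this for commands that, unlike sort, have no -o option: it soaks up all of its input before overwriting the file. A sketch, assuming moreutils is installed:
sort -n test.text | sponge test.text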

Pipelining cut sort uniq

Trying to get a certain field from a SAM file, sort it, and then find the number of unique numbers in the file. I have been trying:
cut -f 2 practice.sam > field2.txt | sort -o field2.txt sortedfield2.txt |
uniq -c sortedfield2.txt
The cut is working to pull out the numbers from field two; however, when trying to sort the numbers into a new file or the same file, I am just getting a blank. I have tried breaking the pipeline into sections but still get the same error. I am meant to use those three commands to achieve the output count.
Use
cut -f 2 practice.sam | sort | uniq -c
In your original code, you're redirecting the output of cut to field2.txt and, at the same time, trying to pipe that output into sort. That won't work (unless you use tee). Either run the commands separately (e.g., join them with ;) or don't redirect the output to a file.
Ditto the second half, where you write the output to sortedfield2.txt and thus end up with nothing going to stdout, and nothing being piped into uniq.
So an alternative could be:
cut -f 2 practice.sam > field2.txt ; sort -o sortedfield2.txt field2.txt ; uniq -c sortedfield2.txt
which is the same as
cut -f 2 practice.sam > field2.txt
sort -o sortedfield2.txt field2.txt
uniq -c sortedfield2.txt
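If you do want to keep the intermediate field2.txt while still piping onward, the tee mentioned above can do that; a rough sketch using the file names from the question:
cut -f 2 practice.sam | tee field2.txt | sort | uniq -c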
You can also use this command (sort has to come before uniq, because uniq only collapses adjacent duplicates):
cut -f 2 practice.sam | sort | uniq -c > sorted.txt
The code in your question is wrong; the error is "No such file or directory", and it comes from the way the pipe is used. You can learn how pipes are used at this link:
https://www.guru99.com/linux-pipe-grep.html

Make bash script shorter with pipes

I have some text files (every line in every file follows the scheme 123:abc) and want to make two separate files from them: one big file with all lines (but unique), and from that a file with only the strings after the ":" token.
This here works:
cat *.txt >> bigtextfile.txt
sort -u bigtextfile.txt -o bigtextfile.txt
cat bigtextfile.txt | cut -d: -f2 >> bigtextfile-filtered.txt
But can I do this much shorter with pipes?
sort accepts multiple file inputs, so you can produce your bigtextfile.txt in one sitting:
sort -u *.txt -o bigtextfile.txt
cut also accepts a file argument, so there is no need for cat:
cut -d: -f2 bigtextfile.txt >> bigtextfile-filtered.txt
If you don't need bigtextfile.txt itself and only use it as an intermediate step towards bigtextfile-filtered.txt, you can do it all in one line:
sort -u *.txt | cut -d: -f2 >> bigtextfile-filtered.txt
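If you do need bigtextfile.txt as well, tee can write it while the pipeline continues; a sketch (note that tee without -a truncates the file, unlike the >> used above):
sort -u *.txt | tee bigtextfile.txt | cut -d: -f2 > bigtextfile-filtered.txt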
I suggest:
sort -u *.txt | cut -d: -f2 >> bigtextfile-filtered.txt
Try this:
cat *.txt | sort -u | cut -d: -f2 >> bigtextfile-filtered.txt

Find duplicate entries in a text file using shell

I am trying to find duplicate *.sh entries mentioned in a text file (test.log) and delete them, using a shell script. Since the paths are different, uniq -u still prints both entries even though there are two first_prog.sh entries in the file.
cat test.log
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/first_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
output:
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
I tried a couple of approaches using a few commands, but I have no idea how to get the above output.
rev test.log | cut -f1 -d/ | rev | sort | uniq -d
Any clue on this?
You can use awk for this by splitting fields on / and using $NF (last field) in an associative array:
awk -F/ '!seen[$NF]++' test.log
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
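If the !seen[$NF]++ idiom is unfamiliar, here is an equivalent, more verbose form of the same awk command (a sketch; the behavior should be identical):
# print a line only the first time its last /-separated field (the script name) is seen
awk -F/ '{ if (!($NF in seen)) { print; seen[$NF] = 1 } }' test.log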
awk shines for this kind of task, but here is a non-awk solution:
$ sed 's|.*/|& |' file | sort -k2 -u | sed 's|/ |/|'
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
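In case the sed trick is not obvious, this is my reading of what each stage does (not part of the original answer):
# sed 's|.*/|& |'   inserts a space after the last "/", turning the file name into
#                   a separate whitespace-delimited field:
#                   "/mnt/abc/shellprog/test/ first_prog.sh"
# sort -k2 -u       keeps one line per unique file name (field 2)
# sed 's|/ |/|'     removes the helper space again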
or, if your paths are balanced (the same number of parent directories for all files; note that with -t/ the leading slash makes field 1 empty, so -k5 starts the key at the test directory and runs to the end of the line):
$ sort -t/ -k5 -u file
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
awk '!/my_shellprog\/test\/first/' file
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh

How to filter multiple files and eliminate duplicate entries to select a single entry, using the Linux shell

I have a folder that contains several files. These files all have identical columns.
Let us say file1 and file2 have contents as follows (there can be more than two files).
$cat file1.txt
9999999999|1200
8888888888|1400
7777777777|1255
6666666666|1788
7777777777|1289
9999999999|1300
$cat file2.txt
9999999999|2500
8888888888|2450
6666666666|2788
9999999999|3000
2222222222|3001
In my files, the 1st column is a mobile number and the 2nd is a count. The same mobile number can appear in multiple files. Now I want to get the records with unique mobile numbers, keeping the highest count for each, into one file.
The output should be as follows:
$cat output.txt
7777777777|1289
8888888888|2450
6666666666|2788
9999999999|3000
2222222222|3001
Any help would be appreciated.
That's probably not very efficient but it does the job:
put this into phones.sh and run bash phones.sh
#!/bin/bash
files="
file1.txt
file2.txt
"
phones=$(cat $files | cut -d'|' -f1 | sort -u)
for phone in $phones; do grep -h "^$phone|" $files | sort -t'|' -k 2 -nr | head -n1; done | sort -t'|' -k 2 -n
What it does, basically, is extract all the phone numbers from the files, iterate over them, grep each one in all files, and keep the line with the highest count. The final result is then also sorted by count, which is what your expected output suggests. sort -t'|' -k 2 -nr means: sort on the second column, using | as the delimiter, in decreasing numerical order. head -n1 selects the first line. You can add other files to the files variable.
Another way of doing this is to use the power of sort and awk:
cat file1.txt file2.txt | sort -t '|' -k1,1 -k2,2nr | awk -F"|" '!_[$1]++' | sort -t '|' -k2,2n
I think the one-liner is pretty self-explanatory, except for the awk part: it keeps only the first occurrence of each value in the first column (a uniq keyed on column 1). Because the input is already sorted with the highest count first within each number, that first occurrence is the maximum. The last sort is just to get the final order that you wanted.
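A pure-awk variant of the same idea, which tracks the maximum count per number and sorts only at the end, could look like this (a sketch, not tested against real data):
awk -F'|' '$2+0 > max[$1]+0 { max[$1] = $2 } END { for (m in max) print m "|" max[m] }' file1.txt file2.txt | sort -t'|' -k2,2n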

Why does this command add \n at the last line

I'm using this command to sort and remove duplicate lines from a file.
sort file2.txt | uniq > file2_uniq.txt
After performing the command, I find that the last line has this value: \n, which causes me problems. What can I do to avoid it?
You could also let sort take care of deduplicating the output; since an empty line sorts first, omitting the first line avoids it:
sort -u file2.txt | tail -n +2
Edit
If you also want to remove all empty lines, I would suggest using:
grep -v '^$' file2.txt | sort -u
Just filter out what you don't want:
sort file2.txt | egrep -v "^$" | uniq > file2_uniq.txt
I solved the problem by removing the last line using:
sed '$d' infile > outfile
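If the real goal is simply to drop blank lines before deduplicating, awk can also do it in one step (a sketch; not one of the original answers):
awk 'NF' file2.txt | sort -u > file2_uniq.txt    # 'NF' keeps only lines with at least one field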
