Why does the uniq -c command return duplicates in some cases?

I am trying to grep for words in one file that are not present in another file:
grep -v -w -i -r -f "dont_use_words.txt" "list_of_words.txt" >> inverse_match_words.txt
uniq -c -i inverse_match_words.txt | sort -nr
But I get duplicate values in my uniq command. Why so?
I am wondering if it might be because grep differentiates between strings, say, "AAA" found in "GIRLAAA", "AAABOY", and "GIRLAAABOY", and therefore I end up with duplicates.
When I do grep -F "AAA", all of them are returned, though.
I'd appreciate it if someone could help me out on this. I am new to Linux.

uniq eliminates all but one line in each group of consecutive duplicate lines. The conventional way to use it, therefore, is to pass the input through sort first. You're not doing that, so yes, it is entirely possible that (non-consecutive) duplicates will remain in the output.
Example:
grep -v -w -i -f dont_use_words.txt list_of_words.txt \
| sort -f \
| uniq -c -i \
| sort -nr
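You can see the consecutive-duplicates behaviour in isolation with a toy input (a quick sanity check, unrelated to your actual word lists):
printf 'AAA\nBBB\nAAA\n' | uniq -c
      1 AAA
      1 BBB
      1 AAA
printf 'AAA\nBBB\nAAA\n' | sort | uniq -c
      2 AAA
      1 BBB
Without the sort, the second AAA is not adjacent to the first, so uniq -c reports it separately; sorting first groups the duplicates so they can be counted together.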

Related

Pipelining cut sort uniq

I'm trying to get a certain field from a SAM file, sort it, and then find the number of unique values in that field. I have been trying:
cut -f 2 practice.sam > field2.txt | sort -o field2.txt sortedfield2.txt |
uniq -c sortedfield2.txt
The cut is working to pull out the numbers from field two; however, when trying to sort the numbers into a new file (or the same file) I just get a blank. I have tried breaking the pipeline into sections but still get the same error. I am meant to use those three commands to produce the counts.
Use
cut -f 2 practice.sam | sort | uniq -c
In your original code, you're redirecting the output of cut to field2.txt and at the same time, trying to pipe the output into sort. That won't work (unless you use tee). Either separate the commands as individual commands (e.g., use ;) or don't redirect the output to a file.
Ditto the second half: sort's output goes to a file (via -o) rather than to stdout, so nothing is piped into uniq. Note also that sort -o takes the output file as its argument, so to sort field2.txt into sortedfield2.txt you want sort -o sortedfield2.txt field2.txt.
So an alternative could be:
cut -f 2 practice.sam > field2.txt ; sort -o sortedfield2.txt field2.txt ; uniq -c sortedfield2.txt
which is the same as
cut -f 2 practice.sam > field2.txt
sort -o sortedfield2.txt field2.txt
uniq -c sortedfield2.txt
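If you want to keep the intermediate files and still use a single pipeline, tee (mentioned above) writes a copy of the stream to a file while passing it along; a sketch using the same file names:
cut -f 2 practice.sam | tee field2.txt | sort | tee sortedfield2.txt | uniq -c
Here field2.txt and sortedfield2.txt are saved as side effects, and the counts still arrive on stdout.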
You can use this command:
cut -f 2 practice.sam | sort | uniq -c > sorted.txt
Your original code fails with "No such file or directory" because, with the arguments reversed, sort tries to read sortedfield2.txt before that file exists. You can learn how pipes are used at this link:
https://www.guru99.com/linux-pipe-grep.html

How to use sort, cut, and uniq commands in a pipe

I was wondering how you use the cut, sort, and uniq commands in a pipeline to give a command line that indicates how many users are using each of the shells mentioned in /etc/passwd.
I'm not sure if this is right, but:
cut -f1 -d':' /etc/passwd | sort -n | uniq
Summarizing the answers excruciatingly hidden in comments:
You were close, only:
- as tripleee noticed, the shell is in the seventh field, not the first
- as shellter noticed, since the shells are not numbers, -n is useless
- as shellter noticed, for the counting there's uniq -c
That gives
cut -f7 -d: /etc/passwd | sort | uniq -c
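On a typical system this prints one line per shell with its count; the exact numbers below are illustrative only:
     38 /usr/sbin/nologin
      4 /bin/bash
      1 /bin/sync
If you also want the shells ranked by popularity, append another sort:
cut -f7 -d: /etc/passwd | sort | uniq -c | sort -nr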

How to get the no of matched occurrences using grep command in linux?

If we use the grep -c option, it counts each matching line only once, even when a line contains several matches. But I need the total number of matched occurrences, not the line count.
Use this
grep -o pattern path | wc -l
You can use the -o flag to output only the matched parts, one per line, and then pipe it to wc -w to get a word count (wc -l is safer if a match could contain whitespace).
Eg: ls ~ | grep -o pattern | wc -w
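A quick illustration of the difference, using a toy input rather than anything from the question:
printf 'foo foo\nfoo\n' | grep -c foo
2
printf 'foo foo\nfoo\n' | grep -o foo | wc -l
3
grep -c counts the two matching lines, while grep -o emits each of the three matches on its own line for wc to count.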

Linux awk sort descending order not working

I have two files that I need to sort.
The command I'm using is:
cat first-in.txt | awk '{print $2}' | cut -d '/' -f 3 | cut -d '^' -f 1 | sort -b -t . -k 1,1nr -k 2,2nr -k 3,3r -k 4,4r -k 5,5r | uniq > first-out.txt
cat second-in.txt | awk '{print $2}' | cut -d '/' -f 3 | cut -d '^' -f 1 | sort -b -t . -k 1,1nr -k 2,2nr -k 3,3r -k 4,4r -k 5,5r | uniq > second-out.txt
The issue is that I need the sort to be in descending order. Right now only file 2 is sorted correctly; file 1 is not. I would like to know the mistake I am making.
Files
All the files, including the output, are here.
Thanks in advance.
I guess you mean this is wrong:
4.2.4
4.2.3
4.2.20
4.2.2
You want 4.2.20 to be higher than all of those, right?
You can fix that by changing the -k parameters of sort so that all fields are treated as numeric:
.... -k 1,1nr -k 2,2nr -k 3,3nr -k 4,4nr -k 5,5nr ....
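As a minimal check of the corrected keys against the sample values above:
printf '4.2.4\n4.2.3\n4.2.20\n4.2.2\n' | sort -b -t . -k1,1nr -k2,2nr -k3,3nr
4.2.20
4.2.4
4.2.3
4.2.2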
On a GNU/Linux system you could use sort with the -V (version sort) option:
sed -r 's|.*/([^/^]*).*$|\1|' infile | sort -Vr
Note that both sed -r and sort -V are not standard.
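Applied to the same sample values, sort alone does the whole job (GNU coreutils assumed):
printf '4.2.4\n4.2.3\n4.2.20\n4.2.2\n' | sort -Vr
This produces the same descending order as the corrected -k version above, without spelling out each field.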

Sorting in bash

I have been trying to get the unique values in each column of a tab-delimited file in bash, so I used the following command.
cut -f <column_number> <filename> | sort | uniq -c
It works fine, and I can get the unique values in a column and their counts, like:
105 Linux
55 MacOS
500 Windows
Instead of sorting by the column values (which in this example are OS names), I want to sort by the count, and possibly have the count in the second column. So the output will have to look like:
Windows 500
Linux 105
MacOS 55
How do I do this?
Use:
cut -f <col_num> <filename> \
| sort \
| uniq -c \
| sort -r -k1 -n \
| awk '{print $2" "$1}'
The sort -r -k1 -n sorts in reverse order, using the first field as a numeric value. The awk simply reverses the order of the columns. You can test the added pipeline commands thus (with nicer formatting):
pax> echo '105 Linux
55 MacOS
500 Windows' | sort -r -k1 -n | awk '{printf "%-10s %5d\n",$2,$1}'
Windows      500
Linux        105
MacOS         55
Mine:
cut -f <column_number> <filename> | sort | uniq -c | awk '{ print $2" "$1}' | sort -k2,2nr
This alters the column order (awk) and then sorts the output numerically, in descending order, by the count that is now in the second column.
Hope this helps.
Using sed based on Tagged RE:
cut -f <column_number> <filename> | sort | uniq -c | sort -r -k1 -n | sed 's/^ *\([0-9]*\) *\(.*\)/\2 \1/'
Doesn't produce output in a neat format though.
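For example, fed a typical line of uniq -c output (the anchored ^ * is needed because uniq -c right-aligns the count with leading spaces):
echo '    105 Linux' | sed 's/^ *\([0-9]*\) *\(.*\)/\2 \1/'
Linux 105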
