comm command not comparing words - linux

I am trying to learn shell programming, and for that I am using the Ubuntu app on Windows 10. I read about the comm command, and as I understand it, it should work as below:
file1.txt:
abc
cde
a
b

file2.txt:
abc
efg
b
c
the result should be three columns: the lines only in file1.txt (a, cde), the lines only in file2.txt (c, efg), and the lines in both (abc, b)
but what I am getting is
abc
a
cde
b
efg
abc
c
this is how I used the command
comm file1.txt file2.txt
I suspect it's because I am using it in a Windows app, but other commands such as grep, uniq, ps, pwd ... all work fine.
Any help would be appreciated

Windows is not the problem here. You used comm incorrectly. man comm states:
comm - compare two sorted files line by line
Therefore, you have to sort both files first.
Use
sort file1.txt > file1sorted
sort file2.txt > file2sorted
comm file1sorted file2sorted
Or if you are using bash (not plain sh or some other shell)
comm <(sort file1.txt) <(sort file2.txt)
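With the sample files above, either form should then print comm's three tab-indented columns (lines only in file1.txt, lines only in file2.txt, lines in both):
$ comm <(sort file1.txt) <(sort file2.txt)
a
                abc
                b
        c
cde
        efg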

Related

Compare Two Files and Print Lines That Don't Match

I'm trying to compare two files (file1 and file2) and print the full lines from file1 that don't match the list in file2, ideally into a new .txt file. However, when I run awk, it is not printing anything.
file1 example:
12345 /users/test/Desktop
54321 /users/test/Downloads
0000 /users/test/Desktop

file2 example:
543252
12345
11111
0000
expected output
54321 /users/test/Downloads
The command I've tried is
awk 'NR==FNR{a[$1]++;next};a[$1] ==0' file1.txt file2.txt
Ideally I'd like to be able to build this into a Python program I'm writing (I don't know if that's possible); if not, I'd be happy for it to run in the Linux terminal.
Any thoughts or pointers would be gratefully received.
You have to correct your awk as below; note that the file order is reversed, so the lookup file is read first:
awk 'FNR==NR{ a[$1]; next } !($1 in a)' file2 file1
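While FNR==NR is true, awk is still reading the first file argument (file2), so a[$1] stores each ID from file2 as an array key and next skips to the next line. Once awk moves on to file1, the first block no longer fires, and !($1 in a) prints only the lines whose first field was never seen in file2. On the sample files this should print:
$ awk 'FNR==NR{ a[$1]; next } !($1 in a)' file2 file1
54321 /users/test/Downloads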
You can get the expected output with grep:
grep -vf file2 file1
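Since file2 holds literal IDs rather than regular expressions, adding -F (fixed strings) is a bit safer; note that grep still matches substrings anywhere in the line, so an ID that happens to occur inside a path in file1 would also suppress that line, which is why the awk answer above is the more precise tool:
grep -vFf file2 file1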

Merging files in reverse

I am working with logs, which are spread over multiple files.
Let's assume the following files have this content:
file1:
1
file2:
2
file3:
3
Using the command cat file*, the result would be
1
2
3
but I am looking for something different: when I use a command with the glob file*, I want the output to be something like this:
3
2
1
Could someone help me, please?
Pass the output of cat to tac:
$ cat file*
1
2
3
$ cat file* | tac
3
2
1
You may instead call
ls -1r file* | xargs cat
in order to control the order of the files. Its output differs from the tac solution: the files are concatenated in reverse order, but each single logfile keeps its own internal line order. (Perhaps this is not even the desired output.)
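The two outputs only coincide here because each sample file holds a single line. For a hypothetical file1 containing the two lines 1a and 1b, the difference shows:
$ cat file* | tac
3
2
1b
1a
$ ls -1r file* | xargs cat
3
2
1a
1b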

How to select uncommon rows from two large files using linux (in terminal)?

Both have two columns: names and IDs. (The files are in xls or txt format.)
File 1:
AAA K0125
ccc K0234
BMN_a K0567
BMN_c K0567
File 2:
AKP K0897
BMN_a K0567
ccc K0234
I want to print the uncommon rows from these two files.
How can it be done in the Linux terminal?
Try something like this (it assumes the two files are sorted on the join field):
join "-t " -j 1 -v 1 file1 file2
Note that -v 1 prints only the unpairable rows of file1.
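To also print the rows unique to file2, ask join for the unpairable lines of both files (a sketch, sorting the inputs first):
join "-t " -j 1 -v 1 -v 2 <(sort file1) <(sort file2)
On the sample data this should print AAA K0125, AKP K0897 and BMN_c K0567.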
First sort both files, then use the comm utility with the -3 option:
sort file1 > file1_sorted
sort file2 > file2_sorted
comm -3 file1_sorted file2_sorted
A portion from man comm
-3 suppress column 3 (lines that appear in both files)
Output (the line unique to file2 keeps comm's column-2 tab indentation):
AAA K0125
        AKP K0897
BMN_c K0567
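If you would rather have a flat list without the leading tab on the second column, strip it (this assumes the data itself contains no tabs):
comm -3 file1_sorted file2_sorted | tr -d '\t'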

How to find common words in multiple files

I have 4 text files that contain server names as follows (each file has about 400 lines with various server names):
Server1
Server299
Server140
Server15
I would like to compare the files; what I want to find is the server names common to all 4 files.
I've got no idea where to start. I've got access to Excel and Linux bash. Any clever ideas?
I've used VLOOKUP in Excel to compare 2 columns, but I don't think that can be used for 4 columns?
One way would be to say (note this assumes no server name is repeated within a single file):
cat file1 file2 file3 file4 | sort | uniq -c | awk '$1==4 {print $2}'
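If a server name can repeat inside a single file, de-duplicate each file before counting, for example:
cat <(sort -u file1) <(sort -u file2) <(sort -u file3) <(sort -u file4) | sort | uniq -c | awk '$1==4 {print $2}'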
Another way (comm -12 prints only the lines common to its two sorted inputs, so the nesting intersects all four files):
comm -12 <(comm -12 <(comm -12 <(sort file1) <(sort file2)) <(sort file3)) <(sort file4)
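A bash sketch of the same idea that scales to any number of files (assuming the server list fits in a shell variable):
common=$(sort file1)
for f in file2 file3 file4; do
    common=$(comm -12 <(printf '%s\n' "$common") <(sort "$f"))
done
printf '%s\n' "$common"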

Comparing two unsorted lists in linux, listing the unique in the second file

I have 2 files, each with a list of numbers (telephone numbers).
I'm looking for a method of listing the numbers in the second file that are not present in the first file.
I've tried various methods:
comm (getting some weird sorting errors)
fgrep -v -x -f second-file.txt first-file.txt (unsure of the result; there should be more)
grep -Fxv -f first-file.txt second-file.txt
This looks for all lines in second-file.txt which don't match any complete line in first-file.txt (-F fixed strings, -x whole-line match, -v invert). It might be slow if the files are large.
Also, once you sort the files (with plain sort, not sort -n; comm needs lexicographic order, as shown further down), comm should have worked too. What error does it give? Try this:
comm -23 second-file-sorted.txt first-file-sorted.txt
You need to use comm:
comm -13 first.txt second.txt
will do the job: -1 and -3 suppress the lines unique to the first file and the lines common to both, leaving only the lines unique to the second file.
P.S. The order of the first and second file on the command line matters.
You may also need to sort the files first:
comm -13 <(sort first.txt) <(sort second.txt)
Do not use sort -n even for numeric data; comm requires lexicographic order (see the next answer).
This should work:
comm -13 <(sort file1) <(sort file2)
Note that sort -n (numeric) cannot be combined with comm, which expects its input in plain lexicographic (alphanumeric) sort order:
f1.txt
1
2
21
50
f2.txt
1
3
21
50
21 is in both files, so it should appear in the third (common) column:
#WRONG
$ comm <(sort -n f1.txt) <(sort -n f2.txt)
                1
2
21
        3
        21
                50
#OK
$ comm <(sort f1.txt) <(sort f2.txt)
                1
2
                21
        3
                50
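If you want the final result in numeric order, keep the lexicographic sort for comm and apply the numeric sort only to its output, e.g. to list the lines unique to the second file:
comm -13 <(sort f1.txt) <(sort f2.txt) | sort -n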
cat f1.txt f2.txt | sort | uniq > file3
Note that this writes every distinct line from both files combined, which is not the same as the lines present only in the second file.
