Merge two files on Linux keeping only lines that appear in both files

In Linux, how can I merge two files and only keep lines that have a match in both files?
Each line is separated by a newline (\n).
So far, I have found that I can sort the files and then use comm -12. Is this the best approach (assuming it's correct)?
fileA contains
aaa
bbb
ccc
ddd
fileB contains
aaa
ddd
eee
and I'd like a new file to contain
aaa
ddd

Provided both of your input files are lexicographically sorted, you can indeed use comm:
$ comm -12 fileA fileB > fileC
If that's not the case, you should sort your input files first:
$ comm -12 <(sort fileA) <(sort fileB) > fileC
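The <(...) process substitution in the second command assumes bash or another shell that supports it. For the sample fileA and fileB above, which are already sorted, dropping the redirection shows the expected result directly:
$ comm -12 fileA fileB
aaa
ddd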

Related

Find different line in 2 UNSORTED files with different size

I have two files, f_hold and f_new. f_new is twice as big as f_hold. Both files are unsorted.
How can I discard the lines in f_new which also appear in f_hold? Example:
f_hold:
aaa
bbb
ccc
ddd
eee
f_new:
ppp
ddd
aaa
ccc
bbb
fff
jjj
nnn
what I want:
ppp
fff
jjj
nnn
So it is not a simple line-by-line comparison.
I tried several tips like 'grep -Fxv -f', 'comm' etc., but they seem to do a line-by-line comparison. Is there a Linux command to do that?
For the example you provided, using grep will work:
grep -v -f f_hold f_new
The -f flag means "read the patterns from a file".
The -v flag inverts the match.
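Note that with plain -f, each line of f_hold is treated as a regular expression that can match anywhere within a line of f_new, so a line like aaa would also knock out a hypothetical aaa-backup. Adding -F (fixed strings) and -x (whole-line match) makes the filter strictly literal:
grep -Fxv -f f_hold f_new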
UPDATE:
For large inputs, awk should be much faster:
awk 'NR==FNR{a[$0];next} !($0 in a)' f_hold f_new
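Here NR==FNR is true only while the first file (f_hold) is being read: a[$0] stores each of its lines as a key in the array a, and next skips to the next input line. For the second file (f_new), !($0 in a) prints only the lines that were never stored, i.e. the lines not present in f_hold.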

Comparing two files and applying the differences

On a Linux-based system, I can easily compare two files, e.g.:
diff file1.txt file2.txt
...and see the difference between them.
What if I want to take all lines that are unique to file2.txt and apply them to file1.txt, so that file1.txt will then contain everything it had plus the lines from file2.txt that it didn't have before? Is there an easy way to do it?
Using patch
You can use diff's output to create a patch file.
diff original_file file_with_new_lines > patch_file
You can edit patch_file to keep only the additions, since you only want the new lines.
Then you can use the patch command to apply this patch file:
patch original_file patch_file
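Alternatively, if you only want the added lines and would rather not hand-edit a patch file, you can pull them straight out of diff's default output, where lines coming from the second file are prefixed with "> " (a sketch; new_lines is just a scratch file name):
diff file1.txt file2.txt | sed -n 's/^> //p' > new_lines
cat new_lines >> file1.txt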
If you don't mind appending the sorted diff to your file, you can use comm:
cat file1.txt <(comm -13 <(sort file1.txt) <(sort file2.txt)) > file1.txt.patched
or
comm -13 <(sort file1.txt) <(sort file2.txt) | cat file1.txt - > file1.txt.patched
This appends the lines unique to file2.txt after the contents of file1.txt, writing the result to file1.txt.patched.
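Once you have checked the result, you can replace the original:
mv file1.txt.patched file1.txt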

How to select uncommon rows from two large files using linux (in terminal)?

I have two files, each with two columns: names and IDs. (The files are in xls or txt format.)
File 1:
AAA K0125
ccc K0234
BMN_a K0567
BMN_c K0567
File 2:
AKP K0897
BMN_a K0567
ccc K0234
I want to print the uncommon rows from these two files.
How can this be done in the Linux terminal?
Try something like this:
join "-t " -j 1 -v 1 -v 2 file1 file2
This considers the two files sorted; -v 1 prints the unpairable lines from file1 and -v 2 those from file2, which together are the uncommon rows.
First sort both files and then use the comm utility with the -3 option:
sort file1 > file1_sorted
sort file2 > file2_sorted
comm -3 file1_sorted file2_sorted
A portion from man comm:
-3 suppress column 3 (lines that appear in both files)
Output (lines unique to file2 appear in comm's second column, indented with a tab):
AAA K0125
	AKP K0897
BMN_c K0567
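If you would rather have a flat single-column list, strip the leading tab (GNU sed understands \t here; with a strictly POSIX sed you would type a literal tab instead):
comm -3 file1_sorted file2_sorted | sed 's/^\t//'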

How to find common words in multiple files

I have 4 text files that contain server names as follows (each file has about 400 lines with various server names):
Server1
Server299
Server140
Server15
I would like to compare the files and find the server names common to all 4 files.
I've got no idea where to start; I have access to Excel and Linux bash. Any clever ideas?
I've used VLOOKUP in Excel to compare 2 columns, but I don't think that can be used for 4 columns.
One way would be to say:
cat file1 file2 file3 file4 | sort | uniq -c | awk '$1==4 {print $2}'
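This counts how many files each name occurs in and keeps the names seen 4 times, so it assumes no server name appears twice within the same file. If duplicates within a file are possible, deduplicate each file first (a sketch, assuming bash for the <(...) process substitution):
cat <(sort -u file1) <(sort -u file2) <(sort -u file3) <(sort -u file4) | sort | uniq -c | awk '$1==4 {print $2}'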
Another way:
comm -12 <(comm -12 <(comm -12 <(sort file1) <(sort file2)) <(sort file3)) <(sort file4)
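The nesting reads from the inside out: the innermost comm -12 keeps the lines common to file1 and file2, the next intersects that with file3, and the outermost intersects with file4. Written step by step it is perhaps easier to follow; comm's output is itself sorted, so it can be fed straight into the next comm (common12 and common123 are just scratch file names):
comm -12 <(sort file1) <(sort file2) > common12
comm -12 common12 <(sort file3) > common123
comm -12 common123 <(sort file4)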

How to display only different rows using diff (bash)

How can I write only the differing rows to a separate file using diff?
For example, file number 1 contains the lines:
1;john;125;3
1;tom;56;2
2;jack;10;5
File number 2 contains the following lines:
1;john;125;3
1;tom;58;2
2;jack;10;5
How can I make the following happen?
1;tom;58;2
a.txt:
1;john;125;3
1;tom;56;2
2;jack;10;5
b.txt:
1;john;125;3
1;tom;58;2
2;jack;10;5
Use comm:
comm -13 a.txt b.txt
1;tom;58;2
The command-line options to comm are pretty straightforward:
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
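With columns 1 and 3 suppressed, only column 2 remains: the lines unique to b.txt.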
Here's a simple solution that I think is better than diff:
sort file1 file2 | uniq -u
sort file1 file2 concatenates the two files and sorts the combined output
uniq -u prints the unique lines (that do not repeat). It requires the input to be pre-sorted.
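Note that this prints the differing lines from both files, so for the example above it outputs both 1;tom;56;2 and 1;tom;58;2.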
Assuming you want to retain only the lines unique to file2, you can do:
comm -13 file1 file2
Note that the comm command expects the two files to be in sorted order.
Using diff's group format specifiers, you can suppress the printing of unchanged lines and print only the lines from changed groups:
diff --changed-group-format="%>" --unchanged-group-format="" file1 file2
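For the a.txt/b.txt example above this prints just the new version of the changed line:
diff --changed-group-format="%>" --unchanged-group-format="" a.txt b.txt
1;tom;58;2
Using %< instead of %> would print the old line from file1 instead.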
