Compare Two Files and Print Lines That Don't Match - linux

I'm trying to compare two files (file1 and file2) and print the full lines from file1 that don't match the list in file2, ideally into a new .txt file. However, when I run awk it doesn't print anything.
file1 example:
12345 /users/test/Desktop
54321 /users/test/Downloads
0000 /users/test/Desktop

file2 example:
543252
12345
11111
0000

expected output:
54321 /users/test/Downloads
The command I've tried is
awk 'NR==FNR{a[$1]++;next};a[$1] ==0' file1.txt file2.txt
Ideally I'd like to be able to build this into a Python program I'm writing (I don't know if that's possible); if not, I'd be happy for it to run in the Linux terminal.
Any thoughts or pointers would be gratefully received.

You have to correct your awk: the file order is reversed. Build the lookup array from file2 first, then test each line of file1 against it:
awk 'FNR==NR{ a[$1]; next } !($1 in a)' file2 file1
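As for folding this into a Python program: that's certainly possible. Either run the same command from Python with the subprocess module, or skip awk altogether by reading file2 into a set and printing only the file1 lines whose first field isn't in it.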

You can get the expected output with grep (adding -F for literal matching and -w for whole-word matching makes this safer with numeric IDs):
grep -vf file2 file1

Related

Getting values in a file which are not present in another file

I have two files
File1.txt:
docker/registry:2.4.2
docker/rethinkdb:latest
docker/swarm:1.0.0
File2.txt:
docker/registry:2.4.1
docker/rethinkdb:1.0.0
docker/swarm:1.0.0
The output should be:
docker/registry:2.4.2
docker/rethinkdb:latest
In other words, every line in File1 that doesn't exist in File2 should be part of the output.
I have tried doing the following but it is not working.
diff File1.txt File2.txt
You could just use grep for it:
$ grep -v -f File2.txt File1.txt
docker/registry:2.4.2
docker/rethinkdb:latest
If there are lots of rows in the files I'd probably use the awk solution below.
With awk you can do:
awk 'NR==FNR{a[$0];next}!($0 in a)' File2.txt File1.txt
With comm:
comm -23 <(sort File1.txt) <(sort File2.txt)

grep between two files

I want to find matching lines from file 2 when compared to file 1.
file2 contains multiple columns and column one contains information that could match file1.
I tried the commands below and they didn't give any matching results (the contents of file1 are definitely in file2). I have used these commands before to compare different files and they worked.
grep -f file1 file2
grep -Fwf file1 file2
When I grep for whatever is not matching, I do get results:
grep -vf file1 file2
file1 contains a list of genes (754 genes), one per line:
ATM
ATP5B
ATR
ATRIP
ATRX
I have a feeling the problem is with my file1. When I typed several items into file1 manually, just to test, and grepped against file2, I got the matching lines from file2. But when I copied the contents of file1 (originally in Excel) into Notepad to make a .txt file, I didn't get any matching results.
I can't see any problem with my file1. Any suggestions?
You said,
I copied the contents of file1 (originally in excel) into notepad making a .txt file
It's likely that the txt file contains carriage-return/linefeed pairs which are screwing up the grep. As I suggested in a comment, try this:
tr -d '\015' < file1 > file1a
grep -Fwf file1a file2
The tr invocation deletes all the carriage returns, giving you a proper Unix/Linux text file with only newlines (\n) as line terminators.
You said:
I can't see any problem with my file1.
Here's how to see the extra-carriage-return problem:
cat -v test1
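Assuming test1 is the gene list above as saved from Notepad, that would print something like:
ATM^M
ATP5B^M
ATR^M
ATRIP^M
ATRX^M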
Those little ^M markers at the end of each line are cat -v's way of showing you the carriage return control codes.
Addendum:
Carriage Return (CR) is decimal 13, hex 0x0d, octal 015, \r in C.
Line Feed (LF) is decimal 10, hex 0x0a, octal 012, \n in C.
Because it's an old-school utility, tr accepts octal (base 8) notation for control characters.
(I think in some versions tr -d '\r' would work, but I'm not sure, and anyway I'm not sure what version you have. tr -d '\015' should be universal.)
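If GNU sed or the dos2unix utility happens to be installed, either one removes the carriage returns too; a sketch, assuming GNU sed (its -i flag edits the file in place):
sed -i 's/\r$//' file1   # GNU sed understands \r; strips the trailing CR from each line
dos2unix file1           # dedicated converter; rewrites the file with Unix line endings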
Simple shell script that performs a grep for every line in file1.txt:
#!/bin/bash
# For every line of file1.txt, check whether it occurs in file2.txt
while read -r content; do
    if grep -q "$content" file2.txt; then
        echo "$content was found in file2" >> results.txt
    fi
done < file1.txt
Let's suppose this is file2:
$ cat file2
a b ATM
c d e
f ATR g
Using grep and process substitution
We can get lines from file1 that match any of the columns in file2 via:
$ grep -wFf <(sed 's/[[:space:]]/\n/g' file2) file1
ATM
ATR
This works because it converts file2 to a form that grep understands:
$ sed 's/[[:space:]]/\n/g' file2
a
b
ATM
c
d
e
f
ATR
g
Using awk
$ awk 'FNR==NR{for (i=1;i<=NF;i++) seen[$i]; next} $0 in seen' file2 file1
ATM
ATR
Here, awk keeps track of every column value it sees in file2 and then prints only those lines of file1 that match one of those values.
You can also try the comm command: it compares two sorted files line by line and can print the lines unique to the first file, unique to the second, or common to both.
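A minimal sketch, assuming (as this question states) that the gene names sit in column one of file2: extract that column, sort both sides, and let comm print what they have in common:
$ comm -12 <(sort file1) <(awk '{print $1}' file2 | sort)
Drop the -1/-2 flags to also see the entries unique to either file.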

shell script to compare two files and write the difference to a third file

I want to compare two files and redirect the difference between the two files to third one.
file1:
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
If any line in either file is commented out with a leading # (for example #/opt/c/c.sql), it should be skipped.
file2:
/opt/c/c.sql
/opt/a/a.sql
I want to get the difference between the two files. In this case, /opt/b/b.sql should be stored in a different file. Can anyone help me achieve this?
file1
$ cat file1 #both file1 and file2 may contain spaces which are ignored
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
/opt/h/m.sql
file2
$ cat file2
/opt/c/c.sql
/opt/a/a.sql
Do:
awk 'NR==FNR{line[$1]; next}
     !($1 in line) && $0!=""' file2 file1 > file3
file3
$ cat file3
/opt/b/b.sql
/opt/h/m.sql
Notes:
The order of the files passed to awk is important here: pass the file to check against (file2 here) first, followed by the master file (file1).
Check awk documentation to understand what is done here.
You can also use a combination of cat, sed, sort and uniq.
The main observation is this: if a line appears in both files, it is not unique in the output of cat file1 file2. Furthermore, in cat file1 file2 | sort, all duplicates end up in sequence, so uniq -u keeps only the unique lines, and we have this pipe:
cat file1 file2 | sort | uniq -u
Using sed to remove leading whitespace, empty and comment lines, we get this final pipe:
cat file1 file2 | sed -r 's/^[ \t]+//; /^#/ d; /^$/ d;' | sort | uniq -u > file3
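With the sample file1 and file2 above, the pipe leaves exactly the lines that appear in only one of the files:
$ cat file1 file2 | sed -r 's/^[ \t]+//; /^#/ d; /^$/ d;' | sort | uniq -u
/opt/b/b.sql
/opt/h/m.sql
One caveat: uniq -u also discards a line that happens to appear twice within the same file.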

Using file1 as an Index to search file2 when file1 contains extra information

As you can read in the title, I'm dealing with two files. Here is an example of what they look like.
file1:
Name (additional info separated by a tab from the name)
Peter Schwarzer<tab>Best friend of mine
file2:
Name (followed by a float separated by a tab from the name)
Peter Schwarzer<tab>1456
So what I want to do is use file1 as an index for searching file2. If the names match, a line should be written to file3 containing the name, followed by the float from file2, followed by the additional info from file1.
So file3 should look like:
Peter Schwarzer<tab>1456<tab>Best friend of mine
(everything separated by tab)
I tried grep -f to read the patterns from a file, and without the additional information it works. So is there any way to get the desired result with grep, or is awk the answer?
Thanks in advance,
Julian
Give this line a try; I didn't test it, but it should work:
awk -F'\t' -v OFS="\t" 'NR==FNR{n[$1]=$2;next}$1 in n{print $0,n[$1]}' file1 file2 > file3
Try this awk one liner!
awk -v FS="\t" -v OFS="\t" 'FNR==NR{ A[$1]=$2; next}$1 in A{print $0,A[$1];}' file1.txt file2.txt > file3.txt
To me this looks like a job for join. Note that file2 goes first, so the join field is followed by the float and then the comment, matching the desired file3:
join -t '\t' file2 file1
This assumes file1 and file2 are sorted. If not, sort them first:
sort -o file1 file1
sort -o file2 file2
join -t '\t' file2 file1
If you can't modify file1 and file2 (if you need to leave them in their original, unsorted state), use a temporary file:
tmpfile=/tmp/tf$$
sort file2 > "$tmpfile"
sort file1 | join -t '\t' "$tmpfile" -
If join says "illegal tab character specification" you'll have to use join -t ' ' where you type an actual tab between the single quotes (and depending on your shell, you may have to use control-V before that tab).

Awk to print each file

I have about 50 files in a directory
Have
File1: 1|2|3
File2: 3|4|5
File3: A|B|C
WANT
File1: A|1|2|3
File2: A|3|4|5
File3: A|A|B|C
I'd appreciate it if anyone can solve this with an awk command, but I'm open to other Linux solutions. I also want to run it once and have it edit every file in the directory; the solution I have (below) requires running it on each file one at a time, which isn't efficient:
awk '{print "A|"$0}' File1
Try the sed command below:
sed -i 's/^/A|/' file1 file2 file3
To make it work on all the files in the current directory,
sed -i 's/^/A|/' *
With GNU awk for -i inplace:
gawk -i inplace '{print "A|"$0}' file1 file2 file3
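If your awk is not GNU awk (so -i inplace is unavailable), a portable sketch is a loop with a temporary file; the File* glob assumes the file names from the example:
for f in File*; do
    awk '{print "A|" $0}' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done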
