merge files with bash by primary key - linux

I have two files with IP addresses as the primary key. File two contains just a subset of the addresses, with different information. I would like to append file2's second column to the matching lines of file1 using bash.
file1:
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1
192.168.1.3;hostc;aabbccddeef2
file2:
192.168.1.2;differentHostname;
My approach with for addr in $(cut -d\; -f1 file2); do grep -w "$addr" file1 ...; done does not work, since I cannot access the hostname from file2 inside the loop.
Any ideas?
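For reference, the loop idea can be made to work in pure bash; this is a sketch assuming bash 4+ for associative arrays, though the join and awk answers below scale better:

# Load file2 into an associative array keyed by IP.
declare -A extra
while IFS=';' read -r ip name _; do
    extra[$ip]=$name
done < file2

# Walk file1 and append the file2 hostname where the IP matches.
while IFS= read -r line; do
    ip=${line%%;*}                      # everything before the first ';'
    if [[ -n ${extra[$ip]:-} ]]; then
        printf '%s;%s\n' "$line" "${extra[$ip]}"
    else
        printf '%s\n' "$line"
    fi
done < file1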

This is what join does:
$ join -a1 -t';' <(sort file1) <(sort file2)
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1;differentHostname;
192.168.1.3;hostc;aabbccddeef2
Note: join requires its input files to be sorted on the join field.
You can specify the order of output using the -o option:
$ join -a1 -t';' -o 1.1,1.2,2.2,1.3 <(sort file1) <(sort file2)
192.168.1.1;hosta;;aabbccddeef0
192.168.1.2;hostb;differentHostname;aabbccddeef1
192.168.1.3;hostc;;aabbccddeef2
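One caveat worth knowing: a plain sort compares whole lines, which can disagree with join's key ordering when one key is a prefix of another. For example, 192.168.1.10;x sorts before 192.168.1.1;hosta as a whole line (because '0' precedes ';' in ASCII), yet join expects the key 192.168.1.1 first. Sorting on the key field alone avoids this; a sketch:

sort -t';' -k1,1 file1 > file1.sorted   # sort by the IP key only
sort -t';' -k1,1 file2 > file2.sorted
join -a1 -t';' file1.sorted file2.sorted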

awk -F";" -v OFS=";" 'FNR==NR{a[$1]=$2;next}($1 in a){$(NF+1)=a[$1]}1' file2 file1
Tested (with file2 and file1 saved as temp2 and temp):
> awk -F";" -v OFS=";" 'FNR==NR{a[$1]=$2;next}($1 in a){$(NF+1)=a[$1]}1' temp2 temp
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1;differentHostname
192.168.1.3;hostc;aabbccddeef2
>
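The FNR==NR idiom works because FNR restarts at 1 for every input file while NR keeps counting, so the condition is true only while reading the first file named on the command line (file2 here). The same one-liner, spelled out with comments:

awk -F';' -v OFS=';' '
FNR == NR {          # still reading the first file, i.e. file2
    a[$1] = $2       # remember the file2 hostname, keyed by IP
    next             # skip the rules below for file2 lines
}
($1 in a) {          # now reading file1: this IP appeared in file2
    $(NF+1) = a[$1]  # append the file2 hostname as a new field
}
1                    # always true: print every (possibly extended) line
' file2 file1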

Related

Getting values in a file which are not present in another file

I have two files
File1.txt:
docker/registry:2.4.2
docker/rethinkdb:latest
docker/swarm:1.0.0
File2.txt:
docker/registry:2.4.1
docker/rethinkdb:1.0.0
docker/swarm:1.0.0
The output should be:
docker/registry:2.4.2
docker/rethinkdb:latest
In other words, every line in File1 that doesn't exist in File2 should be part of the output.
I have tried doing the following but it is not working.
diff File1.txt File2.txt
You could just use grep for it:
$ grep -v -f file2.txt file1.txt
docker/registry:2.4.2
docker/rethinkdb:latest
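Note that grep -f treats each line of file2.txt as a regular expression and matches it anywhere within a line, so the dots in the version numbers are wildcards, and docker/swarm:1.0.0 would also suppress a hypothetical docker/swarm:1.0.0-rc1. Fixed-string, whole-line matching is stricter:

grep -vxF -f file2.txt file1.txt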
If there are lots of rows in the files, I'd probably use #user000001's awk solution below.
With awk, you can read file2 into a set and then print only the file1 lines not in it:
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1
With comm (-2 suppresses lines only in File2, -3 suppresses lines common to both, leaving the lines unique to File1):
comm -23 <(sort File1.txt) <(sort File2.txt)

shell script to compare two files and write the difference to a third file

I want to compare two files and redirect the difference between them to a third one.
file1:
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
If a line in either file is commented out with a leading # (e.g. #/opt/c/c.sql), it should be skipped.
file2:
/opt/c/c.sql
/opt/a/a.sql
I want to get the difference between the two files; in this case /opt/b/b.sql should be stored in a third file. Can anyone help me achieve this?
file1
$ cat file1 #both file1 and file2 may contain spaces which are ignored
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
/opt/h/m.sql
file2
$ cat file2
/opt/c/c.sql
/opt/a/a.sql
Do
awk 'NR==FNR{line[$1];next}
{if(!($1 in line)){if($0!=""){print}}}
' file2 file1 > file3
file3
$ cat file3
/opt/b/b.sql
/opt/h/m.sql
Notes:
The order of the files passed to awk matters: pass the file to check against (file2 here) first, followed by the master file (file1).
Check the awk documentation to understand what is done here.
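The script above does not implement the # requirement from the question. One reading is that commented-out lines should be ignored on both sides; a sketch under that assumption:

awk 'NR==FNR { if ($1 !~ /^#/) line[$1]; next }   # index file2, skipping comment lines
     $1 !~ /^#/ && !($1 in line) && NF > 0        # print non-comment file1 lines missing from file2
' file2 file1 > file3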
You can use some tools like cat, sed, sort and uniq.
The main observation is this: if a line is in both files, it is not unique in the output of cat file1 file2.
Furthermore, in cat file1 file2 | sort all duplicates end up adjacent, so uniq -u keeps only the unique lines, giving this pipe:
cat file1 file2 | sort | uniq -u
Adding sed to strip leading whitespace and drop comment and empty lines gives this final pipe:
cat file1 file2 | sed -r 's/^[ \t]+//; /^#/ d; /^$/ d;' | sort | uniq -u > file3
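One caveat: uniq -u also discards a line that happens to appear twice within the same file. If that is possible, deduplicate each cleaned file first; a sketch (the helper name dedup is just for illustration):

dedup() { sed -r 's/^[ \t]+//; /^#/ d; /^$/ d;' "$1" | sort -u; }
cat <(dedup file1) <(dedup file2) | sort | uniq -u > file3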

Using file1 as an index to search file2 when file1 contains extra information

As you can read in the title, I'm dealing with two files. Here is an example of what they look like.
file1:
Name (additional info separated by a tab from the name)
Peter Schwarzer<tab>Best friend of mine
file2:
Name (followed by a float separated by a tab from the name)
Peter Schwarzer<tab>1456
What I want to do is use file1 as an index for searching file2. If the names match, a line should be written to file3 containing the name, followed by the float from file2, followed by the additional info from file1.
So file3 should look like:
Peter Schwarzer<tab>1456<tab>Best friend of mine
(everything separated by tab)
I tried grep -f to read the patterns from a file, and without the additional information it works. Is there any way to get the desired result with grep, or is awk the answer?
Thanks in advance,
Julian
Give this line a try; I didn't test it, but it should work:
awk -F'\t' -v OFS="\t" 'NR==FNR{n[$1]=$2;next}$1 in n{print $0,n[$1]}' file1 file2 > file3
Try this awk one-liner!
awk -v FS="\t" -v OFS="\t" 'FNR==NR{ A[$1]=$2; next}$1 in A{print $0,A[$1];}' file1.txt file2.txt > file3.txt
To me this looks like a job for join:
join -t '\t' file1 file2
This assumes file1 and file2 are sorted. If not, sort them first:
sort -o file1 file1
sort -o file2 file2
join -t '\t' file1 file2
If you can't modify file1 and file2 (if you need to leave them in their original, unsorted state), use a temporary file:
tmpfile=/tmp/tf$$
sort file1 > $tmpfile
sort file2 | join -t '\t' $tmpfile -
If join complains about the tab specification (e.g. "illegal tab character specification" or "multi-character tab"), you'll have to use join -t ' ' where you type an actual tab between the single quotes (and depending on your shell, you may have to press Ctrl-V before that tab).
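In bash, zsh, and ksh93 you can also sidestep the literal-tab problem entirely with ANSI-C quoting, which expands \t to a real tab before join sees it:

join -t $'\t' file1 file2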

remove lines from csv file

I have 2 csv files: file1 contains 1000 email addresses and file2 contains 150 email addresses which already exist in file1.
Is there a Linux command to remove those 150 emails from file1?
I tested this:
grep -vf file2.csv file1.csv > file3.csv
and it works. (Since email addresses contain dots, which grep treats as regex wildcards, grep -vxFf file2.csv file1.csv is the stricter, fixed-string form.)
This should work, with the added benefit of providing sorted output:
comm -23 <(sort file1) <(sort file2)

How to find common words in multiple files

I have 4 text files that contain server names as follows (each file has about 400 lines with various server names):
Server1
Server299
Server140
Server15
I would like to compare the files; what I want to find is the server names common to all 4 files.
I've got no idea where to start. I have access to Excel and Linux bash. Any clever ideas?
I've used VLOOKUP in Excel to compare 2 columns, but I don't think that can be used for 4 columns.
One way would be to say:
cat file1 file2 file3 file4 | sort | uniq -c | awk '$1==4 {print $2}'
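Note this counting approach assumes a server name never repeats within a single file (an internal duplicate would inflate its count). Deduplicating each file first makes it safe; a sketch:

cat <(sort -u file1) <(sort -u file2) <(sort -u file3) <(sort -u file4) |
    sort | uniq -c | awk '$1 == 4 {print $2}'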
Another way, intersecting pairwise with comm -12 (which keeps only lines common to both inputs):
comm -12 <(comm -12 <(comm -12 <(sort file1) <(sort file2)) <(sort file3)) <(sort file4)
