remove lines from csv file - linux

I have 2 csv files: file1 contains 1000 email addresses and file2 contains 150 email addresses which already exist in file1.
I wonder if there is a Linux command to remove those 150 emails from file1?

I tested this:
grep -vf file2.csv file1.csv > file3.csv
and it works.

This should work, with the added benefit of providing sorted output:
comm -23 <(sort file1) <(sort file2)
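One caveat worth knowing about the grep approach: with -f alone, each line of file2.csv is treated as a regular expression and matched as a substring, so a short address (or one containing dots) can knock out unrelated lines in file1.csv. A stricter variant, using the same file names as above, matches fixed strings against whole lines only:
grep -vFxf file2.csv file1.csv > file3.csv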

Related

Getting values in a file which are not present in another file

I have two files
File1.txt:
docker/registry:2.4.2
docker/rethinkdb:latest
docker/swarm:1.0.0
File2.txt:
docker/registry:2.4.1
docker/rethinkdb:1.0.0
docker/swarm:1.0.0
The output should be:
docker/registry:2.4.2
docker/rethinkdb:latest
In other words, every line in File1 that doesn't exist in File2 should be part of the output.
I have tried doing the following but it is not working.
diff File1.txt File2.txt
You could just use grep for it:
$ grep -v -f file2.txt file1.txt
docker/registry:2.4.2
docker/rethinkdb:latest
If there are lots of rows in the files I'd probably use @user000001's solution (the awk approach below).
With awk you can do:
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1
With comm:
comm -23 <(sort File1.txt) <(sort File2.txt)
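If the NR==FNR idiom is new to you, here is the same awk program spread out with comments (same two files as above):
awk '
    NR == FNR { a[$0]; next }   # 1st file (file2): store each line as an array key
    !($0 in a)                  # 2nd file (file1): print lines not stored above
' file2 file1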

How to find common words in multiple files

I have 4 text files that contain server names as follows (each file has about 400 lines with various server names):
Server1
Server299
Server140
Server15
I would like to compare the files; what I want to find is the server names common to all 4 files.
I've got no idea where to start - I've got access to Excel, and Linux bash. Any clever ideas?
I've used VLOOKUP in Excel to compare 2 columns, but I don't think that can be used for 4 columns?
One way would be to say:
cat file1 file2 file3 file4 | sort | uniq -c | awk '$1==4 {print $2}'
Another way:
comm -12 <(comm -12 <(comm -12 <(sort file1) <(sort file2)) <(sort file3)) <(sort file4)
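Note that the uniq -c approach assumes a server name never appears twice within the same file; a duplicate would inflate its count and produce a false positive. If duplicates are possible, one way to guard against them (same hypothetical file names) is to de-duplicate each file first:
for f in file1 file2 file3 file4; do sort -u "$f"; done | sort | uniq -c | awk '$1==4 {print $2}'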

check list of email addresses against another list

I have two files with email addresses (one per line): file1 and file2.
How can I remove all the emails in file1 which also exist in file2? Looking for a bash answer, but any other scripting language is fine as well.
If it helps, each file contains only unique email addresses.
join -v1 <(sort file1) <(sort file2)
This tells join to print the lines (emails) in file1 that do not appear in file2. They have to be sorted, whence the <(sort ...).
If you must preserve the original order for whatever reason, and want to be thorough by handling case sensitivity and carriage returns (^M), you can try:
perl -e '%e=();while(<>){s/[\r\n]//g;$e{lc($_)}=1}open($so,"<","file1");while(<$so>){s/[\r\n]//g;print "$_\n" if(!exists($e{lc($_)}))}close($so)' file2
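If you prefer to stay in awk, a rough equivalent that also preserves file1's order, lowercases for comparison, and strips carriage returns (a sketch, not a drop-in replacement) would be:
awk '{ sub(/\r$/, "") }                       # drop a trailing CR, if any
     NR == FNR { seen[tolower($0)]; next }    # file2: remember each address
     !(tolower($0) in seen)' file2 file1      # file1: print addresses not seen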

merge files with bash by primary key

I have two files with IP addresses as the primary key. File two has just a subset, with different information. I would like to add the 2nd column of file2 to the matching lines of file1 using bash.
file1:
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1
192.168.1.3;hostc;aabbccddeef2
file2:
192.168.1.2;differentHostname;
My approach with for addr in $(cat file2 | cut -d\; -f1); do grep -w $addr file1 ... does not work, since I cannot access the hostname from file2.
Any ideas?
This is what join does:
$ join -a1 -t';' <(sort file1) <(sort file2)
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1;differentHostname;
192.168.1.3;hostc;aabbccddeef2
Note: join requires its input files to be sorted on the join field.
You can specify the order of output using the -o option:
$ join -a1 -t';' -o 1.1,1.2,2.2,1.3 <(sort file1) <(sort file2)
192.168.1.1;hosta;;aabbccddeef0
192.168.1.2;hostb;differentHostname;aabbccddeef1
192.168.1.3;hostc;;aabbccddeef2
awk -F";" -v OFS=";" 'FNR==NR{a[$1]=$2;next}($1 in a){$(NF+1)=a[$1]}1' file2 file1
tested:
> awk -F";" -v OFS=";" 'FNR==NR{a[$1]=$2;next}($1 in a){$(NF+1)=a[$1]}1' temp2 temp
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1;differentHostname
192.168.1.3;hostc;aabbccddeef2
>
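Broken out with comments, that awk does the following (field separator and output separator are both ";"):
awk -F";" -v OFS=";" '
    FNR == NR { a[$1] = $2; next }    # 1st file (file2): map IP -> hostname
    ($1 in a) { $(NF+1) = a[$1] }     # 2nd file (file1): append hostname on match
    1                                 # print every line of file1
' file2 file1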

Comparing two unsorted lists in Linux, listing the lines unique to the second file

I have 2 files with a list of numbers (telephone numbers).
I'm looking for a method of listing the numbers in the second file that are not present in the first file.
I've tried the various methods with:
comm (getting some weird sorting errors)
fgrep -v -x -f second-file.txt first-file.txt (unsure of the result, there should be more)
grep -Fxv -f first-file.txt second-file.txt
This looks for all lines in second-file.txt which don't match any line in first-file.txt. It might be slow if the files are large.
Also, once you sort the files, comm should have worked too (use plain sort, not sort -n, since comm expects lexicographic order even for numbers). What error does it give? Try this:
comm -23 second-file-sorted.txt first-file-sorted.txt
You need to use comm:
comm -13 first.txt second.txt
will do the job.
P.S. The order of the first and second file on the command line matters.
You may also need to sort the files first:
comm -13 <(sort first.txt) <(sort second.txt)
Do not use sort -n here even for numeric files; comm expects lexicographic order (see the answer below).
This should work
comm -13 <(sort file1) <(sort file2)
Note that sort -n (numeric) output cannot be fed to comm, which expects lexicographic (alphanumeric) sort order:
f1.txt
1
2
21
50
f2.txt
1
3
21
50
21 should appear in the third (common) column:
#WRONG
$ comm <(sort -n f1.txt) <(sort -n f2.txt)
                1
2
21
        3
        21
                50
#OK
$ comm <(sort f1.txt) <(sort f2.txt)
                1
2
                21
        3
                50
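If you want the final list in numeric order anyway, sort lexicographically for comm and numerically afterwards:
$ comm -13 <(sort f1.txt) <(sort f2.txt) | sort -n
3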
cat f1.txt f1.txt f2.txt | sort | uniq -u > file3
Listing f1.txt twice guarantees every line from f1.txt occurs more than once, so uniq -u keeps only the lines unique to f2.txt (assuming f2.txt itself has no duplicates).
