remove lines from csv file - linux

I have 2 csv files: file1 contains 1000 email addresses and file2 contains 150 email addresses which already exist in file1.
I wonder if there is a Linux command to remove those 150 emails from file1?

I tested this:
grep -vf file2.csv file1.csv > file3.csv
and it works.

This should work, with the added benefit of providing sorted output:
comm -23 <(sort file1) <(sort file2)
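One caveat worth knowing about the grep approach: with -f alone, each line of file2.csv is treated as a regular expression and matched as a substring, so a short address (or one containing dots) can knock out unrelated lines in file1.csv. A stricter variant, using the same file names as above, matches fixed strings against whole lines only:
grep -vFxf file2.csv file1.csv > file3.csv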

Related

Getting values in a file which are not present in another file

I have two files
File1.txt:
docker/registry:2.4.2
docker/rethinkdb:latest
docker/swarm:1.0.0
File2.txt:
docker/registry:2.4.1
docker/rethinkdb:1.0.0
docker/swarm:1.0.0
The output should be:
docker/registry:2.4.2
docker/rethinkdb:latest
In other words, every line in File1 that doesn't exist in File2 should be part of the output.
I have tried doing the following but it is not working.
diff File1.txt File2.txt
You could just use grep for it:
$ grep -v -f file2.txt file1.txt
docker/registry:2.4.2
docker/rethinkdb:latest
If there are lots of rows in the files I'd probably use @user000001's solution (the awk approach below).
With awk you can do:
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1
With comm:
comm -23 <(sort File1.txt) <(sort File2.txt)
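If the NR==FNR idiom is new to you, here is the same awk program spread out with comments (same two files as above):
awk '
    NR == FNR { a[$0]; next }   # 1st file (file2): store each line as an array key
    !($0 in a)                  # 2nd file (file1): print lines not stored above
' file2 file1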

How to find common words in multiple files

I have 4 text files that contain server names as follows (each file has about 400 lines with various server names):
Server1
Server299
Server140
Server15
I would like to compare the files; what I want to find is the server names common to all 4 files.
I've got no idea where to start - I've got access to Excel, and Linux bash. Any clever ideas?
I've used VLOOKUP in Excel to compare 2 columns, but I don't think that can be used for 4 columns?
One way would be to say:
cat file1 file2 file3 file4 | sort | uniq -c | awk '$1==4 {print $2}'
Another way:
comm -12 <(comm -12 <(comm -12 <(sort file1) <(sort file2)) <(sort file3)) <(sort file4)
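Note that the uniq -c approach assumes a server name never appears twice within the same file; a duplicate would inflate its count and produce a false positive. If duplicates are possible, one way to guard against them (same hypothetical file names) is to de-duplicate each file first:
for f in file1 file2 file3 file4; do sort -u "$f"; done | sort | uniq -c | awk '$1==4 {print $2}'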

check list of email addresses against another list

I have two files with email addresses (one per line): file1 and file2.
How can I remove all the emails in file1 which also exist in file2? Looking for a bash answer, but any other scripting language is fine as well.
If it helps, each file contains only unique email addresses.
join -v1 <(sort file1) <(sort file2)
This tells join to print the lines (emails) in file1 that do not appear in file2. They have to be sorted, whence the <(sort ...).
If you must preserve the original order for whatever reason, and want to be thorough by handling case sensitivity and carriage returns (^M), you can try:
perl -e '%e=();while(<>){s/[\r\n]//g;$e{lc($_)}=1}open($so,"<","file1");while(<$so>){s/[\r\n]//g;print "$_\n" if(!exists($e{lc($_)}))}close($so)' file2
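If you prefer to stay in awk, a rough equivalent that also preserves file1's order, lowercases for comparison, and strips carriage returns (a sketch, not a drop-in replacement) would be:
awk '{ sub(/\r$/, "") }                       # drop a trailing CR, if any
     NR == FNR { seen[tolower($0)]; next }    # file2: remember each address
     !(tolower($0) in seen)' file2 file1      # file1: print addresses not seen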

merge files with bash by primary key

I have two files with IP addresses as the primary key. File two has just a subset, with different information. I would like to add the 2nd column of file2 to the matching lines of file1 using bash.
file1:
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1
192.168.1.3;hostc;aabbccddeef2
file2:
192.168.1.2;differentHostname;
My approach with for addr in $(cat file2 | cut -d\; -f1); do grep -w $addr file1 ... does not work, since I cannot access the hostname from file2.
Any ideas?
This is what join does:
$ join -a1 -t';' <(sort file1) <(sort file2)
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1;differentHostname;
192.168.1.3;hostc;aabbccddeef2
Note: join requires its input files to be sorted on the join field.
You can specify the order of output using the -o option:
$ join -a1 -t';' -o 1.1,1.2,2.2,1.3 <(sort file1) <(sort file2)
192.168.1.1;hosta;;aabbccddeef0
192.168.1.2;hostb;differentHostname;aabbccddeef1
192.168.1.3;hostc;;aabbccddeef2
awk -F";" -v OFS=";" 'FNR==NR{a[$1]=$2;next}($1 in a){$(NF+1)=a[$1]}1' file2 file1
tested:
> awk -F";" -v OFS=";" 'FNR==NR{a[$1]=$2;next}($1 in a){$(NF+1)=a[$1]}1' temp2 temp
192.168.1.1;hosta;aabbccddeef0
192.168.1.2;hostb;aabbccddeef1;differentHostname
192.168.1.3;hostc;aabbccddeef2
>
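Broken out with comments, that awk does the following (field separator and output separator are both ";"):
awk -F";" -v OFS=";" '
    FNR == NR { a[$1] = $2; next }    # 1st file (file2): map IP -> hostname
    ($1 in a) { $(NF+1) = a[$1] }     # 2nd file (file1): append hostname on match
    1                                 # print every line of file1
' file2 file1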

Comparing two unsorted lists in Linux, listing the lines unique to the second file

I have 2 files with a list of numbers (telephone numbers).
I'm looking for a method of listing the numbers in the second file that are not present in the first file.
I've tried the various methods with:
comm (getting some weird sorting errors)
fgrep -v -x -f second-file.txt first-file.txt (unsure of the result, there should be more)
grep -Fxv -f first-file.txt second-file.txt
This looks for all lines in second-file.txt which don't match any line in first-file.txt. It might be slow if the files are large.
Also, once you sort the files, comm should have worked too (use plain sort, not sort -n, since comm expects lexicographic order even for numbers). What error does it give? Try this:
comm -23 second-file-sorted.txt first-file-sorted.txt
You need to use comm:
comm -13 first.txt second.txt
will do the job.
P.S. The order of the first and second file on the command line matters.
You may also need to sort the files first:
comm -13 <(sort first.txt) <(sort second.txt)
Do not use sort -n here even for numeric files; comm expects lexicographic order (see the answer below).
This should work
comm -13 <(sort file1) <(sort file2)
Note that sort -n (numeric) output cannot be fed to comm, which expects lexicographic (alphanumeric) sort order:
f1.txt
1
2
21
50
f2.txt
1
3
21
50
21 should appear in the third (common) column:
#WRONG
$ comm <(sort -n f1.txt) <(sort -n f2.txt)
                1
2
21
        3
        21
                50
#OK
$ comm <(sort f1.txt) <(sort f2.txt)
                1
2
                21
        3
                50
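If you want the final list in numeric order anyway, sort lexicographically for comm and numerically afterwards:
$ comm -13 <(sort f1.txt) <(sort f2.txt) | sort -n
3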
cat f1.txt f1.txt f2.txt | sort | uniq -u > file3
Listing f1.txt twice guarantees every line from f1.txt occurs more than once, so uniq -u keeps only the lines unique to f2.txt (assuming f2.txt itself has no duplicates).
