How to delete lines in file1 if a column matches file2 - linux

I have a file with 4 columns and I need to delete lines from file1 if column 3 is in file2.
Example:
File1:
14769,marty.------#googlemail.com,c076a7b6a52857ddf2f2e60d71dda6bf,49
14770,maryfi-------#googlemail.com,23fc2887a3a8248ddea570b5700b1708,49
14771,n.s------#googlemail.com,e504a6617f375ce04f4e51f1ec66dd93,49
14772,paula------#googlemail.com,f918f5b8df1d6285892d003c2fb9e3cf,49
14773,pkec.------#googlemail.com,4ca2c5d670f324c31a20854873bf63ac,49
14774,squi-------#googlemail.com,d26a0296a361b79afd98ede1af918f6d,49
File 2:
d26a0296a361b79afd98ede1af918f6d
4ca2c5d670f324c31a20854873bf63ac
So the result will be like this:
14769,marty.------#googlemail.com,c076a7b6a52857ddf2f2e60d71dda6bf,49
14770,maryfi-------#googlemail.com,23fc2887a3a8248ddea570b5700b1708,49
14771,n.s------#googlemail.com,e504a6617f375ce04f4e51f1ec66dd93,49
14772,paula------#googlemail.com,f918f5b8df1d6285892d003c2fb9e3cf,49
I have tried this:
awk -F',' 'NR==FNR {a[$1]=$3 ;next} !($3 in a) {print }' OFS='\t' file1 file2
but it's not working.

I can't add a comment (not enough rep), but I've tried your code with gawk and, once file2 is given before file1 on the command line so that its keys are loaded first, it did remove the two lines as you wanted. The reason you don't get tab-delimited output is that OFS takes effect only after $0 is rebuilt, so you can force a rebuild with a simple assignment like $1=$1 together with your OFS='\t':
awk -F',' 'NR==FNR {a[$1]=$3 ;next} !($3 in a) {$1=$1; print}' OFS='\t' file2 file1
Result:
14769 marty.------#googlemail.com c076a7b6a52857ddf2f2e60d71dda6bf 49
14770 maryfi-------#googlemail.com 23fc2887a3a8248ddea570b5700b1708 49
14771 n.s------#googlemail.com e504a6617f375ce04f4e51f1ec66dd93 49
14772 paula------#googlemail.com f918f5b8df1d6285892d003c2fb9e3cf 49
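A minimal, self-contained way to check the corrected approach (a sketch, assuming a POSIX sh, any POSIX awk, and `mktemp -d`; the temporary directory exists only for the demo). It writes the sample data from the question, reads file2 first so its hashes become array keys, and prints the file1 rows whose third field is not a key. Printing `$0` unchanged keeps the commas; add `$1=$1` with `OFS='\t'` as above if you want tabs.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                 # scratch directory for the sample files
trap 'rm -rf "$dir"' EXIT

cat > "$dir/file1" <<'EOF'
14769,marty.------#googlemail.com,c076a7b6a52857ddf2f2e60d71dda6bf,49
14770,maryfi-------#googlemail.com,23fc2887a3a8248ddea570b5700b1708,49
14771,n.s------#googlemail.com,e504a6617f375ce04f4e51f1ec66dd93,49
14772,paula------#googlemail.com,f918f5b8df1d6285892d003c2fb9e3cf,49
14773,pkec.------#googlemail.com,4ca2c5d670f324c31a20854873bf63ac,49
14774,squi-------#googlemail.com,d26a0296a361b79afd98ede1af918f6d,49
EOF
printf '%s\n' d26a0296a361b79afd98ede1af918f6d 4ca2c5d670f324c31a20854873bf63ac > "$dir/file2"

# Pass 1 (file2, NR==FNR): remember each hash as a key of "drop".
# Pass 2 (file1): print the row only if its 3rd field is not one of those keys.
result=$(awk -F',' 'NR==FNR { drop[$1] = 1; next } !($3 in drop)' "$dir/file2" "$dir/file1")
printf '%s\n' "$result"
```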

Related

Matching two files and print all columns

I have two files I want to match according to column 1 in file 1 and column 2 in file 2.
File 1:
1000019 -0.013936 0.0069218 -0.0048443 -0.0053688
1000054 0.013993 0.0044969 -0.0050022 -0.0043233
File 2:
5131885 1000019
1281471 1000054
I would like to print all columns after matching.
Expected output (file 3):
5131885 1000019 -0.013936 0.0069218 -0.0048443 -0.0053688
1281471 1000054 0.013993 0.0044969 -0.0050022 -0.0043233
I tried the following:
awk 'FNR==NR{arr[$1]=$2;next} ($2 in arr){print $0,arr[$2]}' file1 file2 > file3
join file1 file2 > file3 #after sorting
This awk should work (file2 is read first to build the lookup table):
awk 'NR==FNR {r[$2]=$1; next}{print r[$1], $0}' file2 file1
Output
5131885 1000019 -0.013936 0.0069218 -0.0048443 -0.0053688
1281471 1000054 0.013993 0.0044969 -0.0050022 -0.0043233
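A self-contained sketch of the same two-pass idea (POSIX sh and awk assumed; the third file1 row, keyed 9999999, is a hypothetical unmatched row added only for the demo). It adds a `($1 in r)` guard, which is not in the answer above, so file1 rows without a partner in file2 are skipped instead of being printed with an empty first column.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' \
  '1000019 -0.013936 0.0069218 -0.0048443 -0.0053688' \
  '1000054 0.013993 0.0044969 -0.0050022 -0.0043233' \
  '9999999 0.1 0.2 0.3 0.4' > "$dir/file1"      # last row: hypothetical, no match
printf '%s\n' '5131885 1000019' '1281471 1000054' > "$dir/file2"

# Pass 1 (file2): map the key in column 2 to the ID in column 1.
# Pass 2 (file1): prefix the matching ID, skipping rows whose key is unknown.
result=$(awk 'NR==FNR { r[$2] = $1; next } ($1 in r) { print r[$1], $0 }' "$dir/file2" "$dir/file1")
printf '%s\n' "$result"
```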

replace pattern in file 2 with pattern in file 1 if contingency is met

I have two tab-delimited data files. file1 looks like:
cluster_j_72 cluster-32 cluster-32 cluster_j_72
cluster_j_75 cluster-33 cluster-33 cluster_j_73
cluster_j_8 cluster-68 cluster-68 cluster_j_8
file2 looks like:
NODE_148 67545 97045 cluster-32
NODE_221 1 42205 cluster-33
NODE_168 1 24506 cluster-68
I would like to confirm that, for a given row in file1, columns 2 and 3 are identical and columns 1 and 4 are identical. If this is the case, I would like to take the value in column 2 of that row (file1), find it in file2, and replace it with the value from column 1 (file1). Thus the new output of file2 would look like this (note that because columns 1 and 4 don't match for cluster-33 in file1, the pattern is not replaced in file2):
NODE_148 67545 97045 cluster_j_72
NODE_221 1 42205 cluster-33
NODE_168 1 24506 cluster_j_8
I have been able to get the contingency check right (here printing the value from file1 I'd like to use to replace a value in file2):
awk '{if($2==$3 && $1==$4){print $1}}' file1
If I could get sed to draw values ($2 and $1) from file1 while looking in file2, this would work:
sed 's/$2(from file1)/$1(from file1)/' file2
But I can't seem to nest this sed in the previous awk statement, nor get sed to look for a pattern that comes from a different file than the one it's reading.
thanks!
You never need sed when you're using awk since awk can do anything that sed can do.
This might be what you're trying to do:
$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==FNR {
    if ( ($1 == $4) && ($2 == $3) ) {
        map[$2] = $1
    }
    next
}
$4 in map { $4 = map[$4] }
{ print }
$ awk -f tst.awk file1 file2
NODE_148 67545 97045 cluster_j_72
NODE_221 1 42205 cluster-33
NODE_168 1 24506 cluster_j_8
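To try the script without creating files by hand, here is a self-contained sketch (POSIX sh and awk assumed) that generates genuinely tab-delimited copies of the sample data with `printf` and runs the same logic as `tst.awk`, written inline. The cluster-33 row stays unchanged because columns 1 and 4 differ in file1.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# printf reuses its format for every group of four arguments,
# producing one tab-separated line per group.
printf '%s\t%s\t%s\t%s\n' \
  cluster_j_72 cluster-32 cluster-32 cluster_j_72 \
  cluster_j_75 cluster-33 cluster-33 cluster_j_73 \
  cluster_j_8  cluster-68 cluster-68 cluster_j_8 > "$dir/file1"
printf '%s\t%s\t%s\t%s\n' \
  NODE_148 67545 97045 cluster-32 \
  NODE_221 1 42205 cluster-33 \
  NODE_168 1 24506 cluster-68 > "$dir/file2"

result=$(awk '
BEGIN { FS = OFS = "\t" }
NR == FNR {                                   # file1: build the replacement map
    if (($1 == $4) && ($2 == $3)) map[$2] = $1
    next
}
$4 in map { $4 = map[$4] }                    # file2: swap in the new name
{ print }
' "$dir/file1" "$dir/file2")
printf '%s\n' "$result"
```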

How to merge 2 tables with awk

First of all, sorry for my English. I know there are a lot of topics about awk, but it's a very difficult tool for me...
I would like to merge two tables on a common column with awk. The tables have different numbers of rows. The first table is the one I want to modify and the second is a reference table. I would like to compare my column1.F1 with my column1.F2; when they match, add column2.F2 to my file1. But I need to keep all the lines of file1.
Here is an example:
File1
Num_id,Name,description1,description2,description3
?,atlanta_1,,,
RO_5,babeni_SW,,,
? ,Bib1,,,
RO_9,BoUba_456,,,
?,Castor,,,
File2
official_Num_id,official_Name
RO_1,America
RO_2,Andre
RO_3,Atlanta
RO_4,Axa
RO_5,Babeni
RO_6,Barba
RO_7,Bib
RO_8,Bilbao
RO_9,Bouba
RO_10,Castor
File3
Num_id,Name,description1,description2,description3,official_Name
?,atlanta_1,,,
RO_5,babeni_SW,,,Babeni
?,Bib1,,,
RO_9,BoUba_456,,,Bouba
?,Castor,,,
I've read a lot of solutions on the Internet and it seems that awk could work.
I tried awk 'NR==FNR {h[$1] = $2; next} {print $0,h[$1]}' $File1 $File2 > file3
But my command doesn't work: my File3 looks exactly like File1.
Second, I don't know whether it's possible to compare the two second columns when the names differ (like atlanta_1 and Atlanta) and add the official_Num_id and the official_Name to my File1.
Any hero out there?
You had it, except for two small things. First, you need to set your field separators (FS and OFS) to a comma, and second, reverse the order of your input files on the command line so that the reference file is processed first:
$ awk 'BEGIN {FS=OFS=","} NR==FNR {h[$1] = $2; next} {print $0,h[$1]}' File2 File1
Num_id,Name,description1,description2,description3,
?,atlanta_1,,,,
RO_5,babeni_SW,,,,Babeni
? ,Bib1,,,,
RO_9,BoUba_456,,,,Bouba
?,Castor,,,,
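If you also want the header row to carry the new column name, as in the expected File3, one option is to special-case the first line of File1. A self-contained sketch of that variation (POSIX sh and awk assumed; this is an extension of the answer above, not part of it):

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'Num_id,Name,description1,description2,description3' \
  '?,atlanta_1,,,' 'RO_5,babeni_SW,,,' '? ,Bib1,,,' 'RO_9,BoUba_456,,,' '?,Castor,,,' > "$dir/File1"
printf '%s\n' 'official_Num_id,official_Name' 'RO_1,America' 'RO_2,Andre' 'RO_3,Atlanta' \
  'RO_4,Axa' 'RO_5,Babeni' 'RO_6,Barba' 'RO_7,Bib' 'RO_8,Bilbao' 'RO_9,Bouba' 'RO_10,Castor' > "$dir/File2"

result=$(awk '
BEGIN { FS = OFS = "," }
NR == FNR { h[$1] = $2; next }                  # File2: id -> official name
FNR == 1  { print $0, "official_Name"; next }   # File1 header: name the new column
{ print $0, h[$1] }                             # other rows: append name or empty
' "$dir/File2" "$dir/File1")
printf '%s\n' "$result"
```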
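You can also use the (GNU) join command for this. Note that join expects both files to be sorted on the join field; --nocheck-order only suppresses the warning, so sort the data first if matches go missing: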
join --header --nocheck-order -t, -1 1 -2 1 -a 1 file1 file2
To answer your question about whether it's possible to compare the two second columns when the names differ (like atlanta_1 and Atlanta) and add the official_Num_id and the official_Name to File1:
$ awk '
BEGIN { FS=OFS="," }
NR==FNR {                          # file2
    a[tolower($2)]=$0              # hash on lowercase city
    next
}
{                                  # file1
    split($2,b,"[^[:alpha:]]")     # split on non-alphabet
    print $0 (tolower(b[1]) in a?OFS a[tolower(b[1])]:"")
}' file2 file1
Num_id,Name,description1,description2,description3
?,atlanta_1,,,,RO_3,Atlanta
RO_5,babeni_SW,,,,RO_5,Babeni
? ,Bib1,,,,RO_7,Bib
RO_9,BoUba_456,,,,RO_9,Bouba
?,Castor,,,,RO_10,Castor
split splits the Name field on non-alphabetic characters, i.e. the _ in atlanta_1, the 1 in Bib1, etc., so it might fail on city names containing dashes and the like; edit the pattern [^[:alpha:]] in split accordingly. The header line is not extended with names for the two new columns, so rethink the header names.
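As a sketch of that adjustment (POSIX sh and awk assumed; `RO_11`, `Saint-Denis`, and `saint-denis_2` are hypothetical rows added only to show a hyphenated name), adding `-` to the bracket expression keeps hyphenated names whole while still cutting at `_` or digits. It uses the explicit range `A-Za-z` instead of `[:alpha:]`, which assumes the names are plain unaccented ASCII.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'official_Num_id,official_Name' 'RO_3,Atlanta' 'RO_11,Saint-Denis' > "$dir/file2"
printf '%s\n' 'Num_id,Name,description1,description2,description3' \
  '?,atlanta_1,,,' '?,saint-denis_2,,,' > "$dir/file1"

result=$(awk '
BEGIN { FS = OFS = "," }
NR == FNR { a[tolower($2)] = $0; next }       # file2: hash on lowercase name
{
    # Cut the name at the first character that is not an ASCII letter or "-".
    split($2, b, "[^A-Za-z-]")
    key = tolower(b[1])
    print $0 ((key in a) ? OFS a[key] : "")
}' "$dir/file2" "$dir/file1")
printf '%s\n' "$result"
```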

Getting Issues In Adding A Tab Using AWK

Sample Logs
Location Number Status Comment
Delhi 919xxx Processed Test File 1
Mumbai 918xxx Got Stucked Test File 123
I'm trying to add one tab after the Status column using awk, but I'm getting the wrong output.
Sample Query
awk '{$3 = $3 "\t"; print}' z
Getting Output As
Location Number Status Comment
Delhi 919xxx Processed Test File 1
Mumbai 918xxx **Got** **Stucked** Test File 123
It is treating 'Got Stucked' as two separate fields. Please suggest a fix.
If you only want one tab after the header text Status to make it look better, apply sub to the first record only:
$ awk 'NR==1 {sub(/Status/,"Status\t")} 1' file
Location Number Status Comment
Delhi 919xxx Processed Test File 1
Mumbai 918xxx Got Stucked Test File 123
This way awk won't rebuild the record and replace FS with OFS etc.
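A small self-contained sketch (POSIX sh and awk assumed; the two-space separation in the sample is an assumption, since the question's exact spacing isn't shown) that contrasts the two approaches: `sub` edits the text of line 1 in place, while assigning to `$3` makes awk rebuild every record with single spaces between fields.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'Location  Number  Status  Comment' \
  'Mumbai  918xxx  Got Stucked  Test File 123' > "$dir/z"

# sub() changes only the matched text; the record is not rebuilt.
with_sub=$(awk 'NR == 1 { sub(/Status/, "Status\t") } 1' "$dir/z")
# Assigning to a field rebuilds $0, joining all fields with OFS (" ").
with_field=$(awk '{ $3 = $3 "\t"; print }' "$dir/z")
printf '%s\n' "$with_sub" "$with_field"
```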
@JamesBrown's answer sounds like what you asked for, but also consider:
$ awk -F'  +' -v OFS='\t' '{$1=$1}1' file | column -s$'\t' -t
Location  Number  Status       Comment
Delhi     919xxx  Processed    Test File 1
Mumbai    918xxx  Got Stucked  Test File 123
The awk converts every sequence of 2+ spaces to a tab so the result is a tab-separated stream which column can then convert to a visually aligned table if that's your ultimate goal. Or you could generate a CSV to read into Excel or similar:
$ awk -F'  +' -v OFS=',' '{$1=$1}1' file
Location,Number,Status,Comment
Delhi,919xxx,Processed,Test File 1
Mumbai,918xxx,Got Stucked,Test File 123
$ awk -F'  +' -v OFS=',' '{for(i=1;i<=NF;i++) $i="\""$i"\""}1' file
"Location","Number","Status","Comment"
"Delhi","919xxx","Processed","Test File 1"
"Mumbai","918xxx","Got Stucked","Test File 123"
or more robustly:
$ awk -F'  +' -v OFS=',' '{$1=$1; for(i=1;i<=NF;i++) { gsub(/"/,"\"\"",$i); if ($i~/[[:space:],"]/) $i="\""$i"\"" } }1' file
Location,Number,Status,Comment
Delhi,919xxx,Processed,"Test File 1"
Mumbai,918xxx,"Got Stucked","Test File 123"
If your input fields aren't always separated by at least 2 blank chars then tell us how they are separated.
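A self-contained sketch of the more robust CSV command (POSIX sh and awk assumed; the `Pune` row with embedded double quotes is hypothetical, added only to show the quote doubling). Fields containing whitespace, commas, or quotes are wrapped in double quotes and embedded quotes are doubled, which matches the usual CSV convention (RFC 4180).

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'Location  Number  Status  Comment' \
  'Delhi  919xxx  Processed  Test File 1' \
  'Pune  917xxx  Done  Say "hi"' > "$dir/file"     # last row: hypothetical

# FS is "2 or more spaces"; each field has its quotes doubled and is then
# quoted if it contains whitespace, a comma, or a double quote.
result=$(awk -F'  +' -v OFS=',' '{
    $1 = $1
    for (i = 1; i <= NF; i++) {
        gsub(/"/, "\"\"", $i)
        if ($i ~ /[[:space:],"]/) $i = "\"" $i "\""
    }
} 1' "$dir/file")
printf '%s\n' "$result"
```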
Try using
awk -F'  ' '{$3 = $3 "\t"; print}' z
The problem is that, by default, awk treats any run of spaces and tabs as the separator between two columns. This means that Got and Stucked are actually two different columns.
With -F'  ' (two spaces) you tell awk to use a double space as the separator between columns; this assumes the columns are separated by exactly two spaces.

How to read a file in an awk command

I have two files that look like:
**file1.txt**
"a","1","11","111"
"b","2","22","222"
"c","3","33","333"
"d","4","44","444"
"e","5","55","555"
"f","6","66","666"
**file2.txt**
"b"
"d"
"a"
"c"
"e"
"f"
I need to create a script that reorders file1 so that it follows the order of file2, e.g.:
"b","2","22","222"
"d","4","44","444"
"a","1","11","111"
"c","3","33","333"
"e","5","55","555"
"f","6","66","666"
I created a command that looks like:
nawk '/^("b")/' file1 ; nawk '/^("d")/' file1 ; nawk '/^("a")/' file1 ; nawk '/^("c")/' file1 ; nawk '/^("e")/' file1 ; nawk '/^("f")/' file1
It does the trick, but I would like to automate it further and don't know how to proceed. How could I create a command or variable that takes line 1 of file2 ("b") and puts it in the above command, then takes line 2 of file2 ("d") and puts it in the above command, and so on? Basically, if possible, I would like the command to read file2 and fill in the blanks in the above command. Any other more convenient commands you guys can suggest would be appreciated. Note that I currently have to insert the letters from file2 into the above command manually.
The actual file may contain well over 100 lines.
awk -F, 'NR==FNR { a[$1]=$0; next }
($1 in a) { print a[$1] }' file1 file2
This reads all of file1 into memory, then prints in the order of file2. If file1 is very large, this may not be feasible.
This is a common Awk idiom; search the many near-duplicates if you need a more detailed explanation.
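If file1 is too large to hold in memory, one alternative (a swapped-in technique, not the answer's method; POSIX sh, awk, `sort`, and `cut` assumed) is to keep only file2 in memory: tag each file1 row with the position of its key in file2, sort numerically on that tag, and strip it again. Unlike the array approach, it also keeps every file1 row for a repeated key instead of only the last one.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' '"a","1","11","111"' '"b","2","22","222"' '"c","3","33","333"' \
  '"d","4","44","444"' '"e","5","55","555"' '"f","6","66","666"' > "$dir/file1.txt"
printf '%s\n' '"b"' '"d"' '"a"' '"c"' '"e"' '"f"' > "$dir/file2.txt"

# Pass 1 (file2): rank[key] = line number of the key in file2.
# Pass 2 (file1): prefix each row whose key is ranked with "rank,".
# sort orders by that numeric prefix; cut removes it again.
result=$(awk -F, 'NR==FNR { rank[$1] = FNR; next } ($1 in rank) { print rank[$1] "," $0 }' \
           "$dir/file2.txt" "$dir/file1.txt" | sort -t, -k1,1n | cut -d, -f2-)
printf '%s\n' "$result"
```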
