Print the input search string if grep doesn't match - search

I have file1
BOB
JOHN
SALLY
I have file2
There was a boy called JOHN and he was playing with FRED while
JILL went off to find a bucket of water from TOM but she
fell down the hill.
I want to iterate through the file1 words and search for these in file2.
I want to print the words that are NOT found in file2.
So the output would be
BOB
SALLY
I guess it is if the grep fails, I'd like to print the string that grep was searching for.
I'm starting here:
grep -o -f file1 file2
But of course, this returns
JOHN
How would I get the original search strings that didn't match - to print instead?

Here is a grep one liner to get this done:
grep -vxFf <(tr '[[:blank:]]' '\n' < file2) file1
BOB
SALLY
Using tr to convert space/tab to newline first then using grep -vxFf to get non-matching words in file1.
Or as David suggested in comments below:
grep -vxFf <(printf '%s\n' $(<file2)) file1
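Both commands above split file2 on whitespace only, so a word followed by punctuation (for example JOHN, or hill.) would not count as a match. A minimal sketch of one way around that, assuming GNU grep (which accepts - as the pattern file for -f); the function name missing_words is made up for illustration:

```shell
# missing_words is a hypothetical helper name, not taken from the answers above.
# Usage: missing_words WORDLIST TEXT
# Prints every line of WORDLIST that does not occur as a whole word in TEXT.
missing_words() {
    # tr -cs: replace each run of non-alphanumeric characters with one newline,
    # so "JOHN," and "hill." are reduced to bare words before matching.
    # grep -v inverts the match, -x matches whole lines, -F uses fixed strings,
    # -f - reads the pattern list from standard input (assumes GNU grep).
    tr -cs '[:alnum:]' '\n' < "$2" | grep -vxFf - "$1"
}
```

Called as missing_words file1 file2, this keeps the order of file1, just like the one-liners above.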

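With your shown samples, you could try the following awk. It treats each line of file1 as a plain substring and drops it from the candidate list once it appears anywhere in file2; whatever is left at the end was never found.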
awk '
FNR==NR{
arr[$0]
next
}
{
for(i in arr){
if(index($0,i)){
delete arr[i]
}
}
}
END{
for(i in arr){
print i
}
}
' file1 file2

If the order isn't critical, you can use:
awk '
FNR == NR { a[$1]=0; next }
{ for (i=1;i<=NF;i++)
if ($i in a)
a[$i]++
}
END {
for (i in a)
if (!a[i])
print i
}
' file1 file2
Example Use/Output
$ awk '
> FNR == NR { a[$1]=0; next }
> { for (i=1;i<=NF;i++)
> if ($i in a)
> a[$i]++
> }
> END {
> for (i in a)
> if (!a[i])
> print i
> }
> ' file1 file2
SALLY
BOB
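If the order of file1 does matter, one option is to swap the roles of the two passes: collect the words of file2 first, then stream file1 and print what was never seen. This is only a sketch under that assumption; missing_in_order is a hypothetical name, and like the script above it compares only the first field of each file1 line.

```shell
# missing_in_order is a hypothetical helper name used for illustration.
# Usage: missing_in_order WORDLIST TEXT
missing_in_order() {
    awk '
        # First input (TEXT): remember every whitespace-separated word.
        NR == FNR { for (i = 1; i <= NF; i++) seen[$i]; next }
        # Second input (WORDLIST): print lines whose first field was never seen.
        !($1 in seen)
    ' "$2" "$1"
}
```

Because file1 is read last and printed as it streams by, the output follows the original order of file1 (BOB before SALLY for the sample data).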

Related

linux shell get multi file intersection

I have a few text files, for example 1.txt, 2.txt, 3.txt and 4.txt.
I want the intersection of their contents, i.e. the lines that appear in all four files. So far I have:
cat 1.txt 2.txt | sort | uniq -c > tmp.txt
cat tmp.txt 3.txt | sort | uniq -c > tmp2.txt
and so on ....
Is there a better way?
input text
1.txt
1
2
3
4
2.txt
1
2
3
3.txt
1
2
4.txt
1
5
expected output:
1
With your shown samples, please try the following awk code.
1st solution: This one allows for duplicate lines within a single input file, counting each line at most once per file:
awk '
!arr2[FILENAME,$0]++{
arr1[$0]++
}
END{
for(i in arr1){
if(arr1[i]==(ARGC-1)){
print i
}
}
}
' *.txt
2nd solution: This one assumes there are no duplicate lines within any single input file. If that is the case, try the following:
awk '
{
arr[$0]++
}
END{
for(i in arr){
if(arr[i]==(ARGC-1)){
print i
}
}
}
' *.txt
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
arr[$0]++ ##Creating an array named arr with index of $0 and keep increasing its value.
}
END{ ##Starting END block of this program from here.
for(i in arr){ ##Traversing through array arr here.
if(arr[i]==(ARGC-1)){ ##Checking condition if value of current item in arr is Equal to total number of files then print it.
print i
}
}
}
' *.txt ##Passing all .txt files as an input to awk program from here.
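Since the question started from sort and uniq, the same counting idea can also be sketched with those tools: de-duplicate each file, merge everything, and keep the lines whose count equals the number of files. The function name intersect_lines is hypothetical, and the sketch assumes the usual uniq -c output format of optional spaces, a count, one space, then the line.

```shell
# intersect_lines is a hypothetical helper name used for illustration.
# Usage: intersect_lines FILE...
intersect_lines() {
    for f in "$@"; do
        sort -u "$f"              # each file contributes a line at most once
    done | sort | uniq -c |       # count in how many files each line occurs
        awk -v n="$#" '
            # Keep lines seen in all n files, then strip the count prefix.
            $1 == n { sub(/^ *[0-9]+ /, ""); print }
        '
}
```

For the four sample files, intersect_lines 1.txt 2.txt 3.txt 4.txt prints the single line 1.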

implementing Excel-vlookup-like function with awk

I have a question about vlookup function implementation with awk. I have a csv file having id-score pairs like this (say 1.csv):
id,score
1,16
3,12
5,13
11,8
13,32
17,37
23,74
29,7
31,70
41,83
There are "unscored" guys. I also have a csv file including all registered guys both scored and unscored like this (say, 2.csv) (I transposed for the want of space)
id,1,3,5,7,11,13,17,19,23,29,31,37,41
I would like to generate id-score pairs according to 2nd csv file so as to include both scored and unscored guys. For unscored guys, NAN would be used instead of the digit.
In other words, final result is desired to be like this:
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
When I tried to create a new table with the following awk command, it did not work for me. Thanks in advance for any advice.
awk 'FNR==NR{a[$1]++; next} {print $0, (a[$1]) ? a[$2] : "NAN"}' 1.csv 2.csv
Here is your script with fixes: set the field separators, save the score value for each id, and print the value from the lookup, or NAN if it is missing.
$ awk 'BEGIN {FS=OFS=","}
FNR==NR {a[$1]=$2; next}
{print $1, (($1 in a)?a[$1]:"NAN")}' file1 file2
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
With bash and join:
echo "id,score"
join --header -j 1 -t ',' <(sort 1.csv | grep -v '^id') <(tr ',' '\n' < 2.csv | grep -v '^id' | sort) -e "NAN" -a 2 -o 2.1,1.2 | sort -n
Output:
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
See: man join
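With awk, you could try the following, written against the shown samples in GNU awk. It assumes, as in your samples, that both input files have a header on their first line and list the ids in the same order. Note that it skips the header and prints NA rather than NAN for unscored ids.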
awk -v counter=2 '
FNR==1{
next
}
FNR==NR{
a[FNR]=$0
b[FNR]=$1
next
}
{
if($0==b[counter]){
print a[counter]
counter++
}
else{
print $0",NA"
}
}
' FS="," 1.csv <(tr ',' '\n' < 2.csv)
Explanation: Adding detailed explanation for above.
awk -v counter=2 ' ##Starting awk program from here and setting counter as 2.
FNR==1{ ##Checking condition if line is 1st then do following.
next ##next will skip all further statements from here.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when Input_file 1.csv is being read.
a[FNR]=$0 ##Creating array a with index FNR and value of current line.
b[FNR]=$1 ##Creating array b with index FNR and value of 1st field of current line.
next ##next will skip all further statements from here.
}
{
if($0==b[counter]){ ##Checking condition if current line is same as array b with index counter value then do following.
print a[counter] ##Printing array a with index of counter here.
counter++ ##Increasing count of counter by 1 each time cursor comes here.
}
else{ ##Else part of the above if condition starts here.
print $0",NA" ##Printing current line and NA here.
}
}
' FS="," 1.csv <(tr ',' '\n' < 2.csv) ##Setting FS as , for Input_file 1.csv and sending 2.csv output by changing comma to new line to awk.
An awk solution could be:
awk -v FS=, -v OFS=, '
NR == 1 { print; next }
NR == FNR { score[$1] = $2; next }
{ for (i = 2; i <= NF; ++i)
print $i, score[$i] == "" ? "NAN" : score[$i] }
' 1.csv 2.csv
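The last script tests score[$i] == "", which also creates an empty entry for every missing id and cannot tell a missing id from an empty score. A small variant using the in operator avoids both, sketched here with a configurable fill value. The name vlookup_fill and its third parameter are assumptions for illustration, and 2.csv is taken to be the one-row layout shown in the question.

```shell
# vlookup_fill is a hypothetical helper name used for illustration.
# Usage: vlookup_fill SCORES.csv IDS.csv [FILL]
#   SCORES.csv: "id,score" header followed by id,score rows
#   IDS.csv:    a single row such as "id,1,3,5,7"
#   FILL:       text printed for ids without a score (defaults to NAN)
vlookup_fill() {
    awk -F, -v OFS=, -v fill="${3:-NAN}" '
        NR == 1   { print; next }            # pass the "id,score" header through
        NR == FNR { score[$1] = $2; next }   # remember each score by id
        {
            for (i = 2; i <= NF; i++)        # field 1 is the "id" label, skip it
                print $i, (($i in score) ? score[$i] : fill)
        }
    ' "$1" "$2"
}
```

With the sample files, vlookup_fill 1.csv 2.csv should reproduce the desired table, and a different third argument changes the placeholder.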

Shell script to find increase in occurrence count between two files

File1.log
2000 apple
2333 cat
5343 dog
1500 lion
File2.log
2500 apple
2333 cat
1700 lion
Need a shell script to output as below:
500 apple
200 lion
I have tried a lot of solutions, but nothing worked out because each line holds both a number and text. Could someone help with this? Thanks
EDIT (by RavinderSingh13): Added the OP's attempt, which was shown in the comments, to the post:
#!/bin/bash
input1="./File1.log"
input2="./File2.log"
while IFS= read -r line2
do
while IFS= read -r line1
do
echo "$line1"
done < "$input1"
echo "$line2"
done < "$input2"
Could you please try following.
awk 'FNR==NR{a[$2]=$1;next} $2 in a && ($1-a[$2])>0{print $1-a[$2],$2}' file1 file2
Adding a non-one liner form of above solution:
awk '
FNR==NR{
a[$2]=$1
next
}
($2 in a) && ($1-a[$2])>0{
print $1-a[$2],$2
}
' Input_file1 Input_file2
Explanation: Adding a detailed explanation for above solution here.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR, which is TRUE while file1 is being read, then do following.
a[$2]=$1 ##Creating an array a whose index is $2 and value is $1 of current line.
next ##Using the next statement of awk to skip all further statements for the current line.
} ##Closing condition BLOCK for FNR==NR here.
($2 in a) && ($1-a[$2])>0{ ##Checking condition if $2 is present in array a AND difference of $1 and array a with index $2 is greater than 0 then do following.
print $1-a[$2],$2 ##Printing difference between $1 and array a with index $2 along with current $2 here.
} ##Closing BLOCK for above condition here.
' file1 file2 ##Mentioning Input_file names here.
awk '{if (!($2 in entry)) { entry[$2]=$1 } else { delta=$1-entry[$2]; if (delta!=0) {print delta,$2} } }' FILE_1 FILE2
You can also put this into a file, e.g. delta.awk:
{
if (!($2 in entry)) {
entry[$2]=$1
} else {
delta=$1-entry[$2]
if (delta !=0) { # Only output lines of non-zero increment/decrement
print delta,$2
}
}
}
Invoke via awk -f delta.awk FILE_1.txt FILE_2.txt.
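For comparison, the same pairing can be sketched with sort and join instead of an awk array: sort both logs on the name column, join them on it, and print the difference where the second count is larger. The function name count_increase_join is hypothetical, and the sketch assumes each line is exactly "COUNT NAME" with no spaces inside the name.

```shell
# count_increase_join is a hypothetical helper name used for illustration.
# Usage: count_increase_join OLD.log NEW.log
count_increase_join() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # join needs both inputs sorted on the join field (the name, field 2).
    sort -k2,2 "$1" > "$old_sorted"
    sort -k2,2 "$2" > "$new_sorted"
    # join prints: NAME OLD_COUNT NEW_COUNT for names present in both files.
    join -1 2 -2 2 "$old_sorted" "$new_sorted" |
        awk '$3 > $2 { print $3 - $2, $1 }'   # keep increases only
    rm -f "$old_sorted" "$new_sorted"
}
```

Unlike the awk versions, the output here comes out ordered by name rather than in the order of the second file.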

Merge two files using awk in linux

I have a 1.txt file:
betomak#msn.com||o||0174686211||o||7880291304ca0404f4dac3dc205f1adf||o||Mario||o||Mario||o||Kawati
zizipi#libero.it||o||174732943.0174732943||o||e10adc3949ba59abbe56e057f20f883e||o||Tiziano||o||Tiziano||o||D'Intino
frankmel#hotmail.de||o||0174844404||o||8d496ce08a7ecef4721973cb9f777307||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||536c1287d2dc086030497d1b8ea7a175||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||9893ac33a018e8d37e68c66cae23040e||o||Nabile||o||Nabile||o||Nassime
donaldduck#yahoo.com||o||174912161.0174912161||o||0c770713436695c18a7939ad82bc8351||o||Donald||o||Donald||o||Duck
cernakova#centrum.cz||o||0174991962||o||d161dc716be5daf1649472ddf9e343e6||o||Dagmar||o||Dagmar||o||Cernakova
trgsrl#tiscali.it||o||0175099675||o||d26005df3e5b416d6a39cc5bcfdef42b||o||Esmeralda||o||Esmeralda||o||Trogu
catherinesou#yahoo.fr||o||0175128896||o||2e9ce84389c3e2c003fd42bae3c49d12||o||Cat||o||Cat||o||Sou
ermimurati24#hotmail.com||o||0175228687||o||a7766a502e4f598c9ddb3a821bc02159||o||Anna||o||Anna||o||Beratsja
cece_89#live.fr||o||0175306898||o||297642a68e4e0b79fca312ac072a9d41||o||Celine||o||Celine||o||Jacinto
kendinegel39#hotmail.com||o||0175410459||o||a6565ca2bc8887cde5e0a9819d9a8ee9||o||Adem||o||Adem||o||Bulut
A 2.txt file:
9893ac33a018e8d37e68c66cae23040e:134:#a1
536c1287d2dc086030497d1b8ea7a175:~~#!:/92\
8d496ce08a7ecef4721973cb9f777307:demodemo
FS for 1.txt is "||o||" and for 2.txt is ":"
I want to merge two files in a single file result.txt based on the condition that the 3rd column of 1.txt must match with 1st column of 2.txt file and should be replaced by the 2nd column of 2.txt file.
The expected output will contain all the matching lines:
I am showing you one of them:
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime
I tried the script:
awk -F"||o||" 'NR==FNR{s=$0; sub(/:[^:]*$/, "", s); a[s]=$NF;next} {s = $5; for (i=6; i<=NF; ++i) s = s "," $i; if (s in a) { NF = 5; $5=a[s]; print } }' FS=: <(tr -d '\r' < 2.txt) FS="||o||" OFS="||o||" <(tr -d '\r' < 1.txt) > result.txt
But getting an empty file as the result. Any help would be highly appreciated.
If your actual input files look like the samples shown, the following awk may help. It splits 1.txt on a single | character, so the useful values land in fields 1, 5, 9, 13, 17 and 21.
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
next
}
($1 in a){
print a[$1] s1 $2 FS $3 s1 b[$1]
}
' FS="|" 1.txt FS=":" 2.txt
EDIT: Since the OP changed the requirement slightly, here is code that also creates two extra files: NOT_present_in_2.txt receives the lines of 2.txt whose id has no match in 1.txt, and NOT_present_in_1.txt receives the lines of 1.txt whose id never appears in 2.txt.
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
c[$9]=$0;
next
}
($1 in a){
val=$1;
$1="";
sub(/:/,"");
print a[val] s1 $0 s1 b[val];
d[val]=$0;
next
}
{
print > "NOT_present_in_2.txt"
}
END{
for(i in d){
delete c[i]
};
for(j in c){
print j,c[j] > "NOT_present_in_1.txt"
}}
' FS="|" 1.txt FS=":" OFS=":" 2.txt
You can use this awk to get your output:
awk -F ':' 'NR==FNR{a[$1]=$2 FS $3; next} FNR==1{FS=OFS="||o||"; gsub(/[|]/, "\\\\&", FS)}
$3 in a{$3=a[$3]; print}' file2 file1 > result.txt
cat result.txt
frankmel#hotmail.de||o||0174844404||o||demodemo:||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||~~#!:/92\||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime
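Note the trailing colon after demodemo in the output above: splitting 2.txt on every : loses track of where the replacement value ends. A minimal sketch of an alternative, assuming the replacement is everything after the first colon and using split() with a bracket-expression regex so the || needs no backslash escaping; merge_by_hash is a hypothetical name.

```shell
# merge_by_hash is a hypothetical helper name used for illustration.
# Usage: merge_by_hash LOOKUP DATA
#   LOOKUP: lines of the form hash:replacement (replacement may contain ":")
#   DATA:   lines whose columns are separated by the literal text ||o||
merge_by_hash() {
    awk '
        NR == FNR {
            p = index($0, ":")               # position of the first colon
            if (p) repl[substr($0, 1, p - 1)] = substr($0, p + 1)
            next
        }
        {
            # [|] matches a literal pipe, so no backslash escaping is needed.
            n = split($0, f, /[|][|]o[|][|]/)
            if (n >= 3 && (f[3] in repl)) {
                f[3] = repl[f[3]]            # swap the hash for its replacement
                line = f[1]
                for (i = 2; i <= n; i++) line = line "||o||" f[i]
                print line
            }
        }
    ' "$1" "$2"
}
```

Running merge_by_hash 2.txt 1.txt > result.txt would then write only the matching lines, in the order they occur in 1.txt.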

Run query in Linux for selecting CSV'S

In Linux:
there are many .csv files in a folder, and I have to select those in which the PREDICT column contains the value 646 ({'PREDICT' = 646}).
check this link:
https://prnt.sc/gone85
What kind of query would work?
Providing some test data, since none was provided:
$ cat > file1
ACTUAL PREDICT
1 2
3 646
$ cat > file2
ACTUAL PREDICT
1 2
3 666
Then some GNU awk (nextfile) to select those csv's file having column name {'PREDICT' = 646} or where there is column PREDICT with a value 646:
$ awk 'FNR==1{for(i=1;i<=NF;i++)if($i=="PREDICT")p=i}$p==646{print FILENAME;nextfile}' file1 file2
file1
Explained:
awk '
FNR==1 { # get the column number of PREDICT column for each file
for(i=1;i<=NF;i++)
if($i=="PREDICT")
p=i # set it to p
}
$p==646 { # if the PREDICT column ($p) holds 646, we have a match
print FILENAME # print the filename
nextfile # and move on to the next file
}' file1 file2 # all the candidate files
gnu awk solution without loop:
$ cat tst.awk
BEGIN{FS=","}
FNR==1 && s=substr($0,1,index($0,"PREDICT")) { # look for index of PREDICT
i=gsub(/,/, "", s) + 1 # and count nr of times you
# can replace "," in preceding
# substring
}
s && $i==646 { print FILENAME; nextfile }
some input:
$ cat file1.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
$ cat file2.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,111,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
and:
$ cp file1.csv file3.csv
gives:
$ awk -f tst.awk *.csv
file1.csv
file3.csv
Or use a one-liner:
$ awk -F, 'FNR==1 && s=substr($0,1,index($0,"PREDICT")) {i=gsub(/,/, "", s) + 1}s && $i==646 { print FILENAME; nextfile }' *.csv
file1.csv
file3.csv
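Both answers rely on nextfile, and the first keeps the column number from a previous file if a later file has no PREDICT header. A sketch that resets the column per file and does without nextfile might look like this; files_with_value and its parameters are hypothetical, and comma-separated files with a header row are assumed.

```shell
# files_with_value is a hypothetical helper name used for illustration.
# Usage: files_with_value COLUMN VALUE FILE...
# Prints the name of every FILE whose COLUMN contains VALUE in some row.
files_with_value() {
    col=$1
    val=$2
    shift 2
    awk -F, -v col="$col" -v val="$val" '
        FNR == 1 {                       # header row of each file
            p = 0; hit = 0               # forget the previous file
            for (i = 1; i <= NF; i++)
                if ($i == col) p = i     # remember where COLUMN lives
            next
        }
        # Report each file once, the first time the value shows up.
        p && !hit && $p == val { print FILENAME; hit = 1 }
    ' "$@"
}
```

For the sample data, files_with_value PREDICT 646 *.csv would list file1.csv and file3.csv. Unlike nextfile, this keeps reading the rest of each file after a hit, which only matters for very large inputs.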
