Regex issue when matching a column value - Linux

I wrote a script to extract column values from a file that don't match the pattern defined in the column metadata file, but it is not returning the right output. Can anyone point out the issue here? I am trying to match a string wrapped in double quotes; the quotes also need to be matched.
Code:
awk -F'|' -v n="$col_pos" -v m="$col_patt" 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' $input_file
Run output:
++ awk '-F|' -v n=4 -v 'm="[a-z]+#gmail.com"' 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' /test/data/infa_shared/dev/SrcFiles/datawarehouse/poc/BNX.csv
10,22,"00AF","abc#gmail.com",197,10,1/1/2020 12:06:10.260 PM,"BNX","Hard b","50","Us",1,"25" -- this line is not expected in the output, because it matches the email pattern "[a-z]+#gmail.com". The pattern is extracted from the file below.
Input file for pattern extraction (file_col_metadata):
FILE_ID~col_POS~COL_START_POS~COL_END_POS~datatype~delimited_ind~col_format~columnlength
5~4~~~char~Y~"[a-z]+#gmail.com"~100
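(For context, the shell variables could be populated from this metadata file along these lines. The question doesn't show the actual extraction, so this is only a sketch based on the ~-delimited layout above:)
col_pos=$(awk -F'~' 'NR==2 {print $2}' file_col_metadata)    # 4
col_patt=$(awk -F'~' 'NR==2 {print $7}' file_col_metadata)   # "[a-z]+#gmail.com" (quotes included)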

If you replace awk -F'|' ... with awk -F',' ... it will work. The input file is comma-delimited, so with -F'|' the whole line is a single field, $4 is empty, and the anchored pattern never matches, which is why every data line gets reported.
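For reference, a minimal sketch of the corrected invocation (the script from the question unchanged except for the field separator):
awk -F',' -v n="$col_pos" -v m="$col_patt" 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' "$input_file"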

Related

awk with bash variable along with condition to be checked

I need to search for and replace a pattern in a file:
[ec2_server]
server_host=something
[list_server]
server_host=old_name
to
[ec2_server]
server_host=something
[list_server]
server_host=new_name
I'm able to get it working with
awk '/\[list_server]/ { print; getline; $0 = "server_host=new_name" } 1'
But I'm trying to parameterize the search pattern, the parameter name to change and the parameter value to change.
PATTERN_TO_SEARCH=[list_server]
PARAM_NAME=server_host
PARAM_NEW_VALUE=new_name
But it is not working when I parameterize and pass the variables to awk
awk -v patt=$PATTERN_TO_SEARCH -v parm=$PARAM_NAME -v parmval=$PARAM_NEW_VALUE '/\patt/ { print; getline; $0 = "parm=parmval" } 1' file.txt
You have two instances of the same problem: you're trying to use a
variable name inside a string value. Awk can't read your mind: it
can't intuit that sometimes when you write "HOME" you mean "print the
value of the variable HOME" and other times you mean "print the word
HOME".
We need to make two separate changes:
First, to use a variable in your search pattern, you can use
syntax like this:
awk -v patt='some text' '$0 == patt {print}'
(Note that here we're using an equality match, ==; you can also use a regular expression match, ~, but in this particular case that would only complicate things).
With your example file content, running:
awk -v patt='[list_server]' '$0 == patt {print}' file.txt
Produces:
[list_server]
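(To illustrate why a regex match would complicate things here: the brackets in [list_server] form a bracket expression in regex syntax, so they would need escaping. A minimal sketch, if you did want a regex match:)
awk -v patt='^\\[list_server\\]$' '$0 ~ patt {print}' file.txt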
Next, when you write $0 = "parm=parmval", you're setting $0 to the literal string parm=parmval. If you want to perform variable substitution, consider using sprintf():
awk \
-v patt="$PATTERN_TO_SEARCH" \
-v parm="$PARAM_NAME" \
-v parmval="$PARAM_NEW_VALUE"\
'
$0 == patt { print; getline; $0 = sprintf("%s=%s", parm, parmval) } 1
' file.txt
Which gives us:
[ec2_server]
server_host=something
[list_server]
server_host=new_name
Have your awk code the following way. Experts recommend not using getline (it has edge cases), so this solution finds the matching string, sets a flag (a custom variable created in the program), and then prints the lines accordingly, using a regex built from the value passed in from the shell variable.
Along with matching and printing the new value, we also need to set the field separator so the correct value is fetched and replaced/printed with the new one, so the field separator is set to = for the whole Input_file. With this approach you do not need to pass a variable holding the server_host value, since it is already present in Input_file and can be taken from there.
An awk solution that sets the value in an awk variable itself and then checks it with a regex in the main awk program:
awk -v var="list_server" -v newVal="NEW_VALUE" '
BEGIN{ FS=OFS="=" }
$0 ~ "^\\[" var "\\]$"{
found=1
print
next
}
found{
print $1 OFS newVal
found=""
next
}
1
' Input_file
Or an awk solution that takes the value from a shell variable and then uses a regex inside awk to match the condition:
varS="list_server" ##Shell variable
newvalue="NEW_VALUE" ##Shell variable
awk -v var="$varS" -v newVal="$newvalue" '
BEGIN{ FS=OFS="=" }
$0 ~ "^\\[" var "\\]$"{
found=1
print
next
}
found{
print $1 OFS newVal
found=""
next
}
1
' Input_file
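Running the second form against the sample file from the question (assuming it is saved as Input_file), the expected output would be:
[ec2_server]
server_host=something
[list_server]
server_host=NEW_VALUE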
$ awk -v pat="$PATTERN_TO_SEARCH" -v parm="$PARAM_NAME" -v parmval="$PARAM_NEW_VALUE" '
f{$0=parm"="parmval; f=0} $0==pat{f=1} 1
' file
[ec2_server]
server_host=something
[list_server]
server_host=new_name
This makes the assumption that the "${PARAM_NAME}" row immediately follows the search-pattern row:
_P2S_='[list_server]'
_PNM_='server_host'
_PNV_='new_name'
echo "${...input...}" | gtee >( gpaste - | gcat -b >&2; echo ) | gcat - |
{m,n,g}awk -v __="${_P2S_}=${_PNM_}=${_PNV_}" -F= 'BEGIN {
$(_-=_)=__;___= $(_ = NF); FS ="^"(OFS = $--_ FS)
__= $-(_+=-_--) } (NR-_)< NF ? ($NF =___)^(_-=_) :_=NR*(-!!_)^(__!=$!_)' |
gcat -b | gcat -n | ecp
1 [ec2_server]
2 server_host=something
3 [list_server]
4 server_host=old_name
1 1 [ec2_server]
2 2 server_host=something
3
4 3 [list_server]
5 4 server_host=new_name

Adding double quotes around non-numeric columns by awk

I have a file like this:
2018-01-02;1.5;abcd;111
2018-01-04;2.75;efgh;222
2018-01-07;5.25;lmno;333
2018-01-09;1.25;prs;444
I'd like to add double quotes around the non-numeric columns, so the new file should look like this:
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
This is what I tried so far; I know that this is not the correct way:
head myfile.csv -n 4 | awk 'BEGIN{FS=OFS=";"} {gsub($1,echo $1 ,$1)} 1' | awk 'BEGIN{FS=OFS=";"} {gsub($3,echo "\"" $3 "\"",$3)} 1'
Thanks in advance.
You may use this awk, which sets ; as the input/output field separator and then wraps a field in "s if that field is non-numeric (the test $i+0 == $i is true only when the field is numeric):
awk '
BEGIN {
FS = OFS = ";"
}
{
for (i=1; i<=NF; ++i)
$i = ($i+0 == $i ? $i : "\"" $i "\"")
} 1' file
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
An alternative gnu-awk solution, which makes every ;- or newline-delimited token its own record and uses RT (the text that matched RS) as the output record separator, so the original delimiters are preserved:
awk -v RS='[;\n]' '$0+0 != $0 {$0 = "\"" $0 "\""} {ORS=RT} 1' file
Using GNU awk and typeof(): fields that are numeric strings have the strnum attribute; otherwise, they have the string attribute.
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
$i=sprintf("\"%s\"",$i)
}1' file
Some output:
"2018-01-02";1.5;"abcd";111
Edit:
If some of the fields are already quoted:
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
gsub(/^"?|"?$/,"\"",$i)
}1' <<< 'string;123;"quoted string"'
Output:
"string";123;"quoted string"
Further enhancing anubhava's solution (including handling fields that are already double-quoted):
gawk -e 'sub(".+",$-_==+$-_?"&":(_)"&"_\
)^gsub((_)_, _)^(ORS = RT)' RS='[;\n]' \_='\42'
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
"2018-01-09";1.25;"prs";111111111111111111112222222222
222222223333333333333333333333
333344444444444444444499999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999991111111111111111111
122222222222222222233333333333
333333333333333444444444444444
444999999999999991111111111111
111111122222222222222222233333
333333333333333333333444444444
444444444999999999999991111111
111111111111122222222222222222
233333333333333333333333333444
444444444444444999999999999991
111111111111111111122222222222
222222233333333333333333333333
333444444444444444444999999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999999

How to print lines with a specific column matching members of an array in bash

#!/bin/bash
awk '$1 == "abc" {print}' file # print lines first column matching "abc"
How can I print lines whose first column matches a member of the array ("12", "34", or "56")?
#!/bin/bash
ARR=("12" "34" "56")
Added:
Also, how can I print lines whose first column exactly matches a member of the array ("12", "34", or "56")?
You could use bash to interpolate the array into a regex pattern used in awk, by changing the IFS value to a | character and doing an array expansion, as below:
ARR=("12" "34" "56")
regex=$( IFS='|'; echo "${ARR[*]}" )
awk -v str="$regex" '$1 ~ str' file
The array expansion converts the list elements into a string delimited with |, e.g. 12|34|56 in this case.
The $(..) runs in a sub-shell, so the modified IFS value is not reflected in the parent shell. You can write it as a one-liner:
awk -v str="$( IFS='|'; echo "${ARR[*]}" )" '$1 ~ str' file
The OP also asked for an exact match of the strings from the array in the file; in that case grep with its ERE support can do the job:
regex=$( IFS='|'; echo "${ARR[*]}" )
egrep -w "$regex" file
(or)
grep -Ew "$regex" file
An awk one-liner for the exact-match case:
awk -v var="${ARR[*]}" 'BEGIN{split(var,array," "); for(i in array) a[array[i]] } ($1 in a){print $0}' file
The following code does the trick:
awk 'BEGIN{myarray[0]="aaa"; myarray[1]="bbb"}
{
    test=0
    for (x in myarray) {
        if ($1 == myarray[x]) {
            test=1
            break
        }
    }
    if (test==1) print
}' file
If you need to pass a variable to awk, use the -v option; for an array it is a bit trickier, but the following syntax should work:
A=( $( ls -1p ) ) #example of a list to be passed to awk (to be adapted to your needs)
awk -v var="${A[*]}" 'BEGIN{n=split(var,list," "); for (i=1; i<=n; i++) print list[i]}'
Much the same as Inian's answer, but using grep:
ARR=("34" "56" "12"); regex="${ARR[*]}"; regex="^${regex// /\\|^}"; grep -w "$regex" infile

Using a variable defined outside awk

I have coded the following lines:
ARRAY=($(awk 'FS = ";" {print $3}' file.txt))
LINE_CREATOR=`echo "aaaa;bbbb;cccccccc" |
'{awk -F";"};
END
for (i in ARRAY)
{
print $'${ARRAY['i']}'
}
}'`
The file.txt looks like this:
1;8;3
4;6;1
7;9;2
Explanation:
The array contains the values 3, 1, 2, so the loop iterates over the array and extracts fields $3, $1 and $2 from "aaaa;bbbb;cccccccc" using awk,
and the final output should be this:
ccccccccaaaabbbb
I still have some errors while launching my script.
I'm making a few guesses here but I think that this does what you want:
$ echo "aaaa;bbbb;cccccccc" | awk -F\; 'NR == FNR { n = split($0, a); next }
{ printf "%s", a[$3] } END { print "" }' - file
ccccccccaaaabbbb
NR == FNR means that the block is only run for the first input. - as an argument tells awk to read first from standard input. The string is split on FS (;) into the array a. next skips the rest of the script.
The second block is only run for the second input (the text file). The values in the third field are used to print the elements in the array a.
If you want to pass the indices as an awk variable, here is another way:
$ awk -F';' -v ix="$(cut -d\; -f3 file | paste -sd\;)" '
BEGIN{n=split(ix,a)}
{for(i=1;i<n;i++) printf "%s",$a[i];
printf "%s\n",$a[n]}' <<< "aaaa;bbbb;cccccccc"
ccccccccaaaabbbb

Awk Search file for string

I have implemented a function that searches a column in a file for a string, and it works well. What I would like to know is: how do I modify it to search all the columns for a string?
awk -v s=$1 -v c=$2 '$c ~ s { print $0 }' $3
Thanks
If "all the columns" means "the entire file" then:
grep $string $file
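If you instead want to test each column individually, a sketch that keeps the -v style of your script and loops over the fields (here the string is assumed to be the first argument and the file the second):
awk -v s="$1" '{ for (i = 1; i <= NF; i++) if ($i ~ s) { print; next } }' "$2"
# prints each line in which any field matches the string/regex in s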
Here is an example of one way to modify your current script to search for two different strings in two different columns. You can extend it to work for as many as you wish, however for more than a few it would be more efficient to do it another way.
awk -v s1="$1" -v c1="$2" -v s2="$3" -v c2="$4" '$c1 ~ s1 || $c2 ~ s2 { print $0 }' "$5"
As you can see, this technique won't scale well.
Another technique treats the column numbers and strings as a file and should scale better:
awk 'FNR == NR {strings[++c] = $1; columns[c] = $2; next}
{
for (i = 1; i <= c; i++) {
if ($columns[i] ~ strings[i]) {
print
}
}
}' <(printf '%s %d\n' "${searches[@]}") inputfile
The array ${searches[@]} should contain strings and column numbers, alternating.
There are several ways to populate ${searches[@]}. Here's one:
#!/bin/bash
# (this is bash and should precede the AWK above in the script file)
unset searches
for arg in "${#:1:$#-1}"
do
searches+=("$arg")
shift
done
inputfile=$1 # the last remaining argument
# now the AWK stuff goes here
To run the script, you'd do this:
$ ./scriptname foo 3 bar 7 baz 1 filename
awk -v pat="$string" '$0 ~ pat' infile
