unix - awk unexpected behaviour

I have the below code in a bash file called 'findError.sh':
#!/bin/bash
filename="$1"
formatindicator="\"|\""
echo "$formatindicator"
formatarg="\$1"
echo "$formatarg"
count=`awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `
command="awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l"
echo $command
echo $count
I then run it at the command line like so:
sh findError.sh test.dat
But it gives me a different count than running the command that was echoed. How is this possible?
i.e.
The $command that is echoed back is:
awk -F"|" '{print $1}' test.dat | perl -ane '{ if(m/ERROR/) { print } }' | wc -l
But the $count that is echoed back is:
3
However if I just run this one line below at the command line (not through the script) - the result is 0:
awk -F"|" '{print $1}' test.dat | perl -ane '{ if(m/ERROR/) { print } }' | wc -l
Sample input file (test.dat):
sid|storeNo|latitude|longitude
2|1|-28.03720000
9|2
10
jgn352|1|-28.03ERROR720000
9|2|fdERRORkjhn422-405
0000543210|gfERRORdjk39
Notes: Using SunOS with bash version 4.0.17

You're being too careful with your quotes around the format delimiter.
When you type:
awk -F"|" ...
The program (awk) sees -F| as its first argument; the shell strips the double quotes.
When you have:
formatindicator="\"|\""
echo "$formatindicator"
formatarg="\$1"
echo "$formatarg"
count=`awk -F$formatindicator ...`
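You have preserved the double quotes in $formatindicator, so awk receives -F"|" with the quotes intact. A field separator longer than one character is treated as a regular expression, and "|" reads as "a double quote or a double quote", so awk ends up using the double quote as the delimiter. The sample data contains no double quotes, so $1 is the whole line.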
Use:
formatindicator="|"
echo "$formatindicator"
formatarg="\$1"
echo "$formatarg"
count=`awk -F"$formatindicator" ...`
The difference is that the shell strips the quotes off -F"$formatindicator" but doesn't do that when $formatindicator itself contains the double quotes.
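The difference can be made visible by printing the arguments exactly as a program receives them. This is a minimal sketch; show_args, with_quotes and plain are hypothetical names used only for illustration, and the sample line is taken from test.dat:

```shell
# Print each argument on its own line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

with_quotes="\"|\""   # the value is the three characters "|"
plain="|"             # the value is the single character |

show_args -F$with_quotes   # prints [-F"|"]  (the quotes are data, not shell syntax)
show_args -F"$plain"       # prints [-F|]    (the shell removed the quotes)

# Effect on field splitting:
line='jgn352|1|-28.03ERROR720000'
printf '%s\n' "$line" | awk -F$with_quotes '{print $1}'   # prints the whole line
printf '%s\n' "$line" | awk -F"$plain" '{print $1}'       # prints jgn352
```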
(NB: edited to retain back-quotes instead of the $(...) notation that is (a) preferred and (b) was used in the first version of this answer. The $(...) notation was not recognized by the SunOS /bin/sh which was, I believe, being used to execute the script. Both bash and ksh recognize the $(...) notation, but the basic Bourne shell, /bin/sh, on Solaris 10 (SunOS 5.10) and earlier (I've not laid hands on Solaris 11) does not recognize $(...).)
I note that any of perl, awk or grep could be used to find the count of the error lines on its own, so the triplet of awk piped to perl piped to wc is not very efficient.
awk -F"|" '$1 ~ /ERROR/ { count++ } END { print count + 0 }' "$filename"
grep -c ERROR "$filename" # simple
grep -c '^[^|]*ERROR' "$filename" # accurate
perl -F'\|' -ane '$count++ if $F[0] =~ m/ERROR/; END { print $count + 0, "\n"; }' "$filename"
It's Perl, so TMTOWTDI; take your pick...
Side discussion
In the comments, we have concerns over how various parts of the script are being interpreted.
formatindicator="|"
formatarg="\$1"
count=`awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `
Let's simplify this to (using part of my main answer):
count=`awk -F"$formatindicator" '{print $formatarg}' $filename`
The intention is to have the delimiter specified on the command line (which happens successfully) via the -F option. The issue, I expect, is "why does $formatarg get expanded inside single quotes?". And the answer is "Does it?". I think not. So, what is happening, is that awk is seeing the script {print $formatarg}. Since formatarg is not assigned any value, it is equivalent to 0, so the script prints $0, which is the entire input line. Perl is quite happy to echo the line if it matches ERROR anywhere on the line, and wc couldn't care less about what's in the lines, so the result is approximately what was expected. The only time there'd be a discrepancy is when the line from $filename contains ERROR in other than the first pipe-delimited field. That would be counted by the script where it should not.
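A quick way to check the claim about single quotes. The sample line is taken from test.dat, and whole_line_demo is a hypothetical name used only for this sketch:

```shell
formatarg="\$1"   # shell variable; never expanded inside the single quotes below

whole_line_demo() {
    # awk sees the literal text {print $formatarg}; its own formatarg is unset,
    # so it evaluates to 0 and $formatarg means $0, the whole line.
    printf '%s\n' '9|2|fdERRORkjhn422-405' | awk -F'|' '{print $formatarg}'
}

whole_line_demo   # prints 9|2|fdERRORkjhn422-405
```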

The problem is with using external variables in awk. If you wish to use external variables in awk then define a variable in the awk one-liner using -v option and variable name and assign your external variable to it. So
The line -
count=`awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `
should be -
count=`awk -v fi="$formatindicator" -v fa="$formatarg" 'BEGIN {FS=fi}{print fa}' "$1" | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `
Update:
As stated in the comments, the $formatarg contains the value $1. What you need to do is store just 1 and then pass it as -
count=`awk -v fi="$formatindicator" -v fa="$formatarg" 'BEGIN {FS=fi}{print $fa}' "$1" | perl -ane '{ if(m/ERROR/) { print } }' | wc -l`
[jaypal:~/Temp] echo $formatindicator
|
[jaypal:~/Temp] echo $formatarg
1
[jaypal:~/Temp] awk -v fi="$formatindicator" -v fa="$formatarg" 'BEGIN {FS=fi}{print $fa}' data.file
sid
2
9
10
jgn352
9
0000543210
Script:
#!/bin/bash
filename="$1"
formatindicator="|"
echo "$formatindicator"
formatarg="1"
echo "$formatarg"
count=`awk -v fa="$formatarg" -v fi="$formatindicator" 'BEGIN{FS=fi}{print $fa}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `
command="awk -F'$formatindicator' '{print \$$formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l"
echo $command
echo $count
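The -v approach also combines with the point made in the other answer that awk can do the matching and counting on its own. A minimal sketch, assuming a single-character separator; count_errors and col are hypothetical names, not part of the original script:

```shell
count_errors() {
    # $1 = file, $2 = field number, $3 = single-character field separator
    # col arrives as an awk variable, so $col selects that field.
    # n + 0 makes awk print 0 rather than an empty line when nothing matched.
    awk -F"$3" -v col="$2" '$col ~ /ERROR/ { n++ } END { print n + 0 }' "$1"
}

# Usage (with the sample test.dat from the question):
#   count=`count_errors test.dat 1 '|'`
```

This replaces the awk, perl and wc processes with one awk process, and the field number is chosen at call time.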

Related

Increment variable when matched awk from tail

I'm monitoring a file that is actively being written to.
My current solution is:
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
do
echo $LINE | grep -q -e "enterpriseID:"
if [ $? = 0 ]
then
((ws_trans++))
fi
echo $LINE | grep -q -e "sc_ID:"
if [ $? = 0 ]
then
((sc_trans++))
fi
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
However, when attempting to do this with AWK I don't get the output: $ws_trans and $sc_trans remain 0.
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
do
echo $LINE | awk '/enterpriseID:/ {++ws_trans} END {print | ws_trans}'
echo $LINE | awk '/sc_ID:/ {++sc_trans} END {print | sc_trans}'
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
Attempting to do this to reduce load. I understand that AWK doesn't deal with bash variables, and it can get quite confusing, but the only reference I found is a non tail application of AWK.
How can I assign the AWK Variable to the bash ws_trans and sc_trans? Is there a better solution? (There are other search terms being monitored.)

You need to pass the variables using the option -v, for example:
$ var=0
$ printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} {print awk_var}'
To set the variable "back" you could use declare, for example:
$ declare $(printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} END {print "var=" awk_var}')
$ echo $var
10
Your script could be rewritten like this:
ws_trans=0
sc_trans=0
tail -F /var/log/system.log |
while read LINE
do
declare $(echo $LINE | awk -v ws=${ws_trans} '/enterpriseID:/ {++ws} END {print "ws_trans="ws}')
declare $(echo $LINE | awk -v sc=${sc_trans} '/sc_ID:/ {++sc} END {print "sc_trans="sc}')
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
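Since the stated goal is to reduce load, another option is to keep both counters inside one long-running awk process instead of starting awk for every line. This is only a sketch: count_trans is a hypothetical name, it assumes the counters are needed for display only (they never reach the shell variables), and it assumes an awk that provides fflush(), as gawk and mawk do.

```shell
count_trans() {
    awk '
        /enterpriseID:/ { ws++ }
        /sc_ID:/        { sc++ }
        {
            # Redraw the status line in place after every input line.
            printf "\r WSTRANS: %d \t\t SCTRANS: %d", ws, sc
            fflush()   # push the update out even when stdout is buffered
        }
        END { print "" }   # finish the status line when input ends
    '
}

# Usage:
#   tail -F /var/log/file.log | count_trans
```

Further search terms can be added as extra pattern lines with their own counters.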

Optimize Multiline Pipe to Awk in Bash Function

I have this function:
field_get() {
while read data; do
echo $data | awk -F ';' -v number=$1 '{print $number}'
done
}
which can be used like this:
cat filename | field_get 1
in order to extract the first field from some piped in input. This works but I'm iterating on each line and it's slower than expected.
Does anybody know how to avoid this iteration?
I tried to use:
stdin=$(cat)
echo $stdin | awk -F ';' -v number=$1 '{print $number}'
but the line breaks get lost and it treats all the stdin as a single line.
IMPORTANT: I need to pipe the input in, because in general I am not just cat-ing a file. Assume the input is multiline; that is actually the problem. I know I can use "awk something filename", but that won't help me.

Just lose the while. Awk is a while loop in itself:
field_get() {
awk -F ';' -v number=$1 '{print $number}'
}
$ echo 1\;2\;3 | field_get 2
2
Update:
Not sure what you mean by your comment on multiline pipe and file but:
$ cat foo
1;2
3;4
$ cat foo | field_get 1
1
3
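As for the second attempt in the question: the line breaks are lost because $stdin is expanded without quotes, so the shell splits it into words and echo rejoins them with spaces. A sketch of that attempt with the quoting fixed (field_get_buffered is a hypothetical name; the plain awk version above remains simpler and does not buffer the whole input):

```shell
field_get_buffered() {
    stdin=$(cat)   # read all of standard input into a variable
    # Quoting "$stdin" preserves the embedded newlines.
    printf '%s\n' "$stdin" | awk -F ';' -v number="$1" '{print $number}'
}

printf '1;2\n3;4\n' | field_get_buffered 1   # prints 1 and 3 on separate lines
```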

Use either stdin or file
field_get() {
awk -F ';' -v number="$1" '{print $number}' "${2:-/dev/stdin}"
}
Test Results:
$ field_get() {
awk -F ';' -v number="$1" '{print $number}' "${2:-/dev/stdin}"
}
$ echo '1;2;3;4' >testfile
$ field_get 3 testfile
3
$ echo '1;2;3;4' | field_get 2
2

There is no need to use a while loop and then awk; awk can read the input file itself. Here $1 is the argument passed to your script:
cat script.ksh
awk -F ';' -v field="$1" '{print $field}' Input_file
./script.ksh 1

This is a job for the cut command:
cut -d';' -f1 somefile

awk - send sum to global variable

I have a line in a bash script that calculates the sum of unique IP requests to a certain page.
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print } END { print " ", sum, "total"}'
I am trying to get the value of sum to a variable outside the awk statement so I can compare pages to each other. So far I have tried various combinations of something like this:
unique_sum=0
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print ; $unique_sum=sum} END { print " ", sum, "total"}'
echo "${unique_sum}"
This results in an echo of "0". I've tried placing $unique_sum=sum in the END, various combinations of initializing the variable (awk -v unique_sum=0 ...) and placing the variable assignment outside of the quoted sections.
So far, my Google-fu is failing horribly as most people just send the whole of the output to a variable. In this example, many lines are printed (one for each IP) in addition to the total. Failing a way to capture the 'sum' variable, is there a way to capture that last line of output?
This is probably one of the most sophisticated things I've tried in awk so my confidence that I've done anything useful is pretty low. Any help will be greatly appreciated!

You can't assign a shell variable inside an awk program. In general, no child process can alter the environment of its parent. You have to have the awk program print out the calculated value, and then shell can grab that value and assign it to a variable:
output=$( grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print } END {print sum}' )
unique_sum=$( sed -n '$p' <<< "$output" ) # grab the last line of the output
sed '$d' <<< "$output" # print the output except for the last line
echo " $unique_sum total"
That pipeline can be simplified quite a lot: awk can do what grep can do, so first
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}'
is (longer, but only one process)
awk -F" - " -v date="$YESTERDAY" -v patt="$1" '$0 ~ date && $0 ~ patt {print $1}' "$ACCESSLOG"
And the last awk program just counts lines, so it can be replaced with wc -l.
All together:
unique_output=$(
awk -F" - " -v date="$YESTERDAY" -v patt="$1" '
$0 ~ date && $0 ~ patt {print $1}
' "$ACCESSLOG" | sort | uniq -c
)
echo "$unique_output"
unique_sum=$( wc -l <<< "$unique_output" )
echo " $unique_sum total"
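If the sorted order from sort | uniq -c is not needed, the per-IP counting can also happen inside awk with an associative array. A sketch under that assumption; unique_ips and hits are hypothetical names, and the output order of for (ip in hits) is unspecified:

```shell
unique_ips() {
    # $1 = date string, $2 = page pattern, $3 = access log file
    awk -F" - " -v date="$1" -v patt="$2" '
        $0 ~ date && $0 ~ patt { hits[$1]++ }          # one counter per IP
        END { for (ip in hits) print hits[ip], ip }    # count, then IP
    ' "$3"
}

# Usage, with the variables from the question:
#   unique_output=$(unique_ips "$YESTERDAY" "$1" "$ACCESSLOG")
#   echo "$unique_output"
#   unique_sum=$(printf '%s' "$unique_output" | grep -c '')
#   echo " $unique_sum total"
```

Counting with grep -c '' gives 0 when nothing matched, whereas wc -l on a here-string of an empty variable reports 1.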

can not use unix $variable in Fixed search of awk command

I cannot use a unix $variable in the fixed-string search of an awk command.
Please see my commands below.
a="NEW_TABLES NEW_INSERT"
b="NEW"
echo $a | awk -v myvar=$b -F'$0~myvar' '{print $2}'
returns no output,
but if I manually enter the value of $b there, it works, as below:
echo $a | awk -F'NEW' '{print $2}'
outputs:
TABLES NEW_INSERT

This should make it:
$ a="NEW_TABLES NEW_INSERT"
$ echo $a | awk -F"NEW_" '{print $2}'
TABLES
$ b="NEW_"
$ echo $a | awk -F"$b" '{print $2}'
TABLES

Your quoting is off. You can pass the shell variable in with -v and then split the line on it with the split function:
a="NEW_TABLES NEW_INSERT"
b="NEW"
echo $a | awk -v myvar="$b" '{split($0,ary,myvar);print ary[2]}'
Outputs:
_TABLES
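The same idea also works with the field separator itself: pass the shell value in with -v and assign it to FS in a BEGIN block. A small sketch; second_piece and sep are hypothetical names, and note that awk treats a separator longer than one character as a regular expression:

```shell
second_piece() {
    # $1 = text to split, $2 = separator string
    printf '%s\n' "$1" | awk -v sep="$2" 'BEGIN { FS = sep } { print $2 }'
}

second_piece "NEW_TABLES NEW_INSERT" "NEW_"   # prints TABLES followed by a space
```

The first field is empty because the text starts with the separator, which is why the interesting piece is $2.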

how to split "1$$$$" use awk

if I have a string like "sn":"1$$$$12056597.3,2595585.69$$", how can I use awk to split "1$$$$"
I tried
cat $filename | awk -F "\"1\$\$\$\$" '{ print $2 }'
cat $filename | awk -F "\"1$$$$" '{ print $2 }'
but all failed

For any number of $, use:
echo '"1$$$$12056597.3,2595585.69$$"' | awk -F '"1[$]+' '{ print $2 }'
For exactly 4, use:
echo '"1$$$$12056597.3,2595585.69$$"' | awk -F '"1[$]{4}' '{ print $2 }'
To help debug problems with escape characters in the shell, you can use the shell built-in set -x, which prints each command with its arguments after the shell has interpreted any escape characters and expanded shell variables.
In this case the shell first interprets \$ as an escape for a plain $:
set -x
echo '"1$$$$12056597.3,2595585.69$$"'|awk -F "\"1\$\$\$\$" '{ print $2 }'
+ echo '"1$$$$12056597.3,2595585.69$$"'
+ awk -F '"1$$$$' '{ print $2 }'
You can use \\$ so that \$ gets through to awk, but awk treats \$ as a plain $ anyway. At least awk is nice enough to warn you...
echo '"1$$$$12056597.3,2595585.69$$"'|awk -F "\"1\\$\\$\\$\\$" '{ print $2 }'
+ echo '"1$$$$12056597.3,2595585.69$$"'
+ awk -F '"1\$\$\$\$' '{ print $2 }'
awk: warning: escape sequence `\$' treated as plain `$'
Turn off debugging with
set +x

echo '"1$$$$12056597.3,2595585.69$$"' | awk -F '"1[$]+' '{ print $2 }' |sed 's/.\{3\}$//'
Or, if you want to split out the two floating-point numbers:
echo '"1$$$$12056597.3,2595585.69$$"' | awk -F '"1[$]+' '{ print $2 }' |sed 's/.\{3\}$//' |awk 'BEGIN {FS=","};{print $1}'
And
echo '"1$$$$12056597.3,2595585.69$$"' | awk -F '"1[$]+' '{ print $2 }' |sed 's/.\{3\}$//' |awk 'BEGIN {FS=","};{print $2}'
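The awk, sed, awk chain can probably be collapsed into a single awk call by treating every run of $, double quote or comma as the field separator. This is a sketch that assumes the input is exactly the quoted value used in the echo commands above (a longer line such as the full "sn":... string would shift the field numbers); split_coords is a hypothetical name:

```shell
split_coords() {
    # Inside a bracket expression, $ is an ordinary character.
    # Field 1 is empty (before the leading quote), field 2 is the "1",
    # fields 3 and 4 are the two numbers.
    awk -F'[$",]+' '{ print $3; print $4 }'
}

echo '"1$$$$12056597.3,2595585.69$$"' | split_coords
# prints:
# 12056597.3
# 2595585.69
```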
