How to increment a shell variable in an awk action - linux

My shell script is something like this:
#!/bin/bash
global_var=0
func() {
awk '$1 ~/^pattern/ {global_var=$((global_var+1))}' $1
}
func input_file_name
I want to increment the global (shell) variable global_var inside the awk action. How do I do that? Normal shell-style incrementing does not seem to work.

Try this:
func() {
awk '$1~/^pattern/ {++awk_var} END {print awk_var+0}' "$1"
}
shell_var=$(func input_file_name)
The shell and awk are separate worlds, and you should treat them as such (*) (which, in effect, you're already doing by enclosing your awk program in single quotes, which prevents the shell from expanding any shell variable references in your awk program string).
Thus, use an awk[-internal] variable to perform your counting (awk_var) and output it after having finished processing the input file (in the END block, using print to output the awk variable to stdout; the +0 part ensures that the output defaults to 0 in case NO match was found).
Note that, generally, awk variables need no explicit initialization: they default to 0 in numeric and Boolean contexts, and to "" (the empty string) in string contexts.
Also note that awk has its own syntax, and shell constructs such as $((...)) for arithmetic expansion do not apply. Generally, awk variables are referred to just by name (no $ prefix), and arithmetic operations such as ++ can be applied directly.
Using command substitution - $(...) - in the shell then allows you to capture output from the awk command.
In your specific case you have no need to pass variable values into the awk program, but if you needed to do that, you'd use one or more instances of awk's -v option; e.g.: awk -v awk_var="$shell_var" ...
On the shell (bash) side, if you wanted to add awk's output to the shell variable instead of just assigning it:
declare -i shell_var # make sure variable is an integer
shell_var+=$(func input_file_name) # add function's output to existing value
(*) The shell and awk have completely separate namespaces that have no direct way of interacting with one another: awk has no concept of shell variables, and the shell has no concept of awk variables.
It is technically feasible, but ill-advised, to integrate shell variable VALUES into an awk program: use a double-quoted string to represent the awk program and reference shell variable VALUES inside it; they are then expanded by the shell ONCE, BEFORE the string gets passed as a program to awk.
What you CANNOT do is to modify a shell variable from inside an awk program.
Since it gets complicated quickly as to which parts of the awk program are interpreted by the shell up front vs. which parts are interpreted by awk later (where $ has special meaning too, for instance), the best approach is to:
use a single-quoted string to represent the awk program, so as to protect it from interpretation by the shell
if values need to be passed in, use instances of the -v option
if something needs to be passed out, print to stdout from awk and use command substitution or redirection to capture it via the shell.
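Putting those three rules together, a minimal end-to-end sketch (the function name, pattern, and file name are placeholders):
count_matches() {
    # value in via -v, single-quoted program, result out via stdout
    awk -v pat="$2" '$1 ~ pat {++n} END {print n+0}' "$1"
}
shell_var=$(count_matches input_file_name '^pattern')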

Related

linux bash, passing parameters using a variable issue

I am trying to use a variable to store the parameters, here is the simple test:
#!/bin/bash
sed_args="-e \"s/aaaa/bbbb/g\""
echo $sed_args
I expected the output to be
-e "s/aaaa/bbbb/g"
but it gives:
"s/aaaa/bbbb/g"
without the "-e"
I am new to bash, so any comments are welcome. Thanks; maybe this is already answered somewhere.
You need an array to construct arguments dynamically:
#!/usr/bin/env bash
sed_args=('-e' 's/aaaa/bbbb/g')
echo "${sed_args[#]}"
When you use the variable without double quotes, it gets word-split by the shell even before echo sees the value(s). Then bash's builtin echo interprets -e as an option for itself (normally used to turn on interpretation of backslash escapes).
When you double quote the variable, it won't be split and will be interpreted as a single argument to echo:
echo "$sed_args"
For strings you don't control, it's safer to use printf as it doesn't take any arguments after the format string:
printf %s "$string"
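Coming back to the array: to actually run sed with the dynamically built arguments, expand it the same way (input.txt is a placeholder):
sed_args=('-e' 's/aaaa/bbbb/g')
sed "${sed_args[@]}" input.txt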

"Substitution replacement not terminated" with variable

Found this error in other questions, but I can't see how the solutions relate to this.
Assume a file test containing:
one
twoX
three
I can correct twoX with:
perl -0777 -i -pe 's/twoX/two/igm' test
I can make a function to do this:
replace_str(){ perl -0777 -i -pe 's/'$2'/'$3'/igm' $1; }
replace_str test twoX two
But this fails when the replacement contains a space (possibly other chars):
replace_str test two 'two frogs'
Substitution replacement not terminated at -e line 1.
The perl line works with the space. Why not when called in a function? I've tried with other quotes and e.g. $(echo two frogs) (with and without quotes).
It's because you end the single-quoted string you pass to Perl in order to splice in your variable expansions; as soon as a value contains a space, the regex gets split into multiple arguments.
Instead just put the whole regex, including variables, inside double-quotes and the shell should expand the variables properly.
So use "s/$2/$3/igm" instead.
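The fixed function would then look like this (a sketch; note that $2 and $3 are still interpolated into the regex, so characters special to Perl regexes or to the s/// delimiter would need escaping):
replace_str(){ perl -0777 -i -pe "s/$2/$3/igm" "$1"; }
replace_str test two 'two frogs'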

Making a bash script to accept input from file OR piping output

I have the following bash script, which takes tabular data as input, grabs the first line, and splits it vertically:
#!/bin/bash
# my_script.sh
export LC_ALL=C
file=$1
head -n1 $file |
tr "\t" "\n" |
awk '{print $1 " " NR-1}'
The problem is that I can only execute it this way:
$ myscript.sh some_tab_file.txt
What I want to do is on top of the above capability also allows you to do this:
$ cat some_tab_file.txt | myscript.sh
That is, make it read from piped input as well. How can I achieve that?
I'd normally write:
export LC_ALL=C
head -n1 "$#" |
tr "\t" "\n" |
awk '{print $1 " " NR-1}'
This works with any number of arguments, or none if there are none. Using "$#" is important in this and many other contexts. See the Bash manual on special parameters and shell parameter expansion for more information on the many and varied notations available for controlling how shell parameters are handled. Generally, double quotes are a good idea, especially if the file names may contain spaces.
A common idiom is to fall back to - (standard input) as the input file if there are no parameters. There is a convenient shorthand for that:
file=${1--}
The substitution ${variable-fallback} evaluates to the variable's value, or fallback if it's unset.
I believe your script should work as-is, though; head will read standard input if the (unquoted!) file name you pass in evaluates to the empty string.
Take care to properly double-quote all interpolations of "$file", by the way; otherwise, your script won't work on filenames containing spaces or shell metacharacters. (Though quoting then loses the fortunate side effect of passing no filename to head when your script did not receive one.)
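For the record, a spelled-out variant using the ${1--} fallback (a sketch; GNU head accepts - as a name for standard input):
#!/bin/bash
export LC_ALL=C
file=${1--}        # first argument, or - (stdin) if none was given
head -n1 "$file" |
tr "\t" "\n" |
awk '{print $1 " " NR-1}'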

Extract all variable values in a shell script

I'm debugging an old shell script; I want to check the values of all the variables it uses. It's a huge, ugly script with more than 140 variables. Is there any way I can extract the variable names from the script and put them in a convenient pattern like:
#!/bin/sh
if [ ${BLAH} ....
.....
rm -rf ${JUNK}.....
to
echo ${BLAH}
echo ${JUNK}
...
Try running your script as follows:
bash -x ./script.bash
Or enable the setting in the script:
set -x
You can dump all the variables of interest in one command:
set | grep -w -e BLAH -e JUNK
To dump all the variables to stdout use:
set
or
env
from inside your script.
You can extract a (sub)list of the variables declared in your script using grep:
grep -Po "([a-z][a-zA-Z0-9_]+)(?==\")" ./script.bash | sort -u
Disclaimer: why a "(sub)list"?
The expression matches a name followed by an equals sign (=) and a double quote ("). So if you don't use syntax such as myvar="my-value", the assignment won't be caught.
But you get the idea.
grep Options
-P, --perl-regexp: interpret PATTERN as a Perl regular expression (PCRE; experimental);
-o, --only-matching: print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Pattern
I'm using a positive lookahead, (?==\"), to require an equals sign followed by a double quote.
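If you then want the echo lines from the question, one hypothetical follow-up is to pipe the extracted names through sed (& stands for the matched name):
grep -Po '([a-z][a-zA-Z0-9_]+)(?==")' ./script.bash | sort -u |
sed 's/.*/echo ${&}/'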
In bash, but not sh, compgen -v will list the names of all variables assigned (compare this to set, which has a great deal of output other than variable names, and thus needs to be parsed).
Thus, if you change the top of the script to #!/bin/bash, you will be able to use compgen -v to generate that list.
That said, the person who advised you to use set -x did well. Consider this extension of that:
PS4=':$BASH_SOURCE:$LINENO+'; set -x
This will print the source file and line number before every command (or variable assignment) which is executed, so you will have a log not only of which variables are set, but just where in the source each one was assigned. This makes tracking down where each variable is set far easier.
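As a quick illustration, a two-assignment toy script (hypothetical names):
#!/bin/bash
PS4=':$BASH_SOURCE:$LINENO+'; set -x
BLAH=42
JUNK=/tmp/junk
would produce a trace along the lines of:
:./script.bash:3+BLAH=42
:./script.bash:4+JUNK=/tmp/junk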

Extracting a string in csh

Would you please explain why the following shell command wouldn't work:
sh-3.1$ echo $MYPATH
/opt/Application/DATA/CROM/my_application
sh-3.1$ awk '{print substr($MYPATH,3)}'
MYPATH is not going to be substituted by the shell since the string uses single quotes. Consider the following:
csh$ echo '{print substr($USER,3)}'
{print substr($USER,3)}
csh$ echo "{print substr($USER,3)}"
{print substr(dshawley,3)}
The usage of single quotes instructs the shell to pass the string argument to the program as-is. Double quotes tell the shell to perform variable expansion on the argument before passing it to the program. This is a basic shell feature that is common amongst shells and some programming languages (e.g., perl).
The next problem that you are going to run into is that awk will want quotes around the first parameter to substr or the parse will fail. You will probably see an "Illegal variable name" warning in this case. This is where I get lost with csh since I have no clue how to properly escape a double-quote within a quoted string. In bash/sh/ksh, you would do the following:
sh$ awk "{print substr(\"$USER\",3)}"
input
^D
hawley
sh$
Just in case you do not already know this, awk will require an input stream before it is going to do anything. I had to type "input" and the EOF character for the little example.
Quoting and escaping
"string" is a weak quote. Enclosed whitespace and wildcards are taken as literals, but variable and command substitutions are still performed.
'string' is a strong quote. The entire enclosed string is taken as a literal.
You can use the -v option to pass a variable to awk:
awk -v mypath="$MYPATH" 'BEGIN{print substr(mypath, 3)}'
