What should I do if the pattern in an awk command is a bash variable that contains special characters?

Description: The 1-line awk cmd is used to print all lines after the matched line in my shell script as below.
#!/bin/bash
...
awk "f;/${PATTERN}/{f=1}" ${FILE}
Since ${PATTERN} may contain special characters, the command fails in that case.
Q1. How should I handle this situation if a regex is used in awk?
Q2. Is it possible to match the raw string in this command instead of a regex (e.g. /$PATTERN/), to avoid the special-character problem?

Close. It's better to pass shell variables in to awk with -v than to place them in the awk script directly.
awk -v pat="${PATTERN}" 'f; $0 ~ pat {f=1}' "${FILE}"
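If ${PATTERN} is a literal string rather than a regex, compare it against the whole line with ==:
awk -v pat="${PATTERN}" 'f; $0 == pat {f=1}' "${FILE}"
or use index() to find it as a literal substring anywhere in the line:
awk -v pat="${PATTERN}" 'f; index($0, pat) {f=1}' "${FILE}"
Note that awk still processes backslash escape sequences in a value assigned with -v, so a pattern containing backslashes may not arrive unchanged.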
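One way to sidestep escape processing entirely, sketched here on the assumption that passing the pattern through the environment is acceptable, is to read it from awk's ENVIRON array and match it with index(). The function name print_after, the variable PAT, and the sample data are hypothetical.

```shell
#!/bin/sh
# print_after LITERAL FILE
# Prints every line that follows the first line containing LITERAL.
# LITERAL reaches awk through the environment, so awk applies neither
# regex matching nor backslash-escape processing to it.
print_after() {
    PAT="$1" awk 'f; index($0, ENVIRON["PAT"]) {f=1}' "$2"
}

# Hypothetical sample file whose marker line is full of metacharacters.
tmp=$(mktemp)
printf '%s\n' 'header' 'a.b*c [x] \n' 'line 1' 'line 2' > "$tmp"

result=$(print_after 'a.b*c [x] \n' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

With the sample data above, only the two lines after the marker line are printed.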

Related

Is it possible to retrieve one string between 2 special characters from text file using bash?

Let's say I have the following text file
test.txt
ABC_01:Testing-ABCDEFG
If I want to retrieve the string after colon, I will be using
awk -F ":" '/ABC_01/{print $NF}' test.txt
which will return Testing-ABCDEFG
But what should I do if I only want to retrieve the string after the colon and before the hyphen?
You are so close. That is where split() comes in, e.g.
awk -F: '/ABC_01/{ split($NF,arr,"-"); print arr[1] }' test.txt
Which will output
Testing
The GNU Awk User's Guide - String Manipulation Functions provides the details on split(). Give it a try and let me know if you have any further questions.
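As a variation on split(), awk can treat both delimiters as field separators, so the wanted text becomes the second field. This is a sketch that assumes the part before the colon contains neither a colon nor a hyphen; the variable names are illustrative.

```shell
#!/bin/sh
line='ABC_01:Testing-ABCDEFG'

# -F'[-:]' splits each record on either a hyphen or a colon,
# giving the fields ABC_01, Testing and ABCDEFG.
result=$(printf '%s\n' "$line" | awk -F'[-:]' '/ABC_01/ {print $2}')
echo "$result"
```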
Using Bash's built-in Extended Regex Engine
#!/usr/bin/env bash
while read -r; do
    [[ $REPLY =~ :(.*)- ]] || :
    echo "${BASH_REMATCH[1]}"
done
Using standard POSIX shell IFS field separators:
#!/usr/bin/env sh
while IFS=':-' read -r _ m _; do
    echo "$m"
done
Using (GNU) grep and look-around:
$ grep -oP '(?<=:)[^-]*(?=-)' file
Testing
Explained:
grep GNU grep supports PCRE and look-around
-o Print only the matched (non-empty) parts of a matching line
-P Interpret PATTERNS as Perl-compatible regular expressions
(?<=:) positive look-behind, i.e. preceded by a colon
[^-]* anything but a hyphen
(?=-) positive look-ahead, i.e. followed by a hyphen
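Where GNU grep is not available, the same colon-to-hyphen extraction can be sketched with POSIX parameter expansion alone. This is an alternative to the look-around pattern rather than a restatement of it, and the variable names are illustrative.

```shell
#!/bin/sh
line='ABC_01:Testing-ABCDEFG'

# Remove the shortest prefix ending in a colon ...
after_colon=${line#*:}
# ... then the longest suffix starting with a hyphen.
result=${after_colon%%-*}
echo "$result"
```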

Using awk command in Bash

I'm trying to loop an awk command using bash script and I'm having a hard time including a variable within the single quotes for the awk command. I'm thinking I should be doing this completely in awk, but I feel more comfortable with bash right now.
#!/bin/bash
index="1"
while [ $index -le 13 ]
do
awk "'"/^$index/ {print}"'" text.txt
done
Use the standard approach -- -v option of awk to set/pass the variable:
awk -v idx="$index" '$0 ~ "^"idx' text.txt
Here I have set the awk variable idx to the value of the shell variable $index. Inside awk, idx is used like any other awk variable.
$0 ~ "^"idx matches if the record starts with (^) whatever idx contains; if so, awk prints the record.
Alternatively, close the single-quoted awk program, insert the shell variable in double quotes, and reopen the single quotes:
awk '/^'"$index"'/' text.txt
# awk prints matching records by default, so idiomatic awk omits '{print}'.
or use grep:
grep "^$index" text.txt # Mind the double quotes
Note: -le compares integers, so index="1" can be written as index=1. The loop also needs to increment index (for example index=$((index + 1))), otherwise it never terminates.
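Putting the pieces together, the loop from the question might look like the sketch below. The sample file contents are hypothetical, and the increment is an addition that the original loop lacks.

```shell
#!/bin/sh
# Hypothetical stand-in for text.txt.
tmp=$(mktemp)
printf '%s\n' '1 one' '2 two' '13 thirteen' > "$tmp"

result=$(
    index=1
    while [ "$index" -le 13 ]; do
        # Pass the shell variable in with -v; "^" idx is built as a dynamic regex.
        awk -v idx="$index" '$0 ~ "^" idx' "$tmp"
        index=$((index + 1))   # without this the loop never ends
    done
)
printf '%s\n' "$result"

rm -f "$tmp"
```

Because ^1 is only a prefix match, the line starting with 13 is printed for index 1 as well as for index 13.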

shell scripts variable passed to awk and double quotes needed to preserve

I have some logs called ts.log that look like
[957670][DEBUG:2016-11-30 16:49:17,968:com.ibatis.common.logging.log4j.Log4jImpl.debug(Log4jImpl.java:26)]{pstm-9805256} Parameters: []
[957670][DEBUG:2016-11-30 16:49:17,968:com.ibatis.common.logging.log4j.Log4jImpl.debug(Log4jImpl.java:26)]{pstm-9805256} Types: []
[957670][DEBUG:2016-11-30 16:50:17,969:com.ibatis.common.logging.log4j.Log4jImpl.debug(Log4jImpl.java:26)]{rset-9805257} ResultSet
[957670][DEBUG:2016-11-30 16:51:17,969:com.ibatis.common.logging.log4j.Log4jImpl.debug(Log4jImpl.java:26)]{rset-9805257} Header: [LAST_INSERT_ID()]
[957670][DEBUG:2016-11-30 16:52:17,969:com.ibatis.common.logging.log4j.Log4jImpl.debug(Log4jImpl.java:26)]{rset-9805257} Result: [731747]
[065417][DEBUG:2016-11-30 16:53:17,986:sdk.protocol.process.InitProcessor.process(InitProcessor.java:61)]query String=requestid=10547
I have a script that contains something like
#!/bin/bash
begin=$1
cat ts.log | awk -F '[ ,]' '{if($2 ~/^[0-2][0-9]:[0-6][0-9]:[0-6][0-9]&& $2>="16:50:17"){print $0}}'
Instead of hard-coding the time like 16:50:17, I want to pass $1 from the shell to awk, so that all I need to run is ./script time (with the time given as hh:mm:ss). The script would look like
#!/bin/bash
begin=$1
cat ts.log | awk -v var=$begin -F '[ ,]' '{if($2 ~/^[0-2][0-9]:[0-6][0-9]:[0-6][0-9]&& $2>="var"){print $0}}'
But the double quotes need to be there OR it won't work.
I tried 2>"\""var"\""
but it doesn't work.
so is there a way to keep the double quotes there?
preferred result ./script
then extract the log from the time specified as $1.
There are many ways to do what you want.
Option 1: Using double quotes enclosing awk program
#!/bin/bash
begin=$1
awk -F '[ ,]' "\$2 ~ /^..:..:../ && \$2 >= \"${begin}\" " ts.log
Inside double-quoted strings, bash performs variable substitution, so $begin or ${begin} is replaced with the value of the shell variable (whatever the user passed in).
Undesired effect: awk field references starting with $ must be escaped with '\', or bash will try to expand them before running awk.
To get a double quote character (") inside a bash double-quoted string, escape it with '\'; in bash, " \"16:50\" " becomes "16:50". (This won't work in single-quoted strings, where bash expands neither variables nor escaped characters.)
To see what variable substitutions are made when bash executes the script, you can execute it with debug option (it's very enlightening):
$ bash -x yourscript.sh 16:50
Option 2: Using awk variables
#!/bin/bash
begin=$1
awk -F '[ ,]' -v begin="$begin" '$2 ~ /^..:..:../ && $2 >= begin' ts.log
Here an awk variable begin is created with the option -v varname=value. Quoting "$begin" keeps the shell from word-splitting the value.
Inside the awk program, begin is used like any other awk variable (it needs neither double quotes nor $).
There are other options, but I think you can work with these two.
In both options I've changed a bit your script:
It doesn't need cat to send data to awk, because awk can run your program on one or more data files given as arguments after the program.
Your awk program doesn't need print at all (as #fedorqui said), because a basic awk program is made of pattern {code} pairs, where pattern is the condition you used in the if statement, and the default code is {print $0}.
I've also simplified the time pattern, primarily to make the script clearer; in a log file there's almost no chance of some other 8-character string with colons in the third and sixth positions (in a regexp, . matches any character).
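To see Option 2 end to end, here is a small sketch run against abbreviated log lines. The shortened log content and the temporary file are assumptions made for illustration; only the awk command comes from the answer.

```shell
#!/bin/sh
begin='16:50:17'   # stands in for $1 of the real script

# Abbreviated, hypothetical lines in the same layout as ts.log.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[957670][DEBUG:2016-11-30 16:49:17,968:x]{pstm-1} Types: []
[957670][DEBUG:2016-11-30 16:50:17,969:x]{rset-2} ResultSet
[957670][DEBUG:2016-11-30 16:52:17,969:x]{rset-2} Result: [731747]
EOF

# $2 is the hh:mm:ss field; it is compared to begin as a string,
# which sorts correctly because the fields are zero-padded.
result=$(awk -F '[ ,]' -v begin="$begin" '$2 ~ /^..:..:../ && $2 >= begin' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Only the lines stamped 16:50:17 and 16:52:17 are printed; the 16:49:17 line is filtered out.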

Bash script: find and replace uppercase character on a string

Saying we have a string like:
doSomething()
and we want to obtain:
do_something()
What is the best way to do this?
I've read documentation about strings manipulation but I can't find the right command combination..
update
After the discussion with #anubhava, I found a solution by installing gnu-sed:
brew install gnu-sed
And then I can run script in this way:
s="doSomethingElse()"; gsed 's/[[:upper:]]/_\L&/g' <<< "$s"
output: do_something_else()
Using gnu-sed you can do:
s='doSomethingElse()'
sed 's/[[:upper:]]/_\L&/g' <<< "$s"
do_something_else()
Or else with non-gnu-sed (BSD) pipe with tr:
sed 's/[[:upper:]]/_&/g' <<< "$s" | tr '[:upper:]' '[:lower:]'
do_something_else()
Or using perl:
perl -pe 's/[[:upper:]]/_\L$&/g' <<< "$s"
do_something_else()
Or using gnu-awk:
awk -v RS='[[:upper:]]' -v ORS= '1; RT{printf "_%s", tolower(RT)}' <<< "$s"
do_something_else()
In bash 4:
$ s="doSomethingElse()"
$ while [[ $s =~ [[:lower:]]([[:upper:]]) ]]; do
> s=${s/[[:upper:]]/_${BASH_REMATCH[1],,}}
> done
$ echo "$s"
do_something_else()
First, the while loop tries to match a lowercase character immediately followed by an uppercase character, capturing the uppercase character. The parameter expansion then replaces the first uppercase character in the string with an underscore and the captured character (converted to lowercase by the ,, operator). The process repeats until no more lower/upper pairs are found. This assumes the string does not begin with an uppercase letter; otherwise the first uppercase character is not the one that was captured.
If bash allowed capture groups in patterns, something hypothetical like
s=${s//([[:lower:]])([[:upper:]])/${BASH_PATMATCH[1]}_${BASH_PATMATCH[2],,}}
could work without a loop. As is, we need the extra step of using regular expression matches one match at a time to capture the letter to be lowercased.
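The lower/upper pair rule can also be expressed without bash-specific features. The following is a sketch assuming only POSIX sed and tr; the function name to_snake is hypothetical. Unlike the bash loop, it lowercases every uppercase letter, including a leading one.

```shell
#!/bin/sh
# to_snake STRING
# Inserts an underscore between each lowercase/uppercase pair,
# then lowercases the whole string.
to_snake() {
    printf '%s\n' "$1" |
        sed 's/\([[:lower:]]\)\([[:upper:]]\)/\1_\2/g' |
        tr '[:upper:]' '[:lower:]'
}

result=$(to_snake 'doSomethingElse()')
echo "$result"
```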

Split a string and pick the uppercase substring

Consider the following example variables in bash:
PET="cat/DOG/hamster"
FOOD="soup/soup/PIZZA"
SUBJECT="MATH/physics/biology"
How can I split any of those strings by a slash, extract the part that's all uppercase and store it in a variable? For example, how would I take DOG out of the $PET variable and store it in an $OPTION variable?
I need a portable solution that works under bash and zsh specifically.
You could use tr to remove all characters that are not uppercase:
OPTION=$(tr -dc '[:upper:]' <<< "$PET")
Note that here-strings (<<< "$VARIABLE") are not POSIX, although both bash and zsh support them. In shells without them, you'll have to echo the variable into tr:
OPTION=$(echo "$PET" | tr -dc '[:upper:]')
It sounds like only one portion of the string is in uppercase, so you can ignore the splitting part of the question. This should work in both zsh and bash (although it is not portable in the sense of POSIX compatibility):
$ echo "${PET//[^A-Z]}"
DOG
You can try something like this -
OPTION=$(gawk -F'/' '{for (i=1;i<=NF;i++) if ($i ~ /\<[A-Z]+\>/) print $i}' <<< "$PET")
If you'd like a pure bash solution, you can use the following piece of code:
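#!/bin/bash
PET="cat/DOG/hamster"
IFSBK=$IFS   # save the current field separator
IFS='/'
for word in $PET; do
    # anchor the regex so only all-uppercase words match
    if [[ $word =~ ^[A-Z]+$ ]]; then
        OPTION="$word"
    fi
done
IFS=$IFSBK   # restore it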
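Since zsh does not word-split unquoted variables by default, the for loop above is bash-specific. A sketch that should behave the same in bash, zsh, and plain sh splits with tr and filters with grep instead; the function name pick_upper is hypothetical.

```shell
#!/bin/sh
# pick_upper STRING
# Splits STRING on slashes and prints the parts made only of uppercase letters.
pick_upper() {
    printf '%s\n' "$1" | tr '/' '\n' | grep -E '^[[:upper:]]+$'
}

PET="cat/DOG/hamster"
OPTION=$(pick_upper "$PET")
echo "$OPTION"
```

Unlike tr -dc '[:upper:]', this keeps a part only when the whole part is uppercase, so a mixed-case part such as Dog contributes nothing to the result.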
