Bash String Format Comparison with Wildcards


I am fairly new to bash scripting and was trying to echo only lines that match a specific formatting. I have this code so far:
LINE=1
while read -r CURRENT_LINE
do
    if [[ $CURRENT_LINE == ??-?-??? ]]
    then
        echo "$LINE: $CURRENT_LINE"
    fi
    ((LINE++))
done < "./new-1.txt"
The text file contains number sequences on each line. Some match the format "12-3-456", but others use different formats, such as "123-89203-9420" or "123-456-7890". I can't understand why the if statement inside the while loop does not evaluate to true on lines that match the format. I've tried using * as well, but that gives me incorrect results.
Here are the contents of the text file new-1.txt. I want the script to output "Line 1: 11-1-111", but it doesn't output anything.
11-1-111
222-22-2222
333-33-3333
444-444-4444
555-555-5555

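In regex parlance, ? makes the preceding character or group optional: it may occur at most once, and zero occurrences are also allowed.
Inside [[ ]], however, == is not the regex matching operator. It matches the right-hand side as a shell (glob) pattern, where ? stands for exactly one arbitrary character, so ??-?-??? is a valid pattern for 11-1-111. If it never matches, check the input for invisible trailing characters, such as the carriage return left by Windows (CRLF) line endings.
The regex matching operator is =~. To accept digits only, change your if clause to the one below.
[[ $CURRENT_LINE =~ ^[0-9]{2}-[0-9]{1}-[0-9]{3}$ ]]
The regex must not be quoted: since bash 3.2, a quoted right-hand side of =~ is matched as a literal string.
Here
The ^ anchors the match to the beginning of the string and $ to the end, so the whole line has to match the pattern.
[0-9] denotes a range, here any digit from zero to nine.
The {n} requires the preceding character or group to match exactly n times.
Note: You can also use the more verbose [[:digit:]] instead of [0-9].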
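The two kinds of test can be compared side by side. The following is only a sketch, assuming bash and assuming that the input may carry Windows line endings; the sample lines and the names lines, current and matches are made up for illustration.

```shell
# Requires bash ([[ ]], arrays and $'...' quoting are not POSIX sh).
# Sample lines; the first one carries a trailing carriage return (assumed CRLF input).
lines=($'11-1-111\r' '222-22-2222' '12-3-456')

matches=()
n=1
for current in "${lines[@]}"; do
    current=${current%$'\r'}    # drop one trailing carriage return, if present
    # == with an unquoted right-hand side is a glob match: ? is any one character.
    # =~ is a regex match: the pattern must stay unquoted.
    if [[ $current == ??-?-??? && $current =~ ^[0-9]{2}-[0-9]-[0-9]{3}$ ]]; then
        matches+=("$n: $current")
    fi
    ((n++))
done
printf '%s\n' "${matches[@]}"
```

Without the line that strips the carriage return, the first sample line would fail both tests, because the glob would see eight characters plus the carriage return and the regex anchor $ would not be reached right after the last digit.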

Related

How to return only integers from a variable in Shell Script and discard letters and leading zeros?

In my shell script there is a parameter that comes from certain systems and it gives an answer similar to this one: PAR0000008.
And I need to send only the last number of this parameter to another variable, ie VAR=8.
I used the command VAR=$( echo ${PAR} | cut -c 10 ) and it worked perfectly.
The problem arises when the PAR parameter carries a number with two digits, such as PAR0000012. I need to discard the leading zeros and send only the number 12 to the variable, but I don't know how to write the shell logic that discards all the characters to the left of the number.

Edit Using grep To Handle 0 As Part Of Final Number
Since you are using POSIX shell, making use of a utility like sed or grep (or cut) makes sense. grep is quite a bit more flexible in parsing the string allowing a REGEX match to handle the job. Say your variable v=PAR0312012 and you want the result r=312012. You can use a command substitution (e.g. $(...)) to parse the value assigning the result to r, e.g.
v=PAR0312012
r=$(echo "$v" | grep -Eo '[1-9].*$')
echo "$r"
The grep expression is:
-Eo - use Extended REGEX and only return matching portion of string,
[1-9].*$ - from the first character in [1-9] return the remainder of the string.
This will work for PAR0000012 or PAR0312012 (with result 312012).
Result
For PAR0312012
312012
Another Solution Using expr
If your variable can have zeros as part of the final number portion, then you must find the index where the first [1-9] character occurs, and then assign the substring beginning at that index to your result variable.
The expr utility offers a set of string parsing tools that can do this. Note that index and substr are GNU extensions rather than part of POSIX expr. The needed commands are:
expr index string charlist
and
expr substr string pos length
Where pos is the 1-based position at which the substring starts and length is the number of characters to extract. length just has to be large enough to cover the rest of the string, so you can simply use the total length of your string, e.g.
v=PAR0312012
ndx=$(expr index "$v" "123456789")
r=$(expr substr "$v" "$ndx" 10)
echo $r
Result
312012
This will handle 0 anywhere after the first [1-9].
(note: the old expr ... isn't the fastest way of handling this, but if you are only concerned with a few tens of thousands of values, it will work fine. A billion numbers and another method will likely be needed)

This can be done easily using Parameter Expansion. Note that both patterns below cut at the last 0 (or the last character outside 1-9), so they only work when the number itself contains no zero.
var='PAR0000008'
echo "${var##*0}"
# prints 8
echo "${var##*[^1-9]}"
# prints 8
var="${var##*0}"
echo "$var"
# prints 8
var='PAR0000012'
echo "${var##*0}"
# prints 12
echo "${var##*[^1-9]}"
# prints 12
var="${var##*[^1-9]}"
echo "$var"
# prints 12
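If the number itself can contain zeros (for example PAR0312012 or PAR0000100), a pattern that cuts at the last 0 loses digits. A minimal sketch that avoids this with two nested POSIX parameter expansions is shown here; strip_to_number is a hypothetical helper name, not something from the answers above.

```shell
# strip_to_number is a hypothetical helper name used only for illustration.
strip_to_number() {
    # ${1%%[1-9]*} is everything before the first digit 1-9 (letters and leading zeros);
    # removing that text as a prefix leaves the number, inner zeros included.
    printf '%s\n' "${1#"${1%%[1-9]*}"}"
}

strip_to_number PAR0000008   # prints 8
strip_to_number PAR0000012   # prints 12
strip_to_number PAR0312012   # prints 312012
strip_to_number PAR0000100   # prints 100
```

When the value contains no digit from 1 to 9 at all (say PAR0000000), the function prints an empty line, so a caller that expects 0 in that case would have to handle it separately.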

Show rows of a file which have a regular expression more than 'n' number of times

I have a file, abc.txt, in the format below:
a:,b:,c:,d:,e:,f:,g:
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:
Now in unix, I want to get only those rows where this regular expression :[0-9] (colon followed by any number) exists more than 2 times.
Or in other words show rows where at least 3 attributes have numerical values present.
Output should be only 2nd and 3rd row
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:

With basic grep:
grep '\(:[[:digit:]].*\)\{3,\}' file
:[[:digit:]].* matches a colon followed by a digit and zero or more arbitrary characters. This expression is placed in a group: \(...\). The expression \{3,\} means that the preceding group has to occur 3 or more times.
With extended POSIX regular expressions this can be written a little more simply, without the need to escape ( and {:
grep -E '(:[[:digit:]].*){3,}' file

$ awk -F':[0-9]' 'NF>3' file
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:

perl -nE '/:[0-9](?{$count++})(?!)/; print if $count > 2; $count=0' input

perl -ne 'print if /(.*?\:\d.*?){3,}/' yourfile
This matches rows containing character:number three or more times, i.e. more than twice, as the question asks.
https://regex101.com/r/tRWtbY/1
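Another way to look at the problem is to count the matches on each line explicitly. The sketch below does that with awk's gsub, which returns the number of substitutions it made; the variables rows and result are illustrative names, and the sample rows are copied from the question. Note that the fourth sample row also holds three numeric values (a:0, c:2, d:1), so it is printed as well.

```shell
# Sample rows copied from the question.
rows='a:,b:,c:,d:,e:,f:,g:
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:'

# gsub returns the number of replacements; replacing each match with itself ("&")
# leaves the line unchanged, so the return value is simply the match count.
result=$(printf '%s\n' "$rows" | awk 'gsub(/:[0-9]/, "&") >= 3')
printf '%s\n' "$result"
```

Changing the 3 in the awk condition adjusts the threshold without touching the regular expression.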

Bash: extract a part of a string, after a number

I have a few strings like this:
var1="string one=3423423 and something which i don't care"
var2="another bigger string=413145 and something which i don't care"
var3="the longest string ever=23442 and something which i don't care"
These strings are the output of a python script (which i am not allowed to touch), and I need a way to extract the 1st part of the string, right after the number. Basically, my outputs should be:
"string one=3423423"
"another bigger string=413145"
"the longest string ever=23442"
As you can see, i can't use positions, or stuff like that, because the number and the string length are not always the same. I assume i would need to use a regex or something, but i don't really understand regexes. Can you please help with a command or something which can do this?

grep -oP '^.*?=\d+' inputfile
string one=3423423
another bigger string=413145
the longest string ever=23442
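Here the -o flag makes grep print only the matching part, and -P enables Perl-compatible regular expressions (a GNU grep feature). \d+ means one or more digits, and .*? is a lazy match that consumes as little as possible. So ^.*?=\d+ matches from the start of the line through the first = that is followed by digits, including those digits.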
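Where grep -P is not available, roughly the same extraction can be sketched with bash's own =~ operator and BASH_REMATCH. This assumes bash; extract_prefix and re are hypothetical names. Unlike the lazy grep pattern, this regex only looks at the first = in the string, so it prints nothing when that = is not followed by a digit.

```shell
# Requires bash. extract_prefix is a hypothetical helper name.
extract_prefix() {
    # Keeping the regex in a variable avoids quoting problems inside [[ ]].
    # [^=]* cannot cross an =, so the match ends at the first = and its digits.
    local re='^[^=]*=[0-9]+'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[0]}"
    fi
}

extract_prefix "string one=3423423 and something which i don't care"
# prints: string one=3423423
```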

You could use parameter expansion, for example:
var1="string one=3423423 and something which i don't care"
name=${var1%%=*}
value=${var1#*=}
value=${value%%[^0-9]*}
echo "$name=$value"
# prints: string one=3423423
Explanation of ${var1%%=*}:
%% - remove the longest matching suffix
= - match =
* - match everything
Explanation of ${var1#*=}:
# - remove the shortest matching prefix
* - match everything
= - match =
Explanation of ${value%%[^0-9]*}:
%% - remove the longest matching suffix
[^0-9] - match any non-digit
* - match everything
To do the same thing easily for more than one value,
you could wrap this logic in a function:
extract_and_print() {
    local input=$1
    local name=${input%%=*}
    local value=${input#*=}
    value=${value%%[^0-9]*}
    echo "$name=$value"
}
extract_and_print "$var1"
extract_and_print "$var2"
extract_and_print "$var3"

$ shopt -s extglob
$ echo "${var1%%+([^0-9])}"
string one=3423423
$ echo "${var2%%+([^0-9])}"
another bigger string=413145
$ echo "${var3%%+([^0-9])}"
the longest string ever=23442
+([^0-9]) is an extended pattern that matches one or more non-digits.
${var%%+([^0-9])} with %%pattern will remove the longest match of that pattern from the end of the variable value.
Refs: patterns, parameter substitution

How can I split a string at a keyword in zshell and save the result?

I'm new to the zshell and trying to split a string using a keyword as the delimiter. The output is from netfilter and not always at a fixed position so I need to split at the keywords I'm interested in.
I've found a way that works, but seems like there should be a much simpler way to do it. Any thoughts?
line="[Thu Jul 23 12:29:50 2015] IN=eth0 OUT= SRC=10.1.1.17 DST=10.101.11.1 PROTO=TCP SPT=46286 DPT=1113 SYN URGP=0 "
# this returns a substring starting from 'SRC=' to the end
tmp=${(MS)line##SRC=*}
# use the first element returned in the substring
src=$tmp[(w)1]
echo "src is $src"

To parse a single keyword, I'd use a regular expression match with the =~ conditional operator.
if [[ $line =~ [[:space:]]SRC=[^[:space:]]+ ]]; then
    echo src is $MATCH[6,$#MATCH]
else
    echo >&2 No SRC=
fi
To parse multiple keywords, I'd split the string minus the timestamp at whitespace using parameter expansion constructs and store the output in an associative array.
timestamp=${${line%%\]*}##*\[}
typeset -A info
for x in ${=line#*\]}; do
    if [[ $x = *=* ]]; then
        info[${x%%=*}]=${x#*=}
    else
        info[$x]=
    fi
done
echo src is $info[SRC]

You could almost turn line into an associative array, but the key=value items seem too irregular since sometimes the = is missing and sometimes there’s no value. So one way to go is simply splitting the whole line on spaces and putting it into elements of an array. It’s not clear how order dependent the output is, but if you can count on desired keys existing and being in consistent order, one approach would be:
ary=( ${(s. .)line} ) # split on spaces, storing into array
print $ary[8]
SRC=10.1.1.17
Now you can get at any key-value by index. You could also use this array as a starting point to a true associative array.
You might want to remove the datestamp stuff ([...]) first.

How can I split very long case patterns across multiple lines?

How could I split the long valueX string in the following bash code?
case "$1" in
    value1|value2|value3|...........more values..................| valueN)
        some_processing "$#"
        ;;
    ...
esac
I'd like to split the pattern list across separate lines.
Maybe something like:
VAL+=value1
VAL+=value2
....

From the man page:
A case command first expands word, and tries to match it against each pattern in turn, using the same matching rules as for path‐name expansion[.]
In other words, it's a glob pattern, not a regular expression. As such, you can put whitespace between pattern tokens. For example:
case "$1" in
    value1 | \
    value2 )
        :
        ;;
esac
Note that you must escape the line continuation with a backslash, unlike the usual case where the pipe symbol will continue the line automatically. Other than that, you can break up the line the same way you would at the prompt.
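A slightly fuller sketch of the same layout, with one alternative per line, is shown below. The function name classify and the value names are placeholders for illustration only.

```shell
# classify and the value names are placeholders for illustration.
classify() {
    case "$1" in
        value1 | \
        value2 | \
        value3 )
            echo "matched"
            ;;
        * )
            echo "other"
            ;;
    esac
}

classify value2            # prints matched
classify something-else    # prints other
```

Each backslash must be the very last character on its line; a trailing space after it would break the continuation.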
