Linux input pattern matching [duplicate]

String:
name@gmail.com
Checking for:
@
.com
My code:
if [[ $word =~ "@" ]]
then
if [[ $word =~ ".com" || $word =~ ".ca" ]]
then
...
fi
fi
My problem:
name@.com
The above example passes the check, which is not what I want. How do I check that there are one or more characters between "@" and ".com"?

You can use a very basic regex:
[[ $var =~ ^[a-z]+@[a-z]+\.[a-z]+$ ]]
It matches a string that consists of exactly:
at least one a-z char
@
at least one a-z char
. (a literal dot, escaped as \. in the regex)
at least one a-z char
It can get as complicated as you want; see, for example, Email check regular expression with bash script.
See it in action:
$ var="a@b.com"
$ [[ $var =~ ^[a-z]+@[a-z]+\.[a-z]+$ ]] && echo "kind of valid email"
kind of valid email
$ var="a@.com"
$ [[ $var =~ ^[a-z]+@[a-z]+\.[a-z]+$ ]] && echo "kind of valid email"
$
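The same idea can be narrowed to the two endings the question asks about (.com and .ca). The sketch below is a hypothetical helper, is_email_like, that keeps the regex in a variable (which avoids quoting trouble inside [[ ]]) and allows the usual address characters. It is deliberately loose and is not a full e-mail validator.

```shell
#!/usr/bin/env bash
# is_email_like WORD: true if WORD looks like local@domain.com or local@domain.ca.
# Hypothetical helper; intentionally simple, not an RFC-compliant validator.
is_email_like() {
  # local part: letters, digits and . _ % + -
  # domain: one or more dot-separated labels, then .com or .ca at the very end
  local re='^[[:alnum:]._%+-]+@[[:alnum:]-]+(\.[[:alnum:]-]+)*\.(com|ca)$'
  [[ $1 =~ $re ]]
}

for w in name@gmail.com name@.com a@b.ca a@b.org; do
  if is_email_like "$w"; then
    echo "$w: ok"
  else
    echo "$w: rejected"
  fi
done
```

Because the domain must start with at least one non-dot character, name@.com is rejected, which is exactly the case the question wants to exclude.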

Why not use another tool, such as perl:
$ echo "x@gmail.com" | perl -lne 'print $1 if(/@(.*?)\.com/)'
gmail
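For comparison, roughly the same extraction can be done in pure bash with =~ and BASH_REMATCH, without starting a perl process. POSIX extended regexes have no non-greedy *?, so this sketch (hypothetical function name domain_of) uses [^@]* instead; on unusual inputs such as x@a.com.com the result may differ from the perl version.

```shell
#!/usr/bin/env bash
# domain_of ADDRESS: print the text between "@" and ".com", like the perl one-liner.
# Returns 1 and prints nothing when there is no match.
domain_of() {
  local re='@([^@]*)\.com'   # capture group 1 = everything between @ and .com
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

domain_of "x@gmail.com"   # prints: gmail
```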

The glob pattern would be: [[ $word == ?*@?*.@(com|ca) ]]
? matches any single character and * matches zero or more characters, so ?* matches one or more characters.
@(p1|p2|p3|...) is an extended globbing pattern that matches exactly one of the given patterns. This requires:
shopt -s extglob
Testing:
$ for word in @.com @a.ca a@.com a@b.ca a@b.org; do
echo -ne "$word\t"
[[ $word == ?*@?*.@(com|ca) ]] && echo matches || echo does not match
done
@.com does not match
@a.ca does not match
a@.com does not match
a@b.ca matches
a@b.org does not match
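If you would rather not depend on extglob, the same test can be written as a case statement, where | between patterns is part of the case syntax itself. The function name check_word below is a hypothetical choice for this sketch.

```shell
#!/usr/bin/env bash
# check_word WORD: same test as ?*@?*.@(com|ca), written as a case statement.
# The "|" alternation belongs to case, so no shopt is needed.
check_word() {
  case $1 in
    ?*@?*.com | ?*@?*.ca) echo "matches" ;;
    *) echo "does not match"; return 1 ;;
  esac
}

for word in @.com @a.ca a@.com a@b.ca a@b.org; do
  printf '%s\t' "$word"
  check_word "$word"
done
```

This also works in plain POSIX sh, since it uses only standard glob characters.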

Related

Bash extract substring from a generic string

I have a string such as
username/ticket-12345/feature
and I want to extract just
ticket-12345
in bash. The format of this string could be anything, e.g.
'my string ticket-12345'
and 'ticket' could be a mixture of lowercase and uppercase letters.
Is this possible to do in bash? I've tried searching for this particular case, but I can't seem to find an answer.

Here is a pure bash regex method:
re='[[:alpha:]]+-[0-9]+'
s='username/ticket-12345/feature'
[[ $s =~ $re ]] && echo "${BASH_REMATCH[0]}"
ticket-12345
s='my string ticket-12345'
[[ $s =~ $re ]] && echo "${BASH_REMATCH[0]}"
ticket-12345
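[[ =~ ]] only reports the first match. If a string can contain several tickets, one way (sketched here with a hypothetical all_tickets function) is to loop: print the match, cut the string just past it, and match again.

```shell
#!/usr/bin/env bash
# all_tickets STRING: print every WORD-NUMBER token in STRING, one per line.
all_tickets() {
  local s=$1 re='[[:alpha:]]+-[0-9]+'
  while [[ $s =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[0]}"
    # Drop everything up to and including this match (quoted = literal text),
    # so the next iteration finds the following token.
    s=${s#*"${BASH_REMATCH[0]}"}
  done
}

all_tickets 'fixes ticket-12345 and TICKET-678'
# prints:
# ticket-12345
# TICKET-678
```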

The shell's built-in ERE (extended regular expression) support is adequate to the task:
ticket_re='[Tt][Ii][Cc][Kk][Ee][Tt]-[[:digit:]]+'
string='my string ticket-12345'
[[ $string =~ $ticket_re ]] && echo "Found ticket: ${BASH_REMATCH[0]}"
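Instead of spelling out [Tt][Ii]..., bash's nocasematch option makes [[ =~ ]] ignore case. The sketch below (hypothetical function find_ticket) turns the option on only for the match and then restores its previous setting, so the rest of the script is unaffected.

```shell
#!/usr/bin/env bash
# find_ticket STRING: print the first "ticket-NNN" in any letter case.
find_ticket() {
  local re='ticket-[[:digit:]]+' old rc=1
  old=$(shopt -p nocasematch)   # e.g. "shopt -u nocasematch"
  shopt -s nocasematch          # case-insensitive [[ ]] matching from here on
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"   # the text as it appears in the input
    rc=0
  fi
  eval "$old"                   # restore nocasematch to its previous state
  return "$rc"
}

find_ticket 'username/TiCkEt-12345/feature'   # prints: TiCkEt-12345
```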

With the -o flag, grep and its relatives print only the matched parts. You can use
grep -Eio 'ticket-[0-9]+' file.txt
to find the tickets in your input text (egrep is an older, now obsolescent spelling of grep -E).
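The same grep call works on a variable rather than a file by feeding it through a here-string. This is a small sketch with a hypothetical function name; note that it starts a grep process, unlike the [[ =~ ]] answers above.

```shell
#!/usr/bin/env bash
# tickets_from_text STRING: grep -o applied to a string instead of a file.
# -E: extended regex, -i: ignore case, -o: print only the matched parts.
tickets_from_text() {
  grep -Eio 'ticket-[0-9]+' <<< "$1"
}

tickets_from_text 'my string ticket-12345, see also Ticket-9'
# prints:
# ticket-12345
# Ticket-9
```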

How do I test if a variable is a string in bash?

I tried the following, without success:
[root@OBAMA~]# bash
[root@OBAMA~]# a=HelloWorld
[root@OBAMA~]# [[ $a == [A-Za-z] ]] && echo "YES ITS STRING"
(the command prints nothing)
[root@OBAMA~]# [[ $a == [A-Z][a-z] ]] && echo "YES ITS STRING"
(the command prints nothing)

Change your command like below.
$ [[ $a =~ [A-Za-z]+ ]] && echo "YES ITS STRING"
YES ITS STRING
Use the =~ operator to test an input string against a regex.
Adding + after the character class makes it match one or more times. Here it's unnecessary: without anchors, a single letter anywhere in the value is already enough for a match.
Add anchors (^ and $) to match the whole string. [[ $a =~ [A-Za-z] ]] && echo "YES ITS STRING" on its own prints YES ITS STRING because the variable a contains at least one letter.
$ a="HelloWorld"
$ [[ $a =~ ^[A-Za-z]+$ ]] && echo "YES ITS STRING"
YES ITS STRING
$ a="Hello World"
$ [[ $a =~ ^[A-Za-z]+$ ]] && echo "YES ITS STRING"
$
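Wrapped in a function, the anchored regex becomes a reusable check. The name is_letters_only is a hypothetical choice for this sketch; the empty string fails because + requires at least one letter.

```shell
#!/usr/bin/env bash
# is_letters_only VALUE: true only if VALUE is non-empty and consists
# entirely of ASCII letters; ^ and $ force the whole value to match.
is_letters_only() {
  [[ $1 =~ ^[A-Za-z]+$ ]]
}

for a in HelloWorld "Hello World" Hello42 ""; do
  if is_letters_only "$a"; then
    echo "'$a': letters only"
  else
    echo "'$a': not letters only"
  fi
done
```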

How do you define "a string"?
[[ -n $a ]] && echo variable a is not empty
[[ $a == *[[:alpha:]]* ]] && echo variable a contains a letter
shopt -s extglob failglob
[[ $a == +([[:alpha:]]) ]] && echo variable a only has letters
Your glob expressions don't match because they check that the variable contains exactly one character ([A-Za-z]) or exactly two characters ([A-Z][a-z]).
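The "only letters" test can also be written without extglob by turning it around: a value fails if it is empty or contains at least one non-letter. This sketch (hypothetical function only_letters) uses only POSIX case patterns, so it should run in sh as well as bash.

```shell
#!/bin/sh
# only_letters VALUE: POSIX sh version of the "only letters" test.
# [![:alpha:]] matches any single character that is not a letter.
only_letters() {
  case $1 in
    '' | *[![:alpha:]]*) return 1 ;;  # empty, or contains a non-letter
    *) return 0 ;;
  esac
}

only_letters HelloWorld && echo "HelloWorld: only letters"
only_letters "Hello World" || echo "Hello World: not only letters"
```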

How to check if a string is a substring of another?

I have the following strings in bash:
str1="any string"
str2="any"
I want to check whether str2 is a substring of str1.
I can do it this way:
c=$(echo "$str1" | grep "$str2")
if [ "$c" != "" ]; then
...
fi
Is there a more efficient way of doing this?

You can use wildcard pattern matching with *:
str1="any string"
str2="any"
if [[ "$str1" == *"$str2"* ]]
then
echo "str2 found in str1"
fi
Note that this * pattern matching does not work inside single [ ]; it requires [[ ]].
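Quoting the variable inside the pattern matters: *"$str2"* matches str2 literally, while an unquoted $str2 would let characters such as *, ? or [ act as wildcards. The sketch below uses a hypothetical contains function to show the difference.

```shell
#!/usr/bin/env bash
# contains HAYSTACK NEEDLE: literal substring test with [[ == ]].
# The quotes around "$2" make * ? [ in NEEDLE match themselves.
contains() {
  [[ $1 == *"$2"* ]]
}

contains "any string" "any" && echo "found"                # found
contains "price: 5*3" "5*3" && echo "found literal"        # found literal
contains "price: 513" "5*3" || echo "no wildcard match"    # no wildcard match
```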

str1="any string"
str2="any"
Old school (Bourne shell style):
case "$str1" in *$str2*)
echo found it
esac
New school (as speakr shows); be warned, however, that the string on the right is treated as a regular expression:
if [[ $str1 =~ $str2 ]] ; then
echo found it
fi
So this also matches, which you may not expect:
str2='.*[trs].*'
if [[ $str1 =~ $str2 ]] ; then
echo found it
fi
Using grep is slower, since it spawns a separate process.
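If you want =~ but need str2 taken literally, quote the right-hand side: in bash 3.2 and later, quoted parts of the =~ operand are matched as plain text. A short demonstration:

```shell
#!/usr/bin/env bash
str1="any string"
str2='an.'

# Unquoted: str2 is a regex, so "." matches any character ("any" matches).
[[ $str1 =~ $str2 ]] && echo "regex match"            # prints: regex match

# Quoted: str2 is matched literally, and "an." does not occur in str1.
[[ $str1 =~ "$str2" ]] || echo "no literal match"     # prints: no literal match
```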

You can use bash regex matching without grep:
if [[ $str1 =~ $str2 ]]; then
...
fi
Note that the regex pattern needs no surrounding slashes or quotes. If you want glob pattern matching instead, use == rather than =~ as the operator.
Some examples can be found here.

if echo "$str1" | grep -q "$str2"   # or any other command
then
.....
fi
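If you do stay with grep, two flags make it a safer substring test: -F treats the needle as a fixed string rather than a regex, and -- stops a needle that starts with "-" from being read as an option. The function name has_substring is hypothetical; printf is used instead of echo so the haystack is printed verbatim.

```shell
#!/usr/bin/env bash
# has_substring HAYSTACK NEEDLE: grep-based substring test.
# -q: exit status only, -F: fixed string (no regex), --: end of options.
has_substring() {
  printf '%s\n' "$1" | grep -qF -- "$2"
}

has_substring "any string" "any" && echo "found"
has_substring "any string" "-x" || echo "not found"
```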

sub string search bash scripting

Given a string, I want to find the substring that contains a 9 and a 0 with exactly two characters between them, where the 0 is the last character of that substring.
string="asd20 92x0x 72x0 YX92s0 0xx0 92x0x"
# I want to select the substring YX92s0 from the string above
for var in $string
do
if [[ "$var" == *9**0 ]]; then
echo "$var"   # should print only YX92s0
fi
done
Obviously, the code above doesn't work.

Match each element against the pattern *9??0. There are several ways to do this; here's one that uses the string to set the positional parameters in a subshell, then iterates over them in a for loop:
( set -- $string
for elt; do [[ $elt == *9??0 ]] && { echo "found"; exit; }; done )
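A variant that avoids both the subshell and accidental filename expansion (an unquoted $string is also subject to globbing) is to split with read -ra into an array. The sketch below uses a hypothetical function name and prints every matching word rather than stopping at the first.

```shell
#!/usr/bin/env bash
# find_9xx0 STRING: print each whitespace-separated word matching *9??0,
# i.e. a 9, exactly two characters, then a 0 as the last character.
find_9xx0() {
  local -a words
  local w
  read -ra words <<< "$1"   # split on IFS whitespace, no pathname expansion
  for w in "${words[@]}"; do
    [[ $w == *9??0 ]] && printf '%s\n' "$w"
  done
  return 0
}

find_9xx0 "asd20 92x0x 72x0 YX92s0 0xx0 92x0x"   # prints: YX92s0
```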

string="asd20 92x0x 72x0 X92s0 0xx0"
if [[ $string =~ [[:space:]].?9.{2}0[[:space:]] ]]; then
echo "found"
fi
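Or, better, take advantage of word splitting:
string="asd20 92x0x 72x0 X92s0 0xx0"
for s in $string; do
if [[ $s =~ ^(.*9.{2}0)$ ]]; then
echo "${BASH_REMATCH[1]} found"
fi
done
This is regex matching with bash. The ^ and $ anchors make the 0 the last character of the word; without them, 92x0x would also match (as 92x0).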
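A capture group can also pull out the two characters between the 9 and the 0. In this sketch (hypothetical function middle_of_9xx0), the trailing $ again ensures the 0 ends the word.

```shell
#!/usr/bin/env bash
# middle_of_9xx0 WORD: if WORD ends in 9, two characters, 0, print the
# two characters in between (capture group 1); otherwise return 1.
middle_of_9xx0() {
  local re='9(..)0$'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

middle_of_9xx0 YX92s0                      # prints: 2s
middle_of_9xx0 92x0x || echo "no match"    # prints: no match
```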

KSH: search string for multiple substrings

I have a simple way to search for multiple substrings in a single string:
if [[ $string = *"string 1"* && $string = *"string 2"* && $string = *"string 3"* ]]
(here searching $string for "string 1", "string 2" and "string 3").
How can I simplify this, so that there is only one check?
I've tried:
if [[ $string = *"string 1"*"string 2"*"string 3"* ]]
and
if [[ $string = *"string 1*string 2*string 3"* ]]
Note: the three strings will always appear in this order, which is why I think the check can be simplified.

In ksh93, you can use the & sub-pattern delimiter:
$ [[ abcdefg == @(*bcd*&*cde*&*efg*) ]]; echo $?
0
$ [[ abcdefg == @(*bcdz*&*cde*&*efg*) ]]; echo $?
1
Unfortunately, only ksh93 has this. In mksh, zsh, and bash, with extended matching enabled, the negation sub-pattern allows this De Morgan-like equivalence:
$ [[ abcdefg == !(!(*bcd*)|!(*cde*)|!(*efg*)) ]]; echo $?
0
$ [[ abcdefg == !(!(*bcdz*)|!(*cde*)|!(*efg*)) ]]; echo $?
1
To test for just one pattern, see this FAQ
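In bash, a plain loop is often easier to read than the double-negation pattern. The sketch below defines two hypothetical helpers: contains_all checks that every substring occurs anywhere, and contains_in_order uses the fixed order from the question by cutting the string after each hit, so later substrings must appear further to the right (and may not overlap earlier ones).

```shell
#!/usr/bin/env bash
# contains_all STRING SUB...: succeed if every SUB occurs somewhere in STRING.
contains_all() {
  local s=$1 sub
  shift
  for sub in "$@"; do
    [[ $s == *"$sub"* ]] || return 1   # quoted: SUB is matched literally
  done
  return 0
}

# contains_in_order STRING SUB...: succeed if the SUBs occur in STRING in the
# given order, without overlapping. After each hit, everything up to and
# including it is removed, so the next SUB must be found further right.
contains_in_order() {
  local rest=$1 sub
  shift
  for sub in "$@"; do
    [[ $rest == *"$sub"* ]] || return 1
    rest=${rest#*"$sub"}
  done
  return 0
}

contains_all abcdefg efg bcd && echo "all present"        # all present
contains_in_order abcdefg efg bcd || echo "wrong order"   # wrong order
```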
