Validate string for file characters including space - linux

I have the following code:
if ! [[ $1 =~ ^[0-9a-zA-Z._-]+$ ]]; then
echo "argument contains characters not valid for name file"
fi
All I want is to check that the string contains only characters that are valid in a file name (I know I should also add tests for the first character and the length afterwards).
PROBLEM:
As written, it rejects strings that contain spaces.
So I need to include a space in the regex, but none of the following works:
[[ $1 =~ ^[0-9a-zA-Z ._-]+$ ]] >> syntax error
[[ $1 =~ ^[0-9a-zA-Z\t._-]+$ ]] >> still does not accept spaces
[[ $1 =~ ^[0-9a-zA-Z\s._-]+$ ]] >> still does not accept spaces
[[ $1 =~ "^[0-9a-zA-Z ._-]+$" ]] >> syntax error
I'm not sure what else to try.
So far, I have come up with a quick and dirty workaround:
myNewVar="${1// /}"
and run the tests against that, but it's far from elegant ...

You could use the [:blank:] character class:
re='^[[:alnum:][:blank:]._-]+$'
if ! [[ $1 =~ $re ]]; then
    echo "argument contains characters not valid for name file"
fi
Notice that I've moved the regex into a separate variable [1], and also introduced the [:alnum:] character class.
Instead of regular expressions, you could use parameter expansion to remove allowed characters and see if anything is left:
if [[ -n ${1//[[:alnum:][:blank:]._-]} ]]; then
echo "illegal character found"
fi
[1] Mostly for portability reasons, but also to avoid quoting surprises (like the unquoted blank in your last example), see the BashGuide (section "Regular Expressions").
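As a minimal sketch, the regex check from this answer can be wrapped in a small helper. The name is_valid_name is made up for illustration, and the set of allowed characters is simply the one used above; adjust it to your own rules.

```shell
#!/usr/bin/env bash
# is_valid_name (hypothetical helper): succeed only if the argument consists
# solely of letters, digits, blanks (space or tab), dots, underscores, hyphens.
is_valid_name() {
    local re='^[[:alnum:][:blank:]._-]+$'   # regex kept in a variable, used unquoted
    [[ $1 =~ $re ]]
}

for name in "my file.txt" "bad/name"; do
    if is_valid_name "$name"; then
        echo "ok: $name"
    else
        echo "argument contains characters not valid for a file name: $name"
    fi
done
```

Because of the + quantifier, an empty argument is rejected as well.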

Related

How to extract the text after a hyphen in bash

I have a string such as dev/2.0 or dev/2.0-tymlez. How can I extract the text after the last hyphen (-) in bash? If there is no hyphen, the result should be empty; otherwise it should be tymlez. I want to store the result in $STRING and then check the variable with:
if [ -z "$STRING" ]
then
echo "\$STRING is empty"
else
echo "\$STRING is NOT empty"
fi
Is that possible?
I recommend against calling your variable STRING. All-uppercase variables are used by the system (e.g. HOME) or the shell itself (e.g. PWD, RANDOM).
That said, you could do something like
string='dev/2.0-tymlez'
case "$string" in
*-*) string="${string##*-}";;
*) string='';;
esac
It's a bit clunky: It first checks whether there are any - at all, and if so, it removes the longest prefix matching *-; otherwise it just sets string to empty (because *- wouldn't have matched anything then).
You could use the =~ operator:
string="dev/2.0-tymlez"
[[ $string =~ -([^-]+)$ ]]; string=${BASH_REMATCH[1]}
BASH_REMATCH is a special array that holds the matches from [[ ... =~ ... ]]: index 0 is the whole match and index 1 is the first parenthesized group.
You can use sed:
for string in "dev/2.0" "dev/2.0-1-2-3" "dev/2.0-tymlez"; do
string=$(sed 's/[^-]*[-]*//' <<< "${string}")
echo "string=[${string}]"
done
Result
string=[]
string=[1-2-3]
string=[tymlez]
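Note that the sed command above removes everything up to and including the first hyphen, which is why dev/2.0-1-2-3 yields 1-2-3. If the text after the last hyphen is wanted, as the question asks, a sketch based on the parameter expansion from the first answer could look like this (the function name suffix_after_last_hyphen is made up for illustration):

```shell
#!/usr/bin/env bash
# suffix_after_last_hyphen STRING (hypothetical helper): print the text after
# the last "-", or an empty line when STRING contains no "-".
suffix_after_last_hyphen() {
    case $1 in
        *-*) printf '%s\n' "${1##*-}" ;;   # strip the longest prefix ending in "-"
        *)   printf '\n' ;;                # no hyphen: empty result
    esac
}

for s in 'dev/2.0' 'dev/2.0-1-2-3' 'dev/2.0-tymlez'; do
    printf 'string=[%s]\n' "$(suffix_after_last_hyphen "$s")"
done
# string=[]
# string=[3]
# string=[tymlez]
```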

How do I see if a parameter starts with an uppercase letter in Bash?

I need to make a script that iterates through a list of parameters and checks/counts if the parameter starts with an uppercase letter. I have some starter code but I am stuck and would appreciate any help!
Several notes:
You're missing the =~ operator for a regular expression
Your if is not ended by a fi.
Using [A-Z] doesn't work in all locales, and is needlessly fragile. Some collation orders are of the form AaBbCcDd, and thus A-Z contains a, b, etc; [[:upper:]] is guaranteed to do the right thing everywhere.
Unquoted $@ behaves exactly the same as unquoted $*. If you want to correctly honor the quoting and escaping used when your function was first called, use "$@", quoted.
Consider instead:
#!/bin/bash
(( "$#" )) || { echo "Error: No arguments given" >&2; exit 1; }
re='^[[:upper:]]' # store regex in a variable for compatibility with old bash releases
count=0 # initialize so the final message prints 0 rather than an empty string
for word in "$@"; do
    [[ $word =~ $re ]] && ((++count))
done
echo "$count arguments started with upper-case characters"
Alternately, by using a case statement you can avoid requiring bash, and also check for other types:
upper_count=0 lower_count=0 digit_count=0
for word in "$@"; do
    case $word in
        [[:upper:]]*) upper_count=$((upper_count + 1)) ;;  # $(( )) is POSIX arithmetic
        [[:lower:]]*) lower_count=$((lower_count + 1)) ;;
        [[:digit:]]*) digit_count=$((digit_count + 1)) ;;
    esac
done
echo "Found $upper_count arguments starting with upper-case letters"
echo "Found $lower_count arguments starting with lower-case letters"
echo "Found $digit_count arguments starting with digits"
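To illustrate the note about quoting "$@", here is a small sketch. The helpers count_args and demo are made up for this example and are not part of the answer's script.

```shell
#!/usr/bin/env bash
# count_args and demo are hypothetical helpers used only for illustration.
count_args() { echo "$#"; }

demo() {
    count_args $@      # unquoted: "one two" is split again into two words
    count_args "$@"    # quoted: each original argument stays intact
}

demo "one two" three   # prints 3, then 2 (with the default IFS)
```

This is why the loops above iterate over "$@" in quotes: an argument containing a space would otherwise be counted as two words.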
#! /bin/bash
if [ $# -eq 0 ]; then
echo Error
exit 1
fi
COUNT=$(printf '%s\n' "$@" | grep -c '^[[:upper:]]')
echo $COUNT

Linux input pattern matching [duplicate]

String:
name@gmail.com
Checking for:
@
.com
My code
if [[ $word =~ "@" ]]
then
if [[ $word =~ ".com" || $word =~ ".ca" ]]
My problem
name@.com
The above example gets passed, which is not what I want. How do I check for characters (1 or more) between "@" and ".com"?
You can use a very basic regex:
[[ $var =~ ^[a-z]+@[a-z]+\.[a-z]+$ ]]
It looks for a string being exactly like this:
at least one a-z char
@
at least one a-z char
.
at least one a-z char
It can get as complicated as you want, see for example Email check regular expression with bash script.
See in action
$ var="a@b.com"
$ [[ $var =~ ^[a-z]+@[a-z]+\.[a-z]+$ ]] && echo "kind of valid email"
kind of valid email
$ var="a@.com"
$ [[ $var =~ ^[a-z]+@[a-z]+\.[a-z]+$ ]] && echo "kind of valid email"
$
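A slightly more permissive variant of the same idea, as a rough sketch only: it allows digits, dots, underscores and hyphens in the name, and several dot-separated labels after the @. The function name looks_like_email and the exact character classes are assumptions for illustration; this is a loose shape check, not real address validation.

```shell
#!/usr/bin/env bash
# looks_like_email (hypothetical helper): loose shape check, not RFC validation.
looks_like_email() {
    # name characters, "@", then at least two dot-separated labels
    local re='^[[:alnum:]._-]+@[[:alnum:]-]+(\.[[:alnum:]-]+)+$'
    [[ $1 =~ $re ]]
}

for addr in "name@gmail.com" "name@.com" "@gmail.com" "name@gmail"; do
    if looks_like_email "$addr"; then
        echo "$addr: kind of valid email"
    else
        echo "$addr: rejected"
    fi
done
```

Only the first address in the loop is accepted; the other three are rejected because a part before the @, after the @, or after the dot is missing.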
Why not go for other tools like perl:
> echo "x@gmail.com" | perl -lne 'print $1 if(/@(.*?)\.com/)'
gmail
The glob pattern would be: [[ $word == ?*@?*.@(com|ca) ]]
? matches any single character and * matches zero or more characters
@(p1|p2|p3|...) is an extended globbing pattern that matches one of the given patterns. This requires:
shopt -s extglob
testing:
$ for word in @.com @a.ca a@.com a@b.ca a@b.org; do
echo -ne "$word\t"
[[ $word == ?*@?*.@(com|ca) ]] && echo matches || echo does not match
done
@.com does not match
@a.ca does not match
a@.com does not match
a@b.ca matches
a@b.org does not match

How to check if a string is a substring of another?

I have the following strings in bash
str1="any string"
str2="any"
I want to check if str2 is a substring of str1
I can do it in this way:
c=`echo $str1 | grep $str2`
if [ $c != "" ]; then
...
fi
Is there a more efficient way of doing this?
You can use wild-card expansion *.
str1="any string"
str2="any"
if [[ "$str1" == *"$str2"* ]]
then
echo "str2 found in str1"
fi
Note that * expansion will not work with single [ ].
str1="any string"
str2="any"
Old school (Bourne shell style):
case "$str1" in *"$str2"*)
echo found it
esac
New school (as speakr shows); be warned, however, that the string on the right is treated as a regular expression:
if [[ $str1 =~ $str2 ]] ; then
echo found it
fi
But this will match too, even if you're not exactly expecting it:
str2='.*[trs].*'
if [[ $str1 =~ $str2 ]] ; then
echo found it
fi
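To make the difference concrete, here is a minimal sketch of a literal substring test next to the regex match. The helper name contains is made up; quoting "$2" inside the pattern is what makes it literal.

```shell
#!/usr/bin/env bash
# contains HAYSTACK NEEDLE (hypothetical helper): literal substring test.
contains() {
    [[ $1 == *"$2"* ]]   # the quoted part is matched literally; the bare * are wildcards
}

str1="any string"
re='.*[trs].*'

contains "$str1" "any" && echo "literal: 'any' found"
contains "$str1" "$re" || echo "literal: '$re' is not a substring"
[[ $str1 =~ $re ]]     && echo "regex: '$re' matches"
```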
Using grep is slow, since it spawns a separate process.
You can use bash regexp matching without using grep:
if [[ $str1 =~ $str2 ]]; then
...
fi
Note that you don't need any surrounding slashes or quotes for the regexp pattern. If you want to use glob pattern matching just use == instead of =~ as operator.
Some examples can be found here.
if echo "$str1" | grep -q "$str2" # any command works here; quote the variables
then
.....
fi

Linux command to do wild card matching

Is there any bash command to do something similar to:
if [[ $string =~ $pattern ]]
but that works with simple wildcards (?, *) rather than full regular expressions?
More info:
I have a config file (a sort of .ini-like file) where each line is composed of a wild card pattern and some other data.
For any given input string that my script receives, I have to find the first line in the config file where the wild card pattern matches the input string and then return the rest of the data in that line.
It's simple. I just need a way to match a string against wild card patterns and not RegExps since the patterns may contain dots, brackets, dashes, etc. and I don't want those to be interpreted as special characters.
The [ -z ${string/$pattern} ] trick has some pretty serious problems: if string is blank, it'll match all possible patterns; if it contains spaces, the test command will parse it as part of an expression (try string="x -o 1 -eq 1" for amusement). bash's [[ expressions do glob-style wildcard matching natively with the == operator, so there's no need for all these elaborate (and trouble-prone) tricks. Just use:
if [[ $string == $pattern ]]
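Applied to the config-file lookup described in the question, this could be sketched as follows. The line format (a pattern without whitespace, then whitespace, then the data) and the function name lookup are assumptions made for the example, since the question does not show the real file.

```shell
#!/usr/bin/env bash
# lookup INPUT CONFIG_FILE (hypothetical helper): print the data from the first
# line whose glob pattern matches INPUT; return 1 when no line matches.
# Assumed line format: PATTERN <whitespace> DATA, with no whitespace in PATTERN.
lookup() {
    local input=$1 file=$2 pattern data
    while read -r pattern data; do
        [[ -n $pattern ]] || continue        # skip empty lines
        if [[ $input == $pattern ]]; then    # unquoted right side: glob matching
            printf '%s\n' "$data"
            return 0
        fi
    done < "$file"
    return 1
}

cfg=$(mktemp)
printf '%s\n' 'v1.0-* release one' 'report-?.pdf short report' '* fallback' > "$cfg"
lookup 'v1.0-beta' "$cfg"      # release one
lookup 'v1x0-beta' "$cfg"      # fallback (the dot in the pattern is literal)
lookup 'report-7.pdf' "$cfg"   # short report
rm -f "$cfg"
```

Dots and dashes are ordinary characters in glob patterns, but [ and ] still introduce a bracket expression, so patterns containing literal brackets would need escaping.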
There are several ways of doing this.
In bash >= 3, you have regex matching like you describe, e.g.
$ foo=foobar
$ if [[ $foo =~ f.ob.r ]]; then echo "ok"; fi
ok
Note that this syntax uses regex patterns, so it uses . instead of ? to match a single character.
If what you want to do is just test that the string contains a substring, there are more classic ways of doing that, e.g.
# ${foo/b?r/} replaces the first match of "b?r" in $foo with the empty string
# So we're testing that $foo does not contain "b?r"
$ if [[ ${foo/b?r/} = "$foo" ]]; then echo "ok"; fi
You can also test if a string begins or ends with an expression this way:
# ${foo%b?r} removes "b?r" from the end of $foo
# So we're testing that $foo does not end with "b?r"
$ if [[ ${foo%b?r} = "$foo" ]]; then echo "ok"; fi
# ${foo#b?r} removes "b?r" from the beginning of $foo
# So we're testing that $foo does not begin with "b?r"
$ if [[ ${foo#b?r} = "$foo" ]]; then echo "ok"; fi
ok
See the Parameter Expansion paragraph of man bash for more info on these syntaxes. Using ## or %% instead of # and % respectively removes the longest match instead of the shortest one.
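The difference between the shortest and longest forms is easiest to see on a value with two dots. The variable name file and its value are just an example:

```shell
#!/usr/bin/env bash
file='archive.tar.gz'

echo "${file#*.}"    # tar.gz       shortest prefix matching "*." removed
echo "${file##*.}"   # gz           longest prefix matching "*." removed
echo "${file%.*}"    # archive.tar  shortest suffix matching ".*" removed
echo "${file%%.*}"   # archive      longest suffix matching ".*" removed
```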
Another very classic way of dealing with wildcards is to use case:
case $foo in
*bar)
echo "Foo matches *bar"
;;
bar?)
echo "Foo matches bar?"
;;
*)
echo "Foo didn't match any known rule"
;;
esac
John T's answer was deleted, but I actually think he was on the right track. Here it is:
Another portable method which will work in most versions of bash is
to echo your string then pipe to grep. If no match is found, it will
evaluate to false as the result will be blank. If something is returned,
it will evaluate to true.
[john@awesome]$ string="Hello World"
[john@awesome]$ if [[ `echo $string | grep Hello` ]];then echo "match";fi
match
What John didn't consider is the wildcard requested in the question. For that, use egrep, a.k.a. grep -E, and use the regex wildcard .*. Here, . is the wildcard, and * is a quantifier meaning "any number of these". So, John's example becomes:
$ string="Hello World"
$ if [[ `echo $string | egrep "Hel.*"` ]]; then echo "match"; fi
The . wildcard notation is fairly standard regex, so it should work with any command that speaks regexes.
It does get nasty if you need to escape the special characters, so this may be sub-optimal:
$ if [[ `echo $string | egrep "\.\-\$.*"` ]]; then echo "match"; fi
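When the text to find is literal rather than a pattern, one way to sidestep the escaping is grep -F, which treats its pattern as a fixed string. A minimal sketch, with contains_literal as a made-up helper name:

```shell
#!/usr/bin/env bash
# contains_literal HAYSTACK NEEDLE (hypothetical helper)
contains_literal() {
    # -F: fixed string, -q: quiet (exit status only), --: end of options
    printf '%s\n' "$1" | grep -qF -- "$2"
}

contains_literal 'cost.-$5' '.-$' && echo "match"
contains_literal 'costX-$5' '.-$' || echo "no match"
```

This only covers literal substrings; for wildcard matching the [[ $string == $pattern ]] form avoids the external process entirely.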
