Regular expression pattern to validate name field - excel

I have the following requirements for validating an input field:
It should only contain alphabetical characters and spaces.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
For example, the expression should accept the following string "my name is wish".
The regular expression which I'm using is:
RegExp.Pattern = "^[\a-zA-Z]*[\s]*[\a-zA-Z]*[\s]*[\a-zA-Z]*$"
When I enter a name as "abc abc abc6" it accepts it as valid. It should give an error since a number is entered.

Try this pattern
^[a-zA-Z]+(?:\s+[a-zA-Z]+)*$
Explanation of pattern:
^ Start of string
[a-zA-Z] Any character in the class a to z or A to Z
+ One or more repetitions
(?: ) Match expression but don't capture
\s+ Whitespace, one or more repetitions
* Zero or more repetitions
$ End of string
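The question uses the VBScript RegExp object in Excel; as a quick way to check the pattern's behaviour outside Excel, here is a minimal Python sketch. The helper name is_valid_name and the sample inputs are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern copied verbatim from the answer above.
NAME = re.compile(r"^[a-zA-Z]+(?:\s+[a-zA-Z]+)*$")

def is_valid_name(text):
    """Return True if text is letters separated by whitespace, with no edge spaces."""
    # fullmatch requires the whole string to match, including any trailing newline.
    return NAME.fullmatch(text) is not None

for candidate in ["my name is wish", "abc abc abc6", " leading", "trailing "]:
    print(repr(candidate), is_valid_name(candidate))
```

Only the first candidate is accepted; the digit and the leading or trailing space make the others fail.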

Related

Regex: Match between delimiters (a letter and a special character) in a string to form new sub-strings

I was working on a problem where I have to form new sub-strings from a main string.
For example:
in_string=ste5ts01,s02,s03
The expected output strings are ste5ts01, ste5ts02, ste5ts03
The separator could be a comma (,) or a forward slash (/); in this case the delimiters are the letter s and the comma.
The pattern I have created so far:
pattern = r"([^\s,/]+)(?<num>\d+)([,/])(?<num>\d+)(?:\2(?<num>\d+))*(?!\S)"
The issue is, I am not able to figure out how to give the letter 's' as one of the delimiters.
Any help will be much appreciated!
One approach uses the PyPI regex module, which allows a named capture group to be repeated and exposes every value it matched through captures():
=(?<prefix>s\w+)(?<num>s\d+)(?:,(?<num>s\d+))+
Explanation
= Match literally
(?<prefix>s\w+) Match s and 1+ word chars in group prefix
(?<num>s\d+) Capture group num match s and 1+ digits
(?:,(?<num>s\d+))+ Repeat 1+ times matching , and capture s followed by 1+ digits in group num
Example
import regex as re  # third-party PyPI "regex" module, not the standard-library re
pattern = r"=(?<prefix>s\w+)(?<num>s\d+)(?:,(?<num>s\d+))+"
s = "in_string=ste5ts01,s02,s03"
for m in re.finditer(pattern, s):
    # captures("num") returns every value the repeated group matched
    print(','.join(m.group("prefix") + c for c in m.captures("num")))
Output
ste5ts01,ste5ts02,ste5ts03
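If installing the third-party regex module is not an option, a similar result can be sketched with the standard-library re module, which keeps only the last value of a repeated group. The workaround here (an assumption of this sketch, not part of the answer above) is to capture the whole list in one group and split it afterwards. The helper name expand and the [,/] separator class are illustrative.

```python
import re

# prefix: "s" plus word characters; nums: the whole "s01,s02,..." list in one group.
LIST = re.compile(r"=(?P<prefix>s\w+)(?P<nums>s\d+(?:[,/]s\d+)+)")

def expand(text):
    """Return the prefix joined with each s<digits> item, or [] if nothing matches."""
    m = LIST.search(text)
    if m is None:
        return []
    # Split the captured list on either separator and prepend the shared prefix.
    return [m.group("prefix") + n for n in re.split(r"[,/]", m.group("nums"))]

print(expand("in_string=ste5ts01,s02,s03"))
```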

How to match a wildcard for strings?

Please suggest a wildcard for the Firstjson list below.
Firstjson = { p10_7_8 , p10_7_2 , p10_7_3 , p10_7_4 }
I tried the p10.7.* wildcard on the Secondjson list below and it worked, but when I tried p10_7_* on the Firstjson list above it did not work.
Secondjson = { p10.7.8 , p10.7.2 , p10.7.3 , p10.7.4 }
You are attempting to use wildcard syntax, but Groovy expects regular expression syntax for its pattern matching.
What went wrong with your attempt:
Attempt #1: p10.7.*
A regular expression of . matches any single character and .* matches 0 or more characters. This means:
p10{exactly one character of any kind here}7{zero or more characters of any kind here}
You didn't realize it, but the . character in your first attempt was acting as a single-character wildcard too. The pattern matches p10x7abcdefg, for example. It does also match p10.7.8, but be careful: it matches p10.78 as well, because the .* at the end of the pattern happily matches any sequence of characters, so anything following p10.7 is accepted.
Attempt #2: p10_7_*
_ matches only a literal underscore. But _* means to match zero or more underscores. It does not mean to match characters of any kind. So p10_7_* matches things like p10_7_______. Literally:
p10_7{zero or more underscores here}
What you can do instead:
You probably want a regular expression like p10_7_\d+
This will match things like p10_7_3 or p10_7_422. It works by matching the literal text p10_7_ followed by one or more digits where a digit is 0 through 9. \d matches any digit, and + means to match one or more of the preceding thing. Literally:
p10_7_{one or more digits here}
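The three patterns discussed above can be compared side by side. The question is about Groovy; this is a Python sketch of the same regular expressions, using fullmatch on the assumption that the whole name has to match. The sample names beyond those in the question are made up for illustration.

```python
import re

names = ["p10_7_8", "p10_7_2", "p10_7_422", "p10_7_", "p10.7.8"]

# Suggested pattern: literal "p10_7_" followed by one or more digits.
digits = re.compile(r"p10_7_\d+")
print([n for n in names if digits.fullmatch(n)])

# The two original attempts, read as regular expressions:
dot_star = re.compile(r"p10.7.*")     # "." is any single character
underscores = re.compile(r"p10_7_*")  # "_*" is zero or more underscores
print(bool(dot_star.fullmatch("p10x7abcdefg")))
print(bool(underscores.fullmatch("p10_7_8")))
print(bool(underscores.fullmatch("p10_7____")))
```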

Replace characters other than A-Za-z0-9 and decimal values with space using regex

I want to keep alphanumeric characters and also the decimal numbers present in my text string and replace all other characters with space.
For alphanumeric characters, I can use
import re
def clean_up(text):
    return re.sub(r"[^A-Za-z0-9]", " ", text)
But this will replace all . whether they are between two digits or a fullstop or at random locations. I just want to keep the . if they come between two digits.
I thought of [^((A-Za-z0-9)|(\d\.\d))], but it doesn't seem to work.
You can match and capture the patterns you need to keep and just match any char otherwise. Then, using the lambda expression as the replacement argument, you can either replace with the captured substring or a space.
The patterns are:
[+-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)? - matches any number
[^\W_] - matches any alphanumeric, Unicode included
. - matches any char (with re.S or re.DOTALL).
The solution looks like
pattern = re.compile(r'([+-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?|[^\W_])|.', re.DOTALL)
def clean_up(text):
    # Keep group 1 (a number or an alphanumeric char); anything else becomes a space.
    return pattern.sub(lambda x: x.group(1) or " ", text)
See the online demo:
import re
pattern = re.compile(r'([+-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?|[^\W_])|.', re.DOTALL)
def clean_up(text):
    return pattern.sub(lambda x: x.group(1) or " ", text)
print( clean_up("+1.2E02 ANT01-TEXT_HERE!") )
Output: +1.2E02 ANT01 TEXT HERE
You can use a negated character class with a negative lookahead:
[^A-Za-z0-9](?!\d)
[...] is a character set, which matches the literal characters inside it, so [^((A-Za-z0-9)|(\d\.\d))] means: match any character that is not in [A-Za-z0-9] and is not one of the literal characters (, |, . or ).
You may try [^a-zA-Z0-9.]|(?<!\d)\.|\.(?!\d):
[^a-zA-Z0-9.] matches every non-alphanumeric character except .
(?<!\d)\. matches a . that is not preceded by a digit
\.(?!\d) matches a . that is not followed by a digit
Test run:
print(re.sub(r"[^a-zA-Z0-9.]|(?<!\d)\.|\.(?!\d)", " ", ".aBc.1e2#3.4F5$6.gHi."))
Output:
aBc 1e2 3.4F5 6 gHi
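To make the character-class point above concrete, the sketch below runs the attempted class and the lookaround version on two short inputs. The sample strings are invented for illustration; repr is used so that the replaced spaces are visible.

```python
import re

# Inside [...], the parentheses, | and . are literal members of the set,
# so they are excluded from the negated class and left untouched.
broken = r"[^((A-Za-z0-9)|(\d\.\d))]"
print(repr(re.sub(broken, " ", "a(b)|1.2#")))

# The lookarounds decide, dot by dot, whether a digit sits on both sides.
fixed = r"[^a-zA-Z0-9.]|(?<!\d)\.|\.(?!\d)"
print(repr(re.sub(fixed, " ", "v1.2. end.")))
```

In the first call only the # is replaced. In the second, the dot inside 1.2 survives while the other two dots become spaces.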

Regular expression to extract text between multiple hyphens

I have a string in the following format
----------some text-------------
How do I extract "some text" without the hyphens?
I have tried with this but it matches the whole string
(?<=-).*(?=-)
The pattern matches the whole line except the first and last hyphen: the assertions on the left and right each require only a single hyphen, and the . also matches a hyphen.
You can keep using the assertions, and match any char except a hyphen using [^-]+ which is a negated character class.
(?<=-)[^-]+(?=-)
Note: if you also want to prevent matching a newline you can use (?<=-)[^-\r\n]+(?=-)
With your shown samples, please try the following regex.
^-+([^-]*)-+$
Explanation: it matches one or more dashes from the start of the string, then captures everything up to the next - in the only capturing group, then matches the remaining dashes up to the end of the string.
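The patterns from the two answers above can be tried in Python as follows. This is a minimal sketch: the test string is rebuilt with string multiplication so the hyphen counts are explicit, and the variable name greedy is illustrative.

```python
import re

s = "-" * 10 + "some text" + "-" * 13

# Original attempt: "." also matches "-", so only the outermost hyphens are left out.
greedy = re.search(r"(?<=-).*(?=-)", s).group()
print(greedy == s[1:-1])

# Negated character class between the two assertions.
print(re.search(r"(?<=-)[^-]+(?=-)", s).group())

# Anchored version: the text is in capture group 1.
print(re.match(r"^-+([^-]*)-+$", s).group(1))
```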
You are using Python, right? Then you do not need regex.
Use
s = "----------some text-------------"
print(s.strip("-"))
Results: some text.
With regex, the same can be achieved using
re.sub(r'^-+|-+$', '', s)
EXPLANATION
^ the beginning of the string
-+ '-' (1 or more times, matching the most amount possible)
| OR
-+ '-' (1 or more times, matching the most amount possible)
$ before an optional \n, and the end of the string

What is the difference between string "match" and string "equal" in TCL

In TCL, what is the difference between string "match" and string "equal"?
They are almost the same, so I am not able to tell the difference between them.
string equal compares two strings character by character and returns 1 if they both contain the same characters (case-sensitive by default; the -nocase option overrides this).
string match compares a string against a glob-style pattern and returns 1 if the string matches the pattern.
In a degenerate case, a string match with only non-special characters in the pattern is equivalent to a string equal.
Documentation:
string
Syntax of Tcl string matching:
* matches a sequence of zero or more characters
? matches a single character
[chars] matches a single character in the set given by chars (^ does not negate; a range can be given as a-z)
\x matches the character x, even if that character is special (one of *?[]\)
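The distinction between exact comparison and glob matching can be illustrated outside Tcl as well. The sketch below is a Python analogy only: == stands in for string equal and fnmatch.fnmatchcase for string match. Python's glob rules are not identical to Tcl's (for example, fnmatch has no backslash escape and treats [!chars] as negation), and the helper names equal and glob_match are made up for this example.

```python
from fnmatch import fnmatchcase

def equal(a, b):
    """Exact, case-sensitive comparison: the analogue of `string equal`."""
    return a == b

def glob_match(pattern, text):
    """Glob comparison: a rough analogue of `string match pattern text`."""
    return fnmatchcase(text, pattern)

print(equal("hello", "h*o"))       # the * is just an ordinary character here
print(glob_match("h*o", "hello"))  # the * is a wildcard here

# Degenerate case: with no special characters, both checks agree.
print(glob_match("hello", "hello") == equal("hello", "hello"))
```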
This was already answered in "TCL string match vs regexps".
Regular expressions are slower than the built-in string commands, so you should avoid regexp for a simple equality check.
