XML Schema validation pattern shall not allow string - xsd

I want to allow alphanumeric strings, except for the word "AAAA".
I am using the below regex
To allow alphanumeric characters <xs:pattern value="[A-Za-z0-9]{2,4}"/>
Not to allow AAAA as <xs:pattern value="[^A]{4}"/>
But if I combine both it does not work.
Please help

It is not easy to exclude a specific string using a regex. The pattern [^A]{4} does not mean "not 4 occurrences of A". It means 4 occurrences of "not A".
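I think something like this should work (it is split across lines here for readability; XSD patterns do not ignore whitespace, so remove the spaces and line breaks in the actual pattern):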
[A-Za-z0-9]{2,3} |
[B-Za-z0-9][A-Za-z0-9]{3} |
[A-Za-z0-9][B-Za-z0-9][A-Za-z0-9]{2} |
[A-Za-z0-9]{2}[B-Za-z0-9][A-Za-z0-9] |
[A-Za-z0-9]{3}[B-Za-z0-9]
which means,
a 2-char or 3-char alphanumeric string or
a 4 char alphanumeric string with the 1st char not 'A' or
a 4 char alphanumeric string with the 2nd char not 'A' or
a 4 char alphanumeric string with the 3rd char not 'A' or
a 4 char alphanumeric string with the 4th char not 'A'
There might be an easier solution, but I cannot think of it.
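One way to sanity-check the alternation is to run it outside the schema. The sketch below uses Python's re module as a stand-in for the XSD regex engine, which is an assumption. XSD patterns are implicitly anchored, so re.fullmatch is used to mimic that, and the alternatives are joined without whitespace. The names ALTERNATIVES, PATTERN and is_allowed are hypothetical.

```python
import re

# The five alternatives from the answer, joined with "|" and no whitespace.
ALTERNATIVES = [
    r"[A-Za-z0-9]{2,3}",                      # 2 or 3 alphanumeric chars
    r"[B-Za-z0-9][A-Za-z0-9]{3}",             # 4 chars, 1st is not 'A'
    r"[A-Za-z0-9][B-Za-z0-9][A-Za-z0-9]{2}",  # 4 chars, 2nd is not 'A'
    r"[A-Za-z0-9]{2}[B-Za-z0-9][A-Za-z0-9]",  # 4 chars, 3rd is not 'A'
    r"[A-Za-z0-9]{3}[B-Za-z0-9]",             # 4 chars, 4th is not 'A'
]
PATTERN = re.compile("|".join(ALTERNATIVES))


def is_allowed(value: str) -> bool:
    # fullmatch mimics the implicit anchoring of an XSD pattern facet.
    return PATTERN.fullmatch(value) is not None


for sample in ["AA", "AAA", "AAAB", "BAAA", "aaaa", "AAAA", "A", "AAAAA"]:
    print(sample, is_allowed(sample))
```

Only "AAAA", "A" and "AAAAA" should be reported as not allowed: the first is the excluded word, the other two fall outside the 2 to 4 character range.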

Related

How to ignore an escape of a character in a Python string with two consecutive apostrophes

Escape Issue
Hello all,
I have the Python string below and I am trying to get rid of the escape characters shown around the ''. My goal is to have the two single apostrophes before and after the DD, but the raw string is not working as expected, as seen in the image. Is there a way I could make them appear as I am expecting?
r"""hello my name is "John" I cannot remove ''DD'' the escape character in order to show
the two single quotes together"""
Thanks,
I tried using replacement functions and other methods that I researched, and none worked as expected.
chars = r""""John" ''DD''"""
chars # '"John" \'\'DD\'\''
'"John" \'\'DD\'\''
Above you see escaped inner single quotes (apostrophes) because the output is presented as single-quoted string. In fact, there are no reverse solidi (backslashes) in the chars string:
len(chars) # 13
print( chars) # "John" ''DD''
"John" ''DD''
Another proof (the chars string as a sequence of characters):
import unicodedata
for jj, char in enumerate(chars):
    print(f'{jj:3}', char, unicodedata.name(char, '???'))
0 " QUOTATION MARK
1 J LATIN CAPITAL LETTER J
2 o LATIN SMALL LETTER O
3 h LATIN SMALL LETTER H
4 n LATIN SMALL LETTER N
5 " QUOTATION MARK
6 SPACE
7 ' APOSTROPHE
8 ' APOSTROPHE
9 D LATIN CAPITAL LETTER D
10 D LATIN CAPITAL LETTER D
11 ' APOSTROPHE
12 ' APOSTROPHE
Read more about escaping in String and Bytes literals

Regex - Stop after finding the first pattern

For a string like this:
1. Jane, Doe2. Good, Jay3. Turn, Bob[key]
Either Jane, Doe needs to be extracted (that is, whatever is between 1. and 2.) if no [key] is present
(or)
Turn, Bob if [key] is present
Put another way:
If [key] is present, then the person before [key] needs to be extracted and the process stopped.
If [key] is not present, then pick up whoever is after 1.
I tried this but it pulls up both Jane, Doe and Turn, Bob
(\.([^\.])(.+)\[key\])|(1\.(.+)2\.)
How do I stop after finding the first successful pattern, knowing that patterns are read left to right? [key] can follow any of the entries: 1, 2 or 3.
Thanks.
For these requirements, you may use this regex in Python with an alternation:
(?<=\d\.\s)[a-zA-Z, ]+(?=\[key])|(?<=1\.\s)(?!.*\[key])[a-zA-Z, ]+
RegEx Demo
RegEx Details:
(?<=\d\.\s): Positive lookbehind to assert that there is a digit followed by dot followed by a whitespace before the current position
[a-zA-Z, ]+: Match 1+ of letter, space or comma characters
(?=\[key]): Positive lookahead to assert that there is a text [key] after the current position
|: OR
(?<=1\.\s): Positive lookbehind to assert that there is a digit 1 followed by dot followed by a whitespace before the current position
(?!.*\[key]): Negative lookahead to assert that there is no [key] text after the current position
[a-zA-Z, ]+: Match 1+ of letter, space or comma characters
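A minimal sketch of how the regex above could be used from Python, with the sample string from the question and a variant without [key]. The helper name extract_name and the variable names are hypothetical. re.search is used so that scanning stops at the first successful match.

```python
import re

# The alternation from the answer, split across two string literals.
PATTERN = re.compile(
    r"(?<=\d\.\s)[a-zA-Z, ]+(?=\[key])"      # a name directly followed by [key]
    r"|(?<=1\.\s)(?!.*\[key])[a-zA-Z, ]+"    # else the name after "1. " if no [key] exists
)


def extract_name(text):
    # search returns only the first match, scanning left to right.
    match = PATTERN.search(text)
    return match.group(0) if match else None


with_key = "1. Jane, Doe2. Good, Jay3. Turn, Bob[key]"
without_key = "1. Jane, Doe2. Good, Jay3. Turn, Bob"

print(extract_name(with_key))     # Turn, Bob
print(extract_name(without_key))  # Jane, Doe
```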
Not sure why you put .+ into your regex but it's greedy and matches . Good, Jay3. Turn, Bob. so the left part of the alternation matches.
Suggest you remove the .+ on both sides of the alternation ( | ).

Looking for a regex which can find all the number combinations without 3 zeros in between, mixed with delimiters

I would like to find all the number combinations without 3 zeros in between.
There might be some delimiters (max 2 characters) in between the numbers.
I'm using python and I would like to perform this search with the regex.
Accepted numbers
This is number 1234 which should be accepted.
12-45
1 2 0 0 3 4 5
not accepted numbers:
1
12
123
1000
1000-2000
30000-31000
21 000-32 000-50 000
21 00 03 00 00
The regex with which I could come up is:
([\s\-]{0,2}\d(?!000)){4,}
My regex can find all the accepted numbers, but it doesn't filter out all the numbers that should not be accepted.
See the results in regex
Actually this regex is used in python to remove the matched numbers from the text:
See python code
p.s. Delimiters are not only space but should be at least \s and dash.
p.s.s. The numbers might be in the middle of the string. So I think I cannot use ^ and $ in my regex.
You could assert not 3 zeroes in a row while matching optional delimiters in between.
\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b
Explanation
\b A word boundary
(?! Negative lookahead, assert what is at the right is not
[\d\s-]*? Match any of a digit, whitespace char or - as few times as possible
0(?:[\s-]*0){2} Match a zero followed by 2 more zeros, with optional delimiters in between
) Close the lookahead
\d Match a digit
(?:[\s-]*\d){3,} Repeat 3 or more times matching a digit with optional delimiters in between
\b A word boundary
Regex demo
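The pattern can be checked against the samples from the question with a short Python sketch. Each sample is tested on its own, because \s also matches a newline and the samples would otherwise run together. The names PATTERN, accepted, rejected and find_number are hypothetical.

```python
import re

PATTERN = re.compile(r"\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b")

accepted = [
    "This is number 1234 which should be accepted.",
    "12-45",
    "1 2 0 0 3 4 5",
]
rejected = [
    "1", "12", "123", "1000", "1000-2000", "30000-31000",
    "21 000-32 000-50 000", "21 00 03 00 00",
]


def find_number(line):
    # Return the first matching number in the line, or None.
    match = PATTERN.search(line)
    return match.group(0) if match else None


for line in accepted + rejected:
    print(repr(line), "->", find_number(line))
```

For the accepted samples this should print the matched number (1234, 12-45 and 1 2 0 0 3 4 5), and None for every rejected sample.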

Regex to match between 2 and 5 characters, one of which must be alphabetic

I'm not great with regex and the following has me stumped.
I need to find all the matches in a string that are between 2 and 5 characters [A-Z0-9] only, and must contain at least one alphabetic character [A-Z]
So
A1 - Match
AAA - Match
AAAAAA - No Match
A1234 - Match
123 - No Match
A123A - Match
A - No Match
1 - No Match
A1B2C3 - No Match
I have tried this:
([A-Z0-9]*[A-Z][A-Z0-9]*){2,5}
But it doesn't limit the total length of the match to between 2 and 5 characters.
You can use
\b(?=\d*[A-Z])[A-Z\d]{2,5}\b
\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b
See the regex demo #1 and the regex demo #2. Details:
\b - word boundary
(?=\d*[A-Z]) - after zero or more digits, there must be an uppercase ASCII letter
(?=[A-Z0-9]{2,5}\b) - there must be 2 to 5 alnum chars up to the word boundary
[A-Z0-9]* - zero or more uppercase ASCII letters or digits
[A-Z] - an uppercase ASCII letter
[A-Z\d]{2,5} - two to five uppercase ASCII letters or digits
\b - word boundary.
See the Python demo:
import re
text = "A1 AAA....A1234!!!!~A123A abc,AAAAAA,123,A,1,A1B2C3"
print(re.findall(r'\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b', text))
# => ['A1', 'AAA', 'A1234', 'A123A']
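As a further check, both patterns can be run against the table from the question. This is a sketch rather than part of the original answer: the names PATTERN_1, PATTERN_2 and expected are hypothetical, and re.fullmatch is used because each sample is tested as a whole string.

```python
import re

PATTERN_1 = re.compile(r"\b(?=\d*[A-Z])[A-Z\d]{2,5}\b")
PATTERN_2 = re.compile(r"\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b")

# Sample -> whether the question expects a match.
expected = {
    "A1": True, "AAA": True, "AAAAAA": False, "A1234": True, "123": False,
    "A123A": True, "A": False, "1": False, "A1B2C3": False,
}

for sample, should_match in expected.items():
    got_1 = PATTERN_1.fullmatch(sample) is not None
    got_2 = PATTERN_2.fullmatch(sample) is not None
    print(f"{sample:7} expected={should_match} pattern1={got_1} pattern2={got_2}")
```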
Try this one, which enumerates the position of the first letter:
^(?:[A-Z][A-Z0-9]{1,4}|[0-9][A-Z][A-Z0-9]{0,3}|[0-9]{2}[A-Z][A-Z0-9]{0,2}|[0-9]{3}[A-Z][A-Z0-9]?|[0-9]{4}[A-Z])$

Lua string space, letters and digits

I want to check whether a string only contains letters, digits and space. I found this function to check that it's only letters and digits, but I'm not sure how to allow spaces too:
string:match( "%W" );
thanks.
The pattern %w matches an alphanumeric character. The uppercase version %W matches its complement, i.e., a non-alphanumeric character.
To get the complement of alphanumeric and whitespace, try the pattern [^%w%s].
str:match("[^%w%s]")
returns a non-nil value when a string is NOT only letters, digits, and spaces.
