XSD restriction for string value - xsd

I need to write a restriction for a string column, so that it will include 20 letters and 3 of the following characters: . , -
How can this be done?
Thank You.

Please try this:
<xs:element name="someString">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="([\w]*[\.\,\-]?[\w]*[\.\,\-]?[\w]*[\.\,\-]?[\w]*)"/>
<xs:length value="23"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
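Explanation:
[\w]* matches zero or more word characters
[\.\,\-]? matches at most one character from the list . , -
Quantifiers:
* zero or more times, as many times as possible, giving back as needed [greedy]
? zero or one time
\w matches any word character (in XSD, any character that is not in the Unicode punctuation, separator or "other" categories, so digits match as well as letters; use [a-zA-Z] if only ASCII letters are allowed)
\. matches the character . literally
\, matches the character , literally
\- matches the character - literally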
So basically this will match all these
aabbbcaabbbcaabbbcac,,-
,,-aabbbcaabbbcaabbbcac
dd,aabbbc-aabbbc,aabbbc
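The length value="23" forces the total string to be exactly 23 characters, and the regex allows at most three of those characters to be a comma, hyphen or period. Note that each ? makes its separator optional, so a string of 23 word characters also passes; remove the three ? quantifiers to require exactly three separators.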
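A rough way to sanity-check the two facets outside a schema validator is to mimic them in Python. This is only a sketch: Python's re is a different regex flavour from XSD (its \w includes the underscore, for example), and the function names and the stricter letters-only variant are illustrative assumptions, not part of the answer above.

```python
import re

# The answer's pattern, copied as-is. XSD patterns are implicitly anchored,
# so fullmatch() is used here to mimic that behaviour.
ANSWER_PATTERN = re.compile(r"([\w]*[\.\,\-]?[\w]*[\.\,\-]?[\w]*[\.\,\-]?[\w]*)")

# Hypothetical stricter variant: exactly three separators, ASCII letters only.
STRICT_PATTERN = re.compile(r"(?:[a-zA-Z]*[.,\-]){3}[a-zA-Z]*")


def passes_answer(value: str) -> bool:
    """Mimic xs:length value=23 plus the answer's xs:pattern."""
    return len(value) == 23 and ANSWER_PATTERN.fullmatch(value) is not None


def passes_strict(value: str) -> bool:
    """Mimic xs:length value=23 plus the stricter, hypothetical pattern."""
    return len(value) == 23 and STRICT_PATTERN.fullmatch(value) is not None


samples = [
    "aabbbcaabbbcaabbbcac,,-",
    ",,-aabbbcaabbbcaabbbcac",
    "dd,aabbbc-aabbbc,aabbbc",
    "a" * 23,  # no separators at all
]
for sample in samples:
    print(sample, passes_answer(sample), passes_strict(sample))
```

The last sample is the interesting one: it has no separators, yet the pattern with optional separators still accepts it, while the stricter variant rejects it.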

Related

Regex - Stop after finding the first pattern

For a string like this:
1. Jane, Doe2. Good, Jay3. Turn, Bob[key]
Either Jane, Doe needs to be extracted (if no [key] is present, take whatever is between 1. and 2.)
(or)
Turn, Bob if [key] is present
Put another way:
If [key] is present, then the person before [key] needs to be extracted and the process stopped.
If [key] is not present, then pick up whoever is after 1.
I tried this but it pulls up both Jane, Doe and Turn, Bob
(\.([^\.])(.+)\[key\])|(1\.(.+)2\.)
How to stop after finding the first successful pattern, knowing that patterns are read left to right? [key] can be anyone - 1,2 or 3.
Thanks.
For these requirements, you may use this regex in Python with an alternation:
(?<=\d\.\s)[a-zA-Z, ]+(?=\[key])|(?<=1\.\s)(?!.*\[key])[a-zA-Z, ]+
RegEx Details:
(?<=\d\.\s): Positive lookbehind to assert that there is a digit followed by dot followed by a whitespace before the current position
[a-zA-Z, ]+: Match 1+ of letter, space or comma characters
(?=\[key]): Positive lookahead to assert that there is a text [key] after the current position
|: OR
(?<=1\.\s): Positive lookbehind to assert that there is a digit 1 followed by dot followed by a whitespace before the current position
(?!.*\[key]): Negative lookahead to assert that there is no [key] text after the current position
[a-zA-Z, ]+: Match 1+ of letter, space or comma characters
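The behaviour described above can be checked with a short Python sketch. The function name extract_person and the extra test strings (with [key] moved or removed) are illustrative assumptions; only the regex itself comes from the answer.

```python
import re

# Regex from the answer: prefer the name right before [key];
# fall back to the name after "1. " only when no [key] follows.
PATTERN = re.compile(
    r"(?<=\d\.\s)[a-zA-Z, ]+(?=\[key])"
    r"|(?<=1\.\s)(?!.*\[key])[a-zA-Z, ]+"
)


def extract_person(text):
    """Return the first match, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


print(extract_person("1. Jane, Doe2. Good, Jay3. Turn, Bob[key]"))
print(extract_person("1. Jane, Doe2. Good, Jay[key]3. Turn, Bob"))
print(extract_person("1. Jane, Doe2. Good, Jay3. Turn, Bob"))
```

Using search() rather than findall() is what stops the process after the first successful match.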
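The .+ in your regex is greedy: after the first dot it runs through Jane, Doe2. Good, Jay3. Turn, Bob all the way to [key], so the left part of the alternation matches and swallows every name at once.
I suggest replacing the .+ on both sides of the alternation ( | ) with a narrower class such as [a-zA-Z, ]+ that cannot cross into the next entry.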

Regex to match between 2 and 5 characters, one of which must be alphabetic

I'm not great with regex and the following has me stumped.
I need to find all the matches in a string that are between 2 and 5 characters [A-Z0-9] only, and must contain at least one alphabetic character [A-Z]
So
A1 - Match
AAA - Match
AAAAAA - No Match
A1234 - Match
123 - No Match
A123A - Match
A - No Match
1 - No Match
A1B2C3 - No Match
I have tried this:
([A-Z0-9]*[A-Z][A-Z0-9]*){2,5}
But it doesn't limit the total length of the match to between 2 and 5 characters.
You can use
\b(?=\d*[A-Z])[A-Z\d]{2,5}\b
\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b
Details:
\b - word boundary
(?=\d*[A-Z]) - after zero or more digits, there must be an uppercase ASCII letter
(?=[A-Z0-9]{2,5}\b) - there must be 2 to 5 alnum chars up to the word boundary
[A-Z0-9]* - zero or more uppercase ASCII letters or digits
[A-Z] - an uppercase ASCII letter
[A-Z\d]{2,5} - two to five uppercase ASCII letters or digits
\b - word boundary.
See the Python demo:
import re
text = "A1 AAA....A1234!!!!~A123A abc,AAAAAA,123,A,1,A1B2C3"
print(re.findall(r'\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b', text))
# => ['A1', 'AAA', 'A1234', 'A123A']
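Try this one, which needs no lookahead. Each alternative fixes the position of the first letter, so the total length stays between 2 and 5. Because of the ^ and $ anchors, it validates one candidate string at a time:
^(?:[A-Z][A-Z0-9]{1,4}|[0-9][A-Z][A-Z0-9]{0,3}|[0-9]{2}[A-Z][A-Z0-9]{0,2}|[0-9]{3}[A-Z][A-Z0-9]?|[0-9]{4}[A-Z])$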

How do I remove text using sed?

For instance, let's say I have a text file:
worker1, 0001, company1
worker2, 0002, company2
worker3, 0003, company3
How would I use sed to keep only the first 2 characters of the first column ("wo"), drop the rest of that column, and attach them to the second column, so that the output looks like this:
wo0001,company1
wo0002,company2
wo0003,company3
$ sed -E 's/^(..)[^,]*, ([^,]*,) /\1\2/' file
wo0001,company1
wo0002,company2
wo0003,company3
s/ begin substitution
^(..) match the first two characters at the beginning of the line, captured in a group
[^,]* match any amount of non-comma characters of the first column
", " match a comma and a space character (quotes added here only to make the space visible)
([^,]*,) match the second field and comma captured in a group (any amount of non-comma characters followed by a comma)
" " match the next space character
/\1\2/ replace with the first and second capturing group
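For comparison, the same substitution can be sketched in Python with re.sub. The function name shorten and the inline sample lines are illustrative assumptions; the pattern and replacement are the ones from the sed command.

```python
import re

# Same pattern as the sed command: two leading characters (group 1),
# the rest of column one, ", ", column two plus its comma (group 2), one space.
FIELD_RE = re.compile(r"^(..)[^,]*, ([^,]*,) ")


def shorten(line: str) -> str:
    """Keep the first two characters of column one and glue them to column two."""
    return FIELD_RE.sub(r"\1\2", line)


lines = [
    "worker1, 0001, company1",
    "worker2, 0002, company2",
    "worker3, 0003, company3",
]
for line in lines:
    print(shorten(line))
```

Lines that do not match the pattern are returned unchanged, which is also how the sed command behaves.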

XML Schema validation pattern shall not allow string

I want to allow alphanumeric characters, except for the word "AAAA".
I am using the regexes below:
To allow alphanumeric characters: <xs:pattern value="[A-Za-z0-9]{2,4}"/>
To disallow AAAA: <xs:pattern value="[^A]{4}"/>
But when I combine both, it does not work.
Please help.
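It is not easy to exclude a specific string with a regex when the flavour has no lookahead, as is the case for XSD patterns. The pattern [^A]{4} does not mean "not 4 occurrences of A"; it means 4 occurrences of "not A".
I think something like this should work (split across lines here for readability; in the pattern value, write the branches on one line with no whitespace around |):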
[A-Za-z0-9]{2,3} |
[B-Za-z0-9][A-Za-z0-9]{3} |
[A-Za-z0-9][B-Za-z0-9][A-Za-z0-9]{2} |
[A-Za-z0-9]{2}[B-Za-z0-9][A-Za-z0-9] |
[A-Za-z0-9]{3}[B-Za-z0-9]
which means,
a 2-char or 3-char alphanumeric string or
a 4 char alphanumeric string with the 1st char not 'A' or
a 4 char alphanumeric string with the 2nd char not 'A' or
a 4 char alphanumeric string with the 3rd char not 'A' or
a 4 char alphanumeric string with the 4th char not 'A'
There might be an easier solution, but I cannot think of it.
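The five branches can be checked quickly in Python before putting them into the schema. This is a sketch under the assumption that Python's re behaves like the XSD flavour for these simple character classes; the names ALNUM, NOT_A and is_allowed are illustrative.

```python
import re

ALNUM = "[A-Za-z0-9]"
NOT_A = "[B-Za-z0-9]"  # any alphanumeric character except uppercase A

# One branch per line of the answer, joined with | and no whitespace.
BRANCHES = [
    f"{ALNUM}{{2,3}}",
    f"{NOT_A}{ALNUM}{{3}}",
    f"{ALNUM}{NOT_A}{ALNUM}{{2}}",
    f"{ALNUM}{{2}}{NOT_A}{ALNUM}",
    f"{ALNUM}{{3}}{NOT_A}",
]
PATTERN = re.compile("|".join(BRANCHES))


def is_allowed(value: str) -> bool:
    """XSD patterns are implicitly anchored, so use fullmatch()."""
    return PATTERN.fullmatch(value) is not None


for sample in ["AB", "ABC", "AAAB", "AAAA", "aaaa", "A", "ABCDE"]:
    print(sample, is_allowed(sample))
```

Note that only the uppercase string AAAA is rejected among the 4-character inputs; lowercase aaaa still passes, because NOT_A excludes uppercase A only.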

Lua string space, letters and digits

I want to check whether a string only contains letters, digits and space. I found this function to check that it's only letters and digits, but I'm not sure how to allow spaces too:
string:match( "%W" );
thanks.
The pattern %w matches an alphanumeric character. The uppercase version %W matches its complement, i.e., a non-alphanumeric character.
To get the complement of alphanumeric and whitespace, try the pattern [^%w%s].
str:match("[^%w%s]")
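returns a non-nil value when the string is NOT made up only of letters, digits, and whitespace, so negate the result to get the check you want. Note that %s matches any whitespace (tabs and newlines too), not just the space character; use [^%w ] to allow only spaces.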
