XSD Validation Pattern to Enforce LastName/FirstName - xsd

I need to enforce the pattern LASTNAME/FIRSTNAME, for example Smith/John.
The characters can be alphanumeric (lowercase or uppercase) and may also include special characters such as ë.
Pattern:
<xsd:pattern value="[a-zA-Z0-9]/[a-zA-Z0-9]"/>
Basically the rules will be
- Anything before the slash
- Anything after the slash
- Patterns like "/John", "John/" should not be allowed
Thanks in advance.

ASCII
Assuming that you don't want numbers in the names:
<xs:pattern value="[a-zA-Z]+/[a-zA-Z]+"/>
If you really want to accept numbers in the names:
<xs:pattern value="[a-zA-Z0-9]+/[a-zA-Z0-9]+"/>
Be aware that 0/0, for example, would be valid in this case, though.
Unicode
<xs:pattern value="\p{L}+/\p{L}+"/>
Explanation: \p{L} matches a Unicode code point in the Letter category.
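To sanity-check the Unicode pattern outside a schema validator, here is a minimal Python sketch that mirrors it by hand. It is an illustration only: the names is_letter and matches_name_pattern are made up for this example, and it uses unicodedata.category (any category starting with "L") to stand in for \p{L}, since Python's built-in re module does not support \p{...}.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # The Unicode general categories Lu, Ll, Lt, Lm and Lo all start with "L".
    return unicodedata.category(ch).startswith("L")


def matches_name_pattern(value: str) -> bool:
    # Hypothetical helper: one or more letters, a slash, one or more letters.
    # Split at the first slash; any further slash ends up in `first`
    # and is rejected by the letter check below.
    last, slash, first = value.partition("/")
    if not (slash and last and first):
        return False
    return all(is_letter(ch) for ch in last + first)


for candidate in ["Smith/John", "Noël/Zoë", "/John", "John/", "0/0"]:
    print(candidate, matches_name_pattern(candidate))
```

Note that a name typed with a combining mark (e followed by U+0308 rather than the single character ë) is rejected by this check, and by \p{L}+ as well, because combining marks belong to category M. A class such as [\p{L}\p{M}] would admit them.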

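Your restriction should be this:
<xs:pattern value="(([a-zA-Z0-9])*)([/])(([a-zA-Z0-9])*)"/>
I validated this pattern with XMLSpear. Note that because of the * quantifiers it also accepts "/John" and "John/"; use + instead of * to reject those.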

Related

Fullname multilingual Regexp

Currently the validation of fullname looks like:
/^[a-zA-Z ]{2,30}$/
But that regexp validates only latin alphabet names. This should be changed in order to handle multilingual characters also. I have tried:
/^(\p{L}\p{M}*){2,30}$/u
But it validates numbers within names also, which is not correct.
As in the first case, use a character class with Unicode as well:
/^[\p{L}\p{M}\p{Zs}]{2,30}$/u
The \p{Zs} class denotes a space separator character, such as the regular space and the ideographic (Japanese) space U+3000.
In case you want to prevent space at the start and end, use these negative lookaheads:
/^(?!\p{Zs})(?!.*\p{Zs}$)[\p{L}\p{M}\p{Zs}]{2,30}$/u
See a demo on regex101.com.
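For readers who want to check the same rule without a regex engine that supports \p{...}, here is a rough Python sketch of the last pattern. It assumes the rule is exactly "2 to 30 code points, each a letter (L), mark (M) or space separator (Zs), with no space separator at either end"; the function name is_valid_fullname is hypothetical.

```python
import unicodedata


def is_valid_fullname(name: str) -> bool:
    # Length is counted in code points, like the /u flag does.
    if not 2 <= len(name) <= 30:
        return False
    categories = [unicodedata.category(ch) for ch in name]
    # Equivalent of the two negative lookaheads: no leading or trailing space.
    if categories[0] == "Zs" or categories[-1] == "Zs":
        return False
    # Equivalent of the character class [\p{L}\p{M}\p{Zs}].
    return all(cat[0] in ("L", "M") or cat == "Zs" for cat in categories)


for candidate in ["John Smith", "José", "山田\u3000太郎", " John", "John3", "J"]:
    print(repr(candidate), is_valid_fullname(candidate))
```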

Is a positiveInteger restriction with maxInclusive 9999999999 technically valid?

I'm working with a web service from an external company, which has defined the following restriction to an element in their wsdl:
<xs:simpleType>
<xs:restriction base="xs:positiveInteger">
<xs:minInclusive value="1"/>
<xs:maxInclusive value="9999999999"/>
</xs:restriction>
</xs:simpleType>
When converting this restriction into a class, I created a property of type UInt32, but this data type only allows numbers up to 4294967295, far lower than the maxInclusive defined in the restriction.
Is this kind of restriction technically and logically valid for a schema? Or is it wrong, and should the external company change the base type to a bigger one?
Thanks in advance.
The restriction is fine. Have a look at the W3C standard.
[Definition:] positiveInteger is ·derived· from nonNegativeInteger by setting the value of ·minInclusive· to be 1. This results in the standard mathematical concept of the positive integer numbers. The ·value space· of positiveInteger is the infinite set {1,2,...}. The ·base type· of positiveInteger is nonNegativeInteger.
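What they probably mean this value to be is an xs:unsignedLong (xs:unsignedInt has the same 4294967295 ceiling as UInt32, so it could not carry a maxInclusive of 9999999999), but technically it's correct. On your side, map the element to a wider type such as UInt64.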
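The arithmetic behind this is easy to check. The snippet below is only a quick illustration in Python (the constant names are made up here); it compares the schema's upper bound with the 32-bit and 64-bit unsigned ranges.

```python
MAX_INCLUSIVE = 9_999_999_999   # upper bound from the schema restriction
UINT32_MAX = 2**32 - 1          # 4294967295, the UInt32 / xs:unsignedInt ceiling
UINT64_MAX = 2**64 - 1          # the UInt64 / xs:unsignedLong ceiling

print(MAX_INCLUSIVE > UINT32_MAX)    # True: does not fit in 32 bits
print(MAX_INCLUSIVE <= UINT64_MAX)   # True: fits in 64 bits
print(MAX_INCLUSIVE.bit_length())    # 34: bits needed for the largest value
```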

XML schema restriction pattern for not allowing specific string

I need to write an XSD schema with a restriction on a field, to ensure that
the value of the field does not contain the substring FILENAME at any location.
For example, all of the following must be invalid:
FILENAME
ORIGINFILENAME
FILENAMETEST
123FILENAME456
None of these values should be valid.
In a regular expression language that supports negative lookahead, I could do this by writing /^((?!FILENAME).)*$/ but the XSD pattern language does not support negative lookahead.
How can I implement an XSD pattern restriction with the same effect as /^((?!FILENAME).)*$/ ?
I need to use pattern, because I don't have access to XSD 1.1 assertions, which are the other obvious possibility.
The question "XSD restriction that negates a matching string" covers a similar case, but in that case the forbidden string is forbidden only as a prefix, which makes checking the constraint easier. How can the solution there be extended to cover the case where we have to check all locations within the input string, and not just the beginning?
OK, the OP has persuaded me that while the other question mentioned has an overlapping topic, the fact that the forbidden string is forbidden at all locations, not just as a prefix, complicates things enough to require a separate answer, at least for the XSD 1.0 case. (I started to add this answer as an addendum to my answer to the other question, and it grew too large.)
There are two approaches one can use here.
First, in XSD 1.1, a simple assertion of the form
not(matches($value, 'FILENAME'))
ought to do the job.
Second, if one is forced to work with an XSD 1.0 processor, one needs a pattern that will match all and only strings that don't contain the forbidden substring (here 'FILENAME').
One way to do this is to ensure that the character 'F' never occurs in the input. That's too drastic, but it does do the job: strings not containing the first character of the forbidden string do not contain the forbidden string.
But what of strings that do contain an occurrence of 'F'? They are fine, as long as no 'F' is followed by the string 'ILENAME'.
Putting that last point more abstractly, we can say that any acceptable string (any string that doesn't contain the string 'FILENAME') can be divided into two parts:
- a prefix which contains no occurrences of the character 'F'
- zero or more occurrences of 'F', each followed by a string that doesn't begin with 'ILENAME' and doesn't contain any 'F'
The prefix is easy to match: [^F]*.
The strings that start with F but don't match 'FILENAME' are a bit more complicated; just as we don't want to outlaw all occurrences of 'F', we also don't want to outlaw 'FI', 'FIL', etc. -- but each occurrence of such a dangerous string must be followed either by the end of the string, or by a letter that doesn't match the next letter of the forbidden string, or by another 'F' which begins another region we need to test. So for each proper prefix of the forbidden string, we create a regular expression of the form
$prefix || '([^F' || next-character-in-forbidden-string || ']'
|| '[^F]*)?'
Then we join all of those regular expressions with or-bars.
The end result in this case is something like the following (I have inserted newlines here and there, to make it easier to read; before use, they will need to be taken back out):
[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
Two points to bear in mind:
- XSD regular expressions are implicitly anchored; testing this with a non-anchored regular expression evaluator will not produce the correct results.
- It may not be obvious at first why the alternatives in the choice all end with [^F]* instead of .*. Thinking about the string 'FEEFIFILENAME' may help. We have to check every occurrence of 'F' to make sure it's not followed by 'ILENAME'.
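The construction can also be generated mechanically. The following Python sketch builds the pattern from the forbidden string and checks it with re.fullmatch, which supplies the implicit anchoring that XSD assumes. The function names are hypothetical, and the sketch assumes that the first character of the forbidden string does not occur again later in it (true for 'FILENAME'); without that assumption the construction above needs more care.

```python
import re


def pattern_excluding(forbidden: str) -> str:
    # Assumption of this sketch: at least two characters, and the first
    # character does not recur later in the forbidden string.
    if len(forbidden) < 2 or forbidden[0] in forbidden[1:]:
        raise ValueError("unsupported forbidden string for this sketch")
    first = re.escape(forbidden[0])
    alternatives = []
    # One alternative per proper prefix of the forbidden string.
    for i in range(1, len(forbidden)):
        prefix = re.escape(forbidden[:i])
        next_char = re.escape(forbidden[i])
        alternatives.append(f"({prefix}([^{first}{next_char}][^{first}]*)?)")
    # Plain groups only, so the result stays within XSD 1.0 pattern syntax.
    return f"[^{first}]*({'|'.join(alternatives)})*"


PATTERN = pattern_excluding("FILENAME")


def accepts(value: str) -> bool:
    # fullmatch anchors at both ends, like an XSD pattern facet.
    return re.fullmatch(PATTERN, value) is not None


print(PATTERN)
for sample in ["FILE", "FEEFIFILENAME", "123FILENAME456", "readme.txt"]:
    print(sample, accepts(sample))
```

For 'FILENAME' this yields the same expression as the hand-written one above, with the newlines removed.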

Prolog : Remove extra spaces in a stream of characters

Total newb to Prolog. This one is frustrating me a bit. My 'solution' below is me trying to make Prolog procedural...
This will remove spaces or insert a space after a comma if needed, that is, until a period is encountered:
squish:-get0(C),put(C),rest(C).
rest(46):-!.
rest(32):-get(C),put(C),rest(C).
rest(44):-put(32), get(C), put(C), rest(C).
rest(Letter):-squish.
GOAL: I'm wondering how to remove any whitespace BEFORE the comma as well.
The following works, but it is so wrong on so many levels, especially the 'exit'!
squish:-
get0(C),
get0(D),
iteratesquish(C,D).
iteratesquish(C,D):-
squishing(C,D),
get0(E),
iteratesquish(D,E).
squishing(46,X):-put(46),write('end.'),!,exit.
squishing(32,32):-!.
squishing(32,44):-!.
squishing(32,X):-put(32),!.
squishing(44,32):-put(44),!.
squishing(44,44):-put(44), put(32),!.
squishing(44,46):-put(44), put(32),!.
squishing(44,X):-put(44), put(32),!.
squishing(X,32):-put(X),!.
squishing(X,44):-put(X),!.
squishing(X,46):-put(X),!.
squishing(X,Y):-put(X),!.
Since you are describing lists (in this case: of character codes), consider using DCG notation. For example, to let any comma be followed by a single whitespace, consider using code similar to:
squish([]) --> [].
squish([(0',),(0' )|Rest]) --> [0',], spaces, !, squish(Rest).
squish([L|Ls]) --> [L], squish(Ls).
spaces --> [0' ], spaces.
spaces --> [].
Example query:
?- phrase(squish(Ls), "a, b,c"), format("~s", [Ls]).
a, b, c
So, first focus on a clear declarative description of the relation between character sequences and the desired "clean" string. You can then use SWI-Prolog's library(pio) to read from files via these grammar rules. To remove all spaces preceding commas, you only have to add a single rule to the DCG above (to squish//1), which I leave as an exercise for you. A corner case of course is if a comma is followed by another comma, in which case the requirements are contradictory :-)

XSD pattern to match a priority order of comma separated words

I have a tag like this
<order>foo,bar,goo,doo,woo</order>
that I need to validate with an xsd.
How do I write a regexp pattern that matches the string that contains:
- any of {foo,bar,goo,doo,woo}, each at most once
- or is empty.
Valid examples:
<order>foo,bar,goo,doo,woo</order>
<order>foo,bar,goo</order>
<order>foo,doo,goo,woo</order>
<order>woo,foo,goo,doo,bar</order>
<order></order>
Invalid:
<order>foo,foo</order>
<order>,</order>
<order>fo</order>
<order>foobar</order>
This has to work in different XML/XSD parsers.
I don't think you can express all the rules in a regular expression. In particular, it will be tough to enforce "maximum once". This is the closest I came up with:
<xs:simpleType name="order">
<xs:annotation>
<xs:documentation>
Comma-separated list of anything
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:pattern value="[^,]+(,\s*[^,]+)*"/>
</xs:restriction>
</xs:simpleType>
You might want to try using a space as the separator. That's more common in XML files. XML Schema has a built-in "list" construct (xs:list) for space-separated lists.
I could write a schema if the items followed a fixed sequence, like {foo,bar,goo,doo,woo}.
But in your case you say they can appear in any sequence, so that is 5P5 + 5P4 + 5P3 + 5P2 + 5P1 + 1 = 326 patterns!
If there were a fixed sequence as I mentioned, the number of patterns would have been 32, which is bearable.
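The two counts can be reproduced by brute force. The Python sketch below is an illustration only (the names ordered_selections and is_valid_order are made up here): it enumerates every arrangement of zero to five distinct words, which is the set of values a pattern or enumeration would have to list, and compares that with the fixed-order case where each word is simply present or absent.

```python
import math
from itertools import permutations

WORDS = ["foo", "bar", "goo", "doo", "woo"]


def ordered_selections(words):
    # Every arrangement of 0..n distinct words, joined with commas.
    # The empty arrangement yields the empty string.
    return [",".join(p)
            for k in range(len(words) + 1)
            for p in permutations(words, k)]


ANY_ORDER = ordered_selections(WORDS)
print(len(ANY_ORDER))                              # 326
print(sum(math.perm(5, k) for k in range(6)))      # 326, the 5Pk sum
print(2 ** len(WORDS))                             # 32 for a fixed sequence


def is_valid_order(value: str) -> bool:
    # Membership test against the full enumeration.
    return value in set(ANY_ORDER)


print(is_valid_order("woo,foo,goo,doo,bar"), is_valid_order("foo,foo"))
```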

Resources