XSD Length Restriction Options

This is valid, but it duplicates the length constraint, enforcing it with both the pattern and maxLength:
<xsd:simpleType name="MyType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]{0,10}" />
<xsd:maxLength value="10" />
</xsd:restriction>
</xsd:simpleType>
The pattern alone would suffice:
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]{0,10}" />
</xsd:restriction>
Or the pattern could be simplified, relying on maxLength to limit the length:
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]*" />
<xsd:maxLength value="10" />
</xsd:restriction>
Questions:
Are there known performance implications of choosing one over the other?
Will a given parser check the length first and short-circuit validation before compiling the pattern if both are provided?
Or will both be checked in any case?
Does it vary from parser to parser?
I acknowledge that the performance difference here is probably minimal. I also expect that the regex engine may be able to short-circuit if there is a length constraint, but that's a level deeper than I probably care about.
Performance aside, I think I prefer having it all in the pattern, but that may reflect my comfort level with regex rather than a typical best practice.

Is your code meant to be a number or a numeric string? By that I mean: are leading zeros significant? If they aren't, you could make your datatype even simpler by making it a restriction of xsd:integer with a maximum number of digits (totalDigits, since maxLength is not allowed on numeric types) or a maximum value, for example either:
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="0" />
<xsd:totalDigits value="10" />
</xsd:restriction>
or
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="0" />
<xsd:maxExclusive value="10000000000"/>
</xsd:restriction>
That is going to be the simplest way of describing it, and probably faster as well, since the processor does an integer check instead of a regex check.
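A fuller sketch of the numeric option, assuming the code is a non-negative number of at most ten digits and that leading zeros carry no meaning. The type name `MyNumericType` and the element name `Code` are hypothetical. For numeric types the digit count is bounded with `totalDigits`; `maxLength` is not an allowed facet on `xsd:integer` or the types derived from it.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical name: a non-negative number with at most ten digits -->
  <xsd:simpleType name="MyNumericType">
    <xsd:restriction base="xsd:nonNegativeInteger">
      <!-- Counts the digits of the value, so "0042" (value 42) has 2 digits -->
      <xsd:totalDigits value="10"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical element using the type -->
  <xsd:element name="Code" type="MyNumericType"/>
</xsd:schema>
```

With this type, `<Code>42</Code>` and `<Code>0042</Code>` both validate and denote the same value, while `<Code>-1</Code>` and `<Code>12345678901</Code>` are rejected. If `0042` and `42` must stay distinguishable, a string type with a pattern is the better fit.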

Your question has a funny side too. Even you made a mistake when first writing the last pattern (+ means one or more; you wanted * instead), which proves a point some people make about regular expressions: regex can be tricky. Regex is a struggle for many, whether we like it or not.
I am a firm believer in "Make things as simple as possible, but not simpler".
If you can do without regex, stay away from it (see above). Rely as much as you can on built-in types and the provided facets. I think the only valid case for a pattern here is if you want to allow leading zeroes; otherwise a numeric type with constraining facets would do the same (note that xsd:unsignedInt only goes up to 4294967295, so for a full ten digits use xsd:unsignedLong or xsd:integer).
If you can't, but regex could do it, don't hesitate to use it.
Never duplicate your "requirements"; maintenance is the most important reason. Undeniably there's a chance of extra CPU cycles, but unless someone is overdoing it intentionally, as you said, the overhead is most likely minimal.
I think that if you stick with these principles, your questions kind of go away...
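Applying the "never duplicate" rule to the case where leading zeros do matter, a sketch that keeps the whole constraint in a single facet might look like this. `MyType` comes from the question; the element name `Code` is hypothetical. XSD patterns are implicitly anchored to the whole value, so no `^` or `$` is needed.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="MyType">
    <xsd:restriction base="xsd:string">
      <!-- Zero to ten ASCII digits; the pattern must match the entire value.
           [0-9] is used instead of \d, which in XSD matches any Unicode digit. -->
      <xsd:pattern value="[0-9]{0,10}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical element using the type -->
  <xsd:element name="Code" type="MyType"/>
</xsd:schema>
```

Here `0042` validates and keeps its leading zeros, while `12a` and `12345678901` are rejected. Because `{0,10}` also admits the empty string, `<Code/>` is valid too; `{1,10}` would refuse an empty code.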


how to define any element in step size in XML schema

I want to define an element in an XML schema whose minimum value is 0 and maximum value is 91800, in steps of 360, so the possible values are 0, 360, 720 and so on.
How can I define this without using an enumeration?
I cannot think of any way to do it; you cannot do arithmetic in XSD 1.0 validation rules.
You're stuck with using an enumeration, which in your case seems feasible: there are 256 possible values (0 to 91800 in steps of 360).
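The count of 256 follows from the range and the step: every allowed value is $360k$ for an integer $k \ge 0$, and the largest $k$ is $91800 / 360$.

```latex
\[
\frac{91800 - 0}{360} + 1 = 255 + 1 = 256
\]
```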
Since a finite state automaton can recognize the set of numbers evenly divisible by 360, it's possible in principle to do this with a fiendishly complicated regular expression, but for the range you have in mind an enumeration would in fact be a lot easier to understand (and to write correctly).
So in XSD 1.0, it's not quite true that using an enumeration is the only way to define the type you want, but it is true that it's by far the simplest and best way.
In XSD 1.1, you can use an assertion expressed in XPath 2.0 to capture the arithmetic relation:
<xs:simpleType name="small-multiples-of-three-sixty">
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="91800"/>
<xs:assertion test="$value mod 360 eq 0"/>
</xs:restriction>
</xs:simpleType>
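Using this type requires an XSD 1.1 processor, since XSD 1.0 has no `xs:assertion` facet. A sketch of a complete schema, assuming no target namespace and a hypothetical element name `Angle`, might look like this; the `vc:minVersion` attribute from the XSD 1.1 versioning namespace marks the schema document as requiring version 1.1.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
           vc:minVersion="1.1">
  <xs:simpleType name="small-multiples-of-three-sixty">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="91800"/>
      <!-- $value is bound to the typed value being validated -->
      <xs:assertion test="$value mod 360 eq 0"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical element using the type -->
  <xs:element name="Angle" type="small-multiples-of-three-sixty"/>
</xs:schema>
```

With this schema, `<Angle>720</Angle>` validates, `<Angle>100</Angle>` fails the assertion, and `<Angle>92160</Angle>` (a multiple of 360) fails `maxInclusive`.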

what is the equivalent of 'sequence' in xsd schema (but without ordering)

I have some types with a sequence inside, which restricts the order of child elements. I want to remove these order restrictions. Which element should I choose, assuming that I can't (or don't want to) change the definitions of the child elements?
For example, if I replaced it with <xs:choice maxOccurs="unbounded">, it wouldn't be a full equivalent of <xs:sequence>, because child elements that are supposed to appear only once could now appear several times.
Conversely, I can't use xs:all, as it limits each element to at most one occurrence.
So, is there a simple and quick solution (making as few changes to the schema as possible)?
The short answer is you can't.
One option would be to define a type for each possible combination of node sequences and then enclose them in an <xs:choice>, but this would be faintly ridiculous.
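You can wrap <sequence> elements in a <choice>, with one sequence per permitted ordering:
<choice>
<sequence>
<!-- one permitted ordering of the child elements -->
</sequence>
<sequence>
<!-- another permitted ordering -->
</sequence>
</choice>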
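If an XSD 1.1 processor is available, the limit described in the question is lifted: elements inside `xs:all` may have `maxOccurs` greater than 1. A sketch with hypothetical child elements `Id` (exactly once) and `Note` (any number of times), accepted in any order; `UnorderedType` and `Item` are also hypothetical names.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
           vc:minVersion="1.1">
  <xs:complexType name="UnorderedType">
    <!-- Children may appear in any order -->
    <xs:all>
      <!-- Must appear exactly once -->
      <xs:element name="Id" type="xs:string"/>
      <!-- Zero or more times; maxOccurs > 1 inside xs:all is XSD 1.1 only -->
      <xs:element name="Note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:all>
  </xs:complexType>
  <xs:element name="Item" type="UnorderedType"/>
</xs:schema>
```

An XSD 1.0 processor rejects the `maxOccurs="unbounded"` inside `xs:all`, which is consistent with the short answer for that version.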

cobol to xml schema - WTX Tool

We are doing XML-to-copybook and copybook-to-XML conversions in a middleware system, using IBM WebSphere Transformation Extender. From this link, Cobol to xsd mapping, we realised that
PIC X(03), in a copybook, has to be converted to the following XML schema:
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="3"/>
<xsd:whiteSpace value="preserve"/>
</xsd:restriction>
</xsd:simpleType>
PIC 9(03), in a copybook, has to be converted to the following XML schema:
<xsd:simpleType>
<xsd:restriction base="xsd:unsignedInt">
<xsd:minInclusive value="0"/>
<xsd:maxInclusive value="999"/>
</xsd:restriction>
</xsd:simpleType>
However, we are not able to work out directly which XML schema to use for the following copybook types. Could anyone please guide us?
PIC S9(17) COMP-3
PIC S9(17)
PIC S9(03)
PIC S9(03) COMP-3
PIC +9(17)
PIC +9(03)
The best way to understand a PICTURE clause is to consult your COBOL documentation. I'll follow with some "documentation".
One thing to note: while I consider the XSD snippets below correct, different tools may not match mine exactly; for sure, whatever you get from your tools should not be more restrictive than mine.
PIC S9(17) COMP-3
PIC S9(17)
PIC +9(17)
Note: COMP-3 doesn't matter from an XSD perspective; it affects the encoding in the COBOL world.
<xsd:simpleType name="S9-17">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="-99999999999999999"/>
<xsd:maxInclusive value="99999999999999999"/>
</xsd:restriction>
</xsd:simpleType>
PIC S9(03)
PIC S9(03) COMP-3
PIC +9(03)
<xsd:simpleType name="S9-3">
<xsd:restriction base="xsd:int">
<xsd:minInclusive value="-999"/>
<xsd:maxInclusive value="999"/>
</xsd:restriction>
</xsd:simpleType>
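An equivalent sketch uses the `totalDigits` facet, which bounds the number of digits in the value instead of spelling out the extreme values; on an integer base, `totalDigits="17"` admits exactly the range -99999999999999999 to 99999999999999999. The type names below are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- PIC S9(17), S9(17) COMP-3, +9(17): signed, up to 17 digits -->
  <xsd:simpleType name="S9-17-digits">
    <xsd:restriction base="xsd:integer">
      <xsd:totalDigits value="17"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- PIC S9(03), S9(03) COMP-3, +9(03): signed, up to 3 digits -->
  <xsd:simpleType name="S9-3-digits">
    <xsd:restriction base="xsd:int">
      <xsd:totalDigits value="3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Which form to prefer is mostly a matter of readability; the explicit min/max bounds make the range visible at a glance, while `totalDigits` mirrors the digit count in the PICTURE clause.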
Your PIC 9(03) describes an unsigned number, with an implied positive value.
A leading S, as in S9(17), means "signed", with up to 17 decimal digits; the value may be positive or negative. Depending on other clauses, the sign could be separate, leading or trailing.
Things get tricky when a COMPUTATIONAL usage clause is present, in which case the data is encoded in a non-DISPLAY format, in the COBOL world, not in XML. For COMP-3 (packed decimal), each digit takes four bits, so two digits fit in a byte, and the sign occupies the rightmost four bits of the last byte. The COMP-3 clause (often referred to as "packed") doesn't change the semantics of the value; it just describes the encoding, with a direct impact on the size (in bytes) required to represent that particular number. For example, a PIC 9(17) requires 17 bytes, while a PIC 9(17) COMP-3 requires 9 bytes. Items without a COMP clause are represented in DISPLAY format (basically one byte per decimal digit, plus one for the sign when it is stored separately).
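The 9-byte figure follows from the packed-decimal layout: $n$ digit nibbles plus one sign nibble, rounded up to whole bytes.

```latex
\[
\text{bytes}_{\text{COMP-3}}(n) = \left\lceil \frac{n + 1}{2} \right\rceil,
\qquad
\text{bytes}_{\text{COMP-3}}(17) = \left\lceil \tfrac{18}{2} \right\rceil = 9,
\qquad
\text{bytes}_{\text{COMP-3}}(3) = \left\lceil \tfrac{4}{2} \right\rceil = 2
\]
```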
A leading + sign is much like S: it indicates that the number is signed; a + is shown for positive numbers and a - for negative numbers.
Because of this, when representing data in XML, what gets preserved is the data, not its representation. Consider PIC 9(03) and a value of 1:
A COBOL to XML transform may preserve 001, or not (i.e. get 1).
An XML to COBOL transform must be able to take 001 or 1 and correctly convert it to 001.

How to use the xml schema group element

I am trying to design an XML structure to capture the output from a spreadsheet that contains a Customer Name column, many different amount columns, and a total row.
I have about 4 amount column definitions that I want to reuse as a group. So I declared a group called AmountsGroup and then used the group name as a 'ref' attribute inside my complex type definitions. Here is how it looks:
<xsd:complexType name="AmountByCustomerType">
<xsd:sequence>
<xsd:element name="Customer" type="xsd:string" />
<xsd:group ref="AmountsGroup" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="AmountByCustomerTotalType">
<xsd:sequence>
<xsd:element name="Total" type="xsd:string" />
<xsd:group ref="AmountsGroup" />
</xsd:sequence>
</xsd:complexType>
<xsd:group name="AmountsGroup">
<xsd:sequence>
<xsd:element name="AmountByPeriod" type="AmountByPeriodType" maxOccurs="unbounded" />
<xsd:element name="NetAdjustments" type="xsd:decimal" />
<xsd:element name="OriginalSalesAmount" type="xsd:decimal" minOccurs="0"/>
<xsd:element name="RevisedAmount" type="xsd:decimal" />
</xsd:sequence>
</xsd:group>
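For illustration, an instance fragment that `AmountByCustomerType` would accept could look like the sketch below, with the group occurring twice after `Customer`. The element name `CustomerRow` is hypothetical (the schema above declares only types), the values are made up, and the content of `AmountByPeriod` is left as a comment because `AmountByPeriodType` is not shown. No element named after `AmountsGroup` appears.

```xml
<CustomerRow>
  <Customer>Acme</Customer>
  <!-- First occurrence of AmountsGroup (OriginalSalesAmount omitted) -->
  <AmountByPeriod><!-- content defined by AmountByPeriodType --></AmountByPeriod>
  <NetAdjustments>10.00</NetAdjustments>
  <RevisedAmount>110.00</RevisedAmount>
  <!-- Second occurrence of AmountsGroup -->
  <AmountByPeriod><!-- content defined by AmountByPeriodType --></AmountByPeriod>
  <NetAdjustments>0</NetAdjustments>
  <OriginalSalesAmount>50.00</OriginalSalesAmount>
  <RevisedAmount>50.00</RevisedAmount>
</CustomerRow>
```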
Here are my questions:
I have declared the group with maxOccurs="unbounded" in the first complexType, whereas in the second complexType I have left it out, meaning it must occur exactly once. Will this work correctly? I want many rows of customer amounts and only one total amount row.
The XML instance document will not need to contain the group name anywhere; is that correct?
Is there a better way to structure the individual rows and the total row?
Is this a good practice when I use the Venetian Blinds pattern? I don't want to declare a complexType, since then I have to declare an element that will appear in the XML instance document, adding one more level to the XML object tree. Is there any way to use a named type without giving it an element of its own? I hope you understand what I am trying to do.
Any thoughts?
Correct, maxOccurs applies to the group as a whole.
Correct, group name is in the schema only.
I was going to suggest introducing an element to encapsulate the group members, but I see from your 4th question you're trying to avoid that. I prefer it since it makes it easier for a parser to identify the start and end of each "row" and mirrors programming encapsulation.
Seems reasonable; you're still keeping with the Venetian Blinds spirit of reusable components without committing to a namespace for local elements.
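The encapsulating alternative from the answer to the third question could be sketched as follows. It trades one extra element level per row (the hypothetical `Amounts` element) for explicit row boundaries. `AmountsType` is also a hypothetical name, and `AmountByPeriodType` is assumed to be defined elsewhere in the same schema, as in the question.

```xml
<!-- Same members as AmountsGroup, packaged as a reusable named type -->
<xsd:complexType name="AmountsType">
  <xsd:sequence>
    <xsd:element name="AmountByPeriod" type="AmountByPeriodType" maxOccurs="unbounded"/>
    <xsd:element name="NetAdjustments" type="xsd:decimal"/>
    <xsd:element name="OriginalSalesAmount" type="xsd:decimal" minOccurs="0"/>
    <xsd:element name="RevisedAmount" type="xsd:decimal"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:complexType name="AmountByCustomerType">
  <xsd:sequence>
    <xsd:element name="Customer" type="xsd:string"/>
    <!-- Each row is wrapped, so its start and end are explicit -->
    <xsd:element name="Amounts" type="AmountsType" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:complexType name="AmountByCustomerTotalType">
  <xsd:sequence>
    <xsd:element name="Total" type="xsd:string"/>
    <!-- Exactly one wrapped row for the total -->
    <xsd:element name="Amounts" type="AmountsType"/>
  </xsd:sequence>
</xsd:complexType>
```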

Should we declare a simple type explicitly even for a string type in venetian blinds pattern

I am using the venetian blinds pattern to design my XML schema and it requires that all the types are declared at the global level and all the elements use the types defined in the global scope.
My question is this:
If I want to declare 2 elements that are simple strings with no other restriction, should I declare their types in the global scope and then use them? Or can I use the built-in type directly in the element declaration? Am I breaking the Venetian Blinds pattern in the second scenario listed below?
For example, I can do one of the two:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:simpleType name="ApplicantName">
<xsd:restriction base="xsd:string"/>
</xsd:simpleType>
<xsd:simpleType name="ApplicantCountry">
<xsd:restriction base="xsd:string"/>
</xsd:simpleType>
<xsd:element name="Application">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="ApplicantName" type="ApplicantName"/>
<xsd:element name="ApplicantCountry" type="ApplicantCountry"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Or I can use this:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Application">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="ApplicantName" type="xsd:string"/>
<xsd:element name="ApplicantCountry" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Well, why did you choose to follow this pattern? Which option provides the benefits that are promised by the pattern? Answer those questions and I think you have your answer.
It seems to me that the pattern calls for the first approach. Whether the pattern actually has value, or whether it should be followed so rigorously is for you to decide. At the heart of the matter is the question of what you are trying to achieve by using the pattern in the first place.
I'd say it depends. The goal of Venetian Blinds is to reuse types, but unless some of your elements share a common restriction (for example, a field length imposed by a backend database), you won't gain anything from following this pattern religiously.
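The pattern pays off in the shared-restriction case described in the last answer. A sketch, assuming a hypothetical backend limit of 100 characters for both columns; the type name `DbText100` is also hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical: both database columns are limited to 100 characters -->
  <xsd:simpleType name="DbText100">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="Application">
    <xsd:complexType>
      <xsd:sequence>
        <!-- If the column limit changes, only the one facet above needs editing -->
        <xsd:element name="ApplicantName" type="DbText100"/>
        <xsd:element name="ApplicantCountry" type="DbText100"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```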
