We are doing XML to copybook (and vice versa) conversions in a middleware system, using IBM WebSphere Transformation Extender. From this link, Cobol to xsd mapping, we realised that
PIC X(03) in the copybook has to be converted to the following XML schema:
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="3"/>
<xsd:whiteSpace value="preserve"/>
</xsd:restriction>
</xsd:simpleType>
PIC 9(03) in the copybook has to be converted to the following XML schema:
<xsd:simpleType>
<xsd:restriction base="xsd:unsignedInt">
<xsd:minInclusive value="0"/>
<xsd:maxInclusive value="999"/>
</xsd:restriction>
</xsd:simpleType>
However, we are not able to work out directly which XML schema to use for the following copybook types. Could anyone please guide us?
PIC S9(17) COMP-3
PIC S9(17)
PIC S9(03)
PIC S9(03) COMP-3
PIC +9(17)
PIC +9(03)
The best way to understand a PICTURE clause is to consult your COBOL documentation. I'll follow with some "documentation".
One thing to note: while I consider the XSD snippets below correct, different tools may not match mine exactly; in any case, whatever you get from your tools should not be more restrictive than mine.
PIC S9(17) COMP-3
PIC S9(17)
PIC +9(17)
Note: COMP-3 doesn't matter from an XSD perspective; it affects the encoding in the COBOL world.
<xsd:simpleType name="S9-17">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="-99999999999999999"/>
<xsd:maxInclusive value="99999999999999999"/>
</xsd:restriction>
</xsd:simpleType>
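As an alternative sketch (not necessarily what your tool will generate), the same value space as S9-17 can be stated as a digit count using the totalDigits facet. The type name S9-17-digits is only for illustration.

```xml
<!-- Same value space as S9-17: totalDigits="17" allows any integer with
     an absolute value below 10^17, i.e. -99999999999999999 to 99999999999999999.
     totalDigits constrains the value, not the characters, so a padded
     lexical form such as 00000000000000042 is still accepted. -->
<xsd:simpleType name="S9-17-digits">
  <xsd:restriction base="xsd:long">
    <xsd:totalDigits value="17"/>
  </xsd:restriction>
</xsd:simpleType>
```

Basing it on xsd:long rather than xsd:integer is a choice, not a requirement; 17 digits fit comfortably in the xsd:long range, which some tools map to a native 64-bit type.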
PIC S9(03)
PIC S9(03) COMP-3
PIC +9(03)
<xsd:simpleType name="S9-3">
<xsd:restriction base="xsd:int">
<xsd:minInclusive value="-999"/>
<xsd:maxInclusive value="999"/>
</xsd:restriction>
</xsd:simpleType>
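To show that the COMP-3 clause drops out on the XML side, here is a sketch of how a record could reference the two named types. The copybook field names and the element and type names are hypothetical.

```xml
<!-- Hypothetical copybook fields:
       05 ACCOUNT-BALANCE  PIC S9(17) COMP-3.
       05 BRANCH-CODE      PIC S9(03).
     A COMP-3 field and a DISPLAY field map to types chosen only by
     digit count and sign; the encoding is handled on the COBOL side. -->
<xsd:complexType name="AccountRecord">
  <xsd:sequence>
    <xsd:element name="AccountBalance" type="S9-17"/>
    <xsd:element name="BranchCode" type="S9-3"/>
  </xsd:sequence>
</xsd:complexType>
```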
Your PIC 9(03) describes an unsigned number: the value is implicitly positive (or zero).
A preceding S, as in S9(17), means "signed", with up to 17 decimal digits; the value may be positive or negative. Depending on other clauses (the SIGN clause), the sign can be leading or trailing, embedded or separate.
Things get tricky when a COMPutational usage is present, in which case the data is encoded using a "binary" format in the COBOL world, not in XML. For COMP-3 (packed decimal) that means roughly half the size: four bits per digit, with the sign held in the low-order (rightmost) four bits of the last byte. The COMP-3 clause (often referred to as "packed") doesn't change the semantics of the value; it just describes the encoding mechanism, with a direct impact on the size (in bytes) required to represent that particular number. For example, a PIC 9(17) requires 17 bytes, while a PIC 9(17) COMP-3 requires 9 bytes. Items without a COMP clause are represented in DISPLAY format (basically one byte per decimal digit; for a signed item the sign is embedded in the last digit by default and takes an extra byte only with SIGN IS SEPARATE).
A preceding + sign is much like S: it indicates that the number is signed, with a + written for positive and a - for negative numbers. Unlike S, + is an editing symbol, so the sign occupies its own character position.
Because of this, when representing data in XML, what gets preserved is the data, not its representation. Consider PIC 9(03) and a value of 1.
A COBOL to XML transform may preserve 001, or not (i.e. get 1).
An XML to COBOL transform must be able to take 001 or 1 and correctly convert it to 001.
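Assuming a hypothetical element Qty declared with the PIC 9(03) type from the question (xsd:unsignedInt, 0 to 999), each of the following instance fragments is valid and denotes the same value, which is why the XML to COBOL side has to do the zero padding itself:

```xml
<!-- Three separate instance fragments for a hypothetical element Qty.
     All are valid against the PIC 9(03) type and all denote the value 1;
     the XML to COBOL transform must write 001 in every case. -->
<Qty>001</Qty>
<Qty>01</Qty>
<Qty>1</Qty>
```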
Related
Recently I had an example where integer fields in an XML message contained leading zeros. Unfortunately, these zeros were significant. One could question why integer was chosen in the schema definition, but that is not my question. I was a little surprised that leading zeros were allowed at all, so I looked up the specs, which of course told me the supertype is decimal. But, as expected, specifications don't really tell you why certain choices were made. So my question is really: what is the rationale for allowing leading zeros at all? Numbers generally don't have leading zeros.
On a side note, I guess the only way to add a restriction on leading zeros is with a pattern.
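A minimal sketch of such a pattern, assuming you still want xsd:integer semantics but want to reject lexical forms like 007. The type name is hypothetical.

```xml
<!-- Integer whose lexical form may not carry leading zeros.
     XSD patterns always match the whole value, so no ^ or $ anchors are used.
     Accepts 0, 7, -42, +42; rejects 007 and -01. -->
<xsd:simpleType name="NoLeadingZeroInteger">
  <xsd:restriction base="xsd:integer">
    <!-- optional sign, then either a single 0 or a digit run starting with 1-9 -->
    <xsd:pattern value="[+\-]?(0|[1-9][0-9]*)"/>
  </xsd:restriction>
</xsd:simpleType>
```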
My recollection is that the XML Schema working group allowed leading zeroes in XSD decimals because they are allowed in normal decimal notation: 1, 01, 001, 0001, etc. all denote the same number in normal numerical notation. (But I don't actually remember that it was discussed at any length, so perhaps this is just my reason for believing it was the right thing to do and other WG members had other reasons for being satisfied with it.)
You are correct to suggest that the root of the problem is the use of xsd:integer as a type for a notation using strings of digits in which leading zeroes are significant (as for example in U.S. zip codes); I think you may be over-generous to say that one could argue about that decision. What possible arguments could one bring forward in favor of such an obviously erroneous choice?
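Following that reasoning, a notation in which leading zeroes are significant is better modelled as a string. A sketch for the U.S. zip code case, covering only the five-digit form (the type name is hypothetical):

```xml
<!-- A digit string, not a number: "02134" keeps its leading zero,
     and "2134" is rejected because exactly five digits are required. -->
<xsd:simpleType name="USZipCode">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[0-9]{5}"/>
  </xsd:restriction>
</xsd:simpleType>
```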
Although numbers often don't have leading zeroes, number parsing almost always allows them.
You don't want to disallow leading zeroes for numbers completely, because you want the option to write a number like 0.12 and not only like .12. As you want to allow at least one leading zero for floating point numbers, it would feel a bit restrictive to only allow one leading zero, and only for floating point numbers.
Sometimes numbers do have leading zeroes, for example the components of a date in ISO 8601 format, such as 2014-05-02. If you want to parse a component, it's convenient if the leading zero is allowed, so that you don't have to write extra code to remove it before parsing.
The XML Schema specification simply uses the same rules for parsing numbers that are used by most formats and most programming languages.
I have started to create XSDs and found a couple of examples using xs:integer and xs:int.
What is the difference between xs:integer and xs:int?
When I should use xs:integer?
When I should use xs:int?
The difference is the following:
xs:int is a signed 32-bit integer.
xs:integer is an integer unbounded value.
See for details https://web.archive.org/web/20151117073716/http://www.w3schools.com/schema/schema_dtypes_numeric.asp
For example, XJC (Java) generates Integer for xs:int and BigInteger for xs:integer.
The bottom line: use xs:int if you want to work across platforms and be sure that your numbers will pass without a problem.
If you want bigger numbers, use xs:long instead of xs:integer (it will be generated as Long).
The xs:integer type is a restriction of xs:decimal, with the fractionDigits facet set to zero and with a lexical space which forbids the decimal point and trailing zeroes which would otherwise be legal. It has no minimum or maximum value, though implementations running in machines of finite size are not required to be able to accept arbitrarily large or small values. (They are required to support values with 16 decimal digits.)
The xs:int type is a restriction of xs:long, with the maxInclusive facet set to 2147483647 and the minInclusive facet to -2147483648. (As you can see, it will fit conveniently into a two's-complement 32-bit signed-integer field; xs:long fits in a 64-bit signed-integer field.)
The usual rule is: use the one that matches what you want to say. If the constraint on an element or attribute is that its value must be an integer, xs:integer says that concisely. If the constraint is that the value must be an integer that can be expressed with at most 32 bits in two's-complement representation, use xs:int. (A secondary but sometimes important concern is whether your tool chain works better with one than with the other. For data that will live longer than your tool chain, it's wise to listen to the data first; for data that exists solely to feed the tool chain, and which will be of no interest if you change your tool chain, there's no reason not to listen to the tool chain.)
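To make the xs:int derivation concrete, here is a sketch of a user-defined type with the same value space (the name MyInt is hypothetical; in practice you would simply use xs:int):

```xml
<!-- Same value space as xs:int: xs:long narrowed to the 32-bit signed range. -->
<xs:simpleType name="MyInt">
  <xs:restriction base="xs:long">
    <xs:minInclusive value="-2147483648"/>
    <xs:maxInclusive value="2147483647"/>
  </xs:restriction>
</xs:simpleType>
```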
I would just add a note of pedantry that may be important to some people: it's not correct to say that xs:int "is" a signed 32-bit integer. That form of words implies an implementation in memory (or registers, etc) within a binary digital computer. XML is character-based and would implement the maximum 32-bit signed value as "2147483647" (my quotes, of course), which is a lot more than 32 bits! What IS true is that xs:int is (indirectly) a restriction of xs:integer which sets the maximum and minimum allowed values to be the same as the corresponding implementation-imposed limits of a 32-bit integer with a sign bit.
This is valid, but duplicates the constraints on the length using both the pattern and the maxLength to enforce it:
<xsd:simpleType name="MyType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]{0,10}" />
<xsd:maxLength value="10" />
</xsd:restriction>
</xsd:simpleType>
The pattern alone would suffice:
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]{0,10}" />
</xsd:restriction>
Or the pattern could be simplified and we would rely on maxLength:
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]*" />
<xsd:maxLength value="10" />
</xsd:restriction>
Questions:
Are there known performance implications of choosing one over the other?
Will a given parser check the length first and short-circuit the validation before compiling the pattern if both are provided?
Or will both be checked in any case?
Does it vary from parser to parser?
I acknowledge that the performance difference here is probably minimal. I also expect that the regex engine may be able to short-circuit if there is a length constraint, but that's a level deeper than I probably care about.
Performance aside, I think I prefer having it all in the pattern, but that may reflect my comfort level with regex rather than a typical best practice.
Thanks!
Is your code meant to be a number or a numeric string? By that I mean: are leading zeros allowed? If they aren't, you could make your datatype even simpler by making it a restriction of xsd:integer with a maximum number of digits or a maximum value, such as either:
<xsd:restriction base="xsd:integer">
<xsd:totalDigits value="10" />
</xsd:restriction>
or
<xsd:restriction base="xsd:integer">
<xsd:maxExclusive value="10000000000"/>
</xsd:restriction>
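Both restrictions above still accept negative values, which the original [0-9] pattern rules out. A sketch that also excludes them, assuming non-negative codes of up to 10 digits (the type name is hypothetical):

```xml
<!-- Non-negative integer with at most 10 significant digits (0 to 9999999999). -->
<xsd:simpleType name="Code10">
  <xsd:restriction base="xsd:nonNegativeInteger">
    <xsd:maxInclusive value="9999999999"/>
  </xsd:restriction>
</xsd:simpleType>
```

Unlike the pattern, this still accepts leading zeros and a leading + (for example 0042 or +42), and it rejects the empty string that [0-9]{0,10} allows.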
That is going to be the simplest way of describing it, and probably faster as well, since you are then doing an integer check instead of a regex check.
In a way, your question has a funny side too... Since even you made a mistake in translating the last pattern (+ means one or more; you wanted * instead), it proves a point that some people make about regular expressions, and that is that regex can prove tricky. Regex is a struggle for many, whether we like it or not.
I am a firm believer in "Make things as simple as possible, but not simpler".
If you can do without regex, stay away from it (see above). Rely as much as you can on built-in types and the provided facets (I would think the only valid case for you is if you wanted to allow for leading zeroes. Otherwise, an xsd:nonNegativeInteger with constraining facets would do the same; xsd:unsignedInt tops out at 4294967295, so it cannot cover every 10-digit value).
If you can't, but regex could do it, don't hesitate to use it.
Never duplicate your "requirements"; maintenance is the most important reason. Undeniably there's a chance of extra CPU cycles, but unless someone is overdoing it intentionally, as you said, the overhead is most likely minimal.
I think that if you stick with these principles, your questions kind of go away...
I am trying to design an XML structure to capture the output from a spreadsheet that contains a Customer Name and many different amount columns. There is a total row as well.
I have about 4 amount column definitions that I want to reuse as a group. So I declared a group called AmountsGroup and then used the group name as a 'ref' attribute inside my complex type definition. Here is how it looks:
<xsd:complexType name="AmountByCustomerType">
<xsd:sequence>
<xsd:element name="Customer" type="xsd:string" />
<xsd:group ref="AmountsGroup" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="AmountByCustomerTotalType">
<xsd:sequence>
<xsd:element name="Total" type="xsd:string" />
<xsd:group ref="AmountsGroup" />
</xsd:sequence>
</xsd:complexType>
<xsd:group name="AmountsGroup">
<xsd:sequence>
<xsd:element name="AmountByPeriod" type="AmountByPeriodType" maxOccurs="unbounded" />
<xsd:element name="NetAdjustments" type="xsd:decimal" />
<xsd:element name="OriginalSalesAmount" type="xsd:decimal" minOccurs="0"/>
<xsd:element name="RevisedAmount" type="xsd:decimal" />
</xsd:sequence>
</xsd:group>
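To illustrate, a sketch of an instance fragment for AmountByCustomerType with two occurrences of the group. The root element name CustomerAmounts and the values are made up, and the AmountByPeriod content is left as a comment because AmountByPeriodType isn't shown.

```xml
<!-- Hypothetical element of type AmountByCustomerType.
     The name AmountsGroup never appears: each occurrence of the group
     contributes its member elements directly to the sequence. -->
<CustomerAmounts>
  <Customer>ACME</Customer>
  <!-- first occurrence of AmountsGroup -->
  <AmountByPeriod><!-- content defined by AmountByPeriodType --></AmountByPeriod>
  <NetAdjustments>10.00</NetAdjustments>
  <RevisedAmount>110.00</RevisedAmount>
  <!-- second occurrence; OriginalSalesAmount is optional (minOccurs="0") -->
  <AmountByPeriod><!-- content defined by AmountByPeriodType --></AmountByPeriod>
  <NetAdjustments>-5.00</NetAdjustments>
  <OriginalSalesAmount>200.00</OriginalSalesAmount>
  <RevisedAmount>195.00</RevisedAmount>
</CustomerAmounts>
```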
Here are my questions:
I have declared the group with maxOccurs="unbounded" in the first complexType, whereas in the second complexType I have left it out, meaning it will have to occur only once. Will this work correctly? I want many rows of customer amounts and only one total amount row.
The XML instance document will not need to have the name of this group name anywhere - is that correct?
Is there any better way to structure the individual rows and total type of structure?
Is this good practice when I use the Venetian Blinds pattern? I don't want to declare a complexType, since then I have to declare an element which will appear in the XML instance document, adding one more level to the XML object tree. Is there any way to use a named type without giving it an element of its own? I hope you understand what I am trying to do.
Any thoughts?
Correct, maxOccurs applies to the group as a whole.
Correct, group name is in the schema only.
I was going to suggest introducing an element to encapsulate the group members, but I see from your 4th question you're trying to avoid that. I prefer it since it makes it easier for a parser to identify the start and end of each "row" and mirrors programming encapsulation.
Seems reasonable; you're still keeping with the Venetian Blinds spirit of reusable components without committing to a namespace for local elements.
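For comparison, the encapsulating element suggested above could be sketched as follows; AmountsType and Amounts are hypothetical names.

```xml
<!-- The group is used once inside a named type; repetition moves to the
     wrapper element, so each Amounts element delimits one row. -->
<xsd:complexType name="AmountsType">
  <xsd:sequence>
    <xsd:group ref="AmountsGroup"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="AmountByCustomerType">
  <xsd:sequence>
    <xsd:element name="Customer" type="xsd:string"/>
    <xsd:element name="Amounts" type="AmountsType" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
```

The trade-off is exactly the extra level in the tree that the 4th question tries to avoid, in exchange for explicit row boundaries.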
I am using the venetian blinds pattern to design my XML schema and it requires that all the types are declared at the global level and all the elements use the types defined in the global scope.
My question is this:
If I want to declare 2 elements which are simple strings with no other restriction, should I declare types for them in the global scope and then use them? Or can I directly use a simple type inside the element itself? Am I breaking the Venetian Blinds pattern in the second scenario listed below?
For example, I can do one of the two:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:simpleType name="ApplicantName">
<xsd:restriction base="xsd:string"/>
</xsd:simpleType>
<xsd:simpleType name="ApplicantCountry">
<xsd:restriction base="xsd:string"/>
</xsd:simpleType>
<xsd:element name="Application">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="ApplicantName" type="ApplicantName"/>
<xsd:element name="ApplicantCountry" type="ApplicantCountry"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Or I can use this:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Application">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="ApplicantName" type="xsd:string"/>
<xsd:element name="ApplicantCountry" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Well, why did you choose to follow this pattern? Which option provides the benefits that are promised by the pattern? Answer those questions and I think you have your answer.
It seems to me that the pattern calls for the first approach. Whether the pattern actually has value, or whether it should be followed so rigorously is for you to decide. At the heart of the matter is the question of what you are trying to achieve by using the pattern in the first place.
I'd say: It depends. The goal of Venetian Blinds is to reuse types but unless some of your elements share a common restriction like, for example, field length imposed by a backend database you won't gain anything from following this pattern religiously.
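A sketch of the case where the pattern does pay off, assuming (hypothetically) that both fields map to 50-character database columns; the type name Name50 is made up.

```xml
<!-- One global type carries the shared constraint; changing the column
     width later means editing a single maxLength facet. -->
<xsd:simpleType name="Name50">
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="50"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:element name="Application">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="ApplicantName" type="Name50"/>
      <xsd:element name="ApplicantCountry" type="Name50"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```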