XSD with no order and selective restriction - xsd

I need to validate XML whose elements can appear in any order; some of them are required, and some may appear only once. Some elements can also be nested recursively.
For example, a room should contain one door and any number of boxes and elements. Boxes can contain other boxes and/or elements.
Example XML:
<Room>
<Element />
<Box>
<Box>
<Element />
<Box></Box>
<Element />
</Box>
<Element />
</Box>
<Door />
<Element />
</Room>
This example is very simple, but in my case there are a lot of elements that can be in <Room>. Recursion is not a problem. The problem is making <Door> required while allowing it to appear in any order among siblings that are not required.
UPD: the question is about XSD 1.0, because I use .NET and there is no free library for XSD 1.1.

From what I'm reading, I think you need to use schema (XSD) indicators.
See the following link for more information: Schema indicators
Random Order
From your question:
I need to validate an XML that contains element in random order
Possible answer:
Using the all indicator (see link), you can specify that the elements may appear in any order.
All Indicator
The indicator specifies that the child elements can appear in any order, and that each child element must occur only once:
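As a minimal illustration, an xs:all group might look like the sketch below. The element names are made up for the example and are not taken from the question.

```xml
<xs:element name="person">
  <xs:complexType>
    <!-- firstname and lastname may appear in either order, once each -->
    <xs:all>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:all>
  </xs:complexType>
</xs:element>
```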
Occurrences
From your question:
some of them must exist and some of them only once
Possible answer:
If I understand correctly, you want to specify how many times an element may appear. This is called occurrence, and it is covered at the same link. You'll have to set minOccurs and maxOccurs according to your requirements.
Occurrence Indicators
Occurrence indicators are used to define how often an element can occur.
The "maxOccurs" indicator specifies the maximum number of times an element can occur:
The "minOccurs" indicator specifies the minimum number of times an element can occur:
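A small sketch of both indicators together, again with made-up element names. When either attribute is omitted, it defaults to 1.

```xml
<xs:sequence>
  <!-- no indicators: must occur exactly once -->
  <xs:element name="full_name" type="xs:string"/>
  <!-- may be absent, and may repeat up to 10 times -->
  <xs:element name="child_name" type="xs:string" minOccurs="0" maxOccurs="10"/>
</xs:sequence>
```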
Everything, including examples, can be found in the XSD/Schema indicators reference.
Your XSD (XML Schema) will probably look something like this:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Room" type="Room_T"/>
<xs:complexType name="Room_T">
<xs:all>
<xs:element name="Element" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="Box" type="Box_T" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="Door" type="xs:string" minOccurs="1" maxOccurs="1"/>
</xs:all>
</xs:complexType>
<xs:complexType name="Box_T">
<xs:all>
<xs:element name="Element" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="Box" type="Box_T" minOccurs="0" maxOccurs="unbounded"/>
</xs:all>
</xs:complexType>
</xs:schema>
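I haven't checked whether the schema above is valid, but it should get you started. Note that XSD 1.0 does not allow maxOccurs greater than 1 on the children of xs:all, so the unbounded occurrences above need an XSD 1.1 processor.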
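Because the question targets XSD 1.0, where the children of xs:all may occur at most once, here is a minimal sketch of an alternative that avoids xs:all. It puts the required <Door> between two repeatable choices of the optional siblings. The group name Filler is hypothetical, and typing Element and Door as xs:string is an assumption carried over from the schema above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Room" type="Room_T"/>

  <!-- One optional sibling: either an Element or a (recursive) Box -->
  <xs:group name="Filler">
    <xs:choice>
      <xs:element name="Element" type="xs:string"/>
      <xs:element name="Box" type="Box_T"/>
    </xs:choice>
  </xs:group>

  <!-- Any optional siblings, then exactly one Door, then any optional siblings -->
  <xs:complexType name="Room_T">
    <xs:sequence>
      <xs:group ref="Filler" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Door" type="xs:string"/>
      <xs:group ref="Filler" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- A Box holds any number of Elements and nested Boxes, in any order -->
  <xs:complexType name="Box_T">
    <xs:group ref="Filler" minOccurs="0" maxOccurs="unbounded"/>
  </xs:complexType>
</xs:schema>
```

This pattern works for a single required element. With several required elements, every possible ordering of them has to be spelled out, so the content model grows quickly.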

Related

XSD variable element names

Is it possible in XSD to name an element dynamically?
I have a complexType with a varying (but limited) number of elements (x-n), each of which has a complicated substructure. I can copy and paste one (x-1) and just change the number for the name of each of the copies (x-2, x-3, and so on), but it'd be cleaner if I didn't have to.
For example, as it is now:
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="r1">
<xs:complexType><xs:sequence>
<xs:element name="w-s">
<xs:complexType><xs:sequence maxOccurs="4">
<xs:element name="x-1" minOccurs="0">
<xs:complexType><xs:sequence>
<!-- long tedious substructure goes here -->
</xs:sequence></xs:complexType>
</xs:element>
<xs:element name="x-2" minOccurs="0">
<xs:complexType><xs:sequence>
<!-- long tedious substructure goes here -->
</xs:sequence></xs:complexType>
</xs:element>
<xs:element name="x-3" minOccurs="0">
<xs:complexType><xs:sequence>
<!-- long tedious substructure goes here -->
</xs:sequence></xs:complexType>
</xs:element>
<xs:element name="x-4" minOccurs="0">
<xs:complexType><xs:sequence>
<!-- long tedious substructure goes here -->
</xs:sequence></xs:complexType>
</xs:element>
</xs:sequence></xs:complexType>
</xs:element>
</xs:sequence></xs:complexType>
</xs:element>
</xs:schema>
Looking through w3schools now (https://www.w3schools.blog/xsd-xml-schema-definition-tutorial), the answer is not jumping out at me yet, and it's starting to look like the answer is "No.".
I can copy and paste one (x-1) and just change the number
If each element x-n has the same content then you should use the same tag name for all of these tags. That will require a change in the XML format, so I would also recommend that you stop using this style:
<element name='x-n' ...>
...and start using this instead:
<x index="n">
This will make your life much easier because XML schema expects the tag name to indicate the type of content.
I understand that you may not be able to change the XML format, but I think it's important to point out that your current style of XML is not best practice.
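A minimal sketch of what the schema for the suggested <x index="n"> style could look like, shown as a replacement for the w-s element declaration. The attribute type xs:positiveInteger and use="required" are assumptions; the limit of four comes from the question.

```xml
<xs:element name="w-s">
  <xs:complexType>
    <xs:sequence>
      <!-- One declaration covers every x; the number moves into an attribute -->
      <xs:element name="x" minOccurs="0" maxOccurs="4">
        <xs:complexType>
          <xs:sequence>
            <!-- long tedious substructure goes here, written once -->
          </xs:sequence>
          <xs:attribute name="index" type="xs:positiveInteger" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```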
Judging by these Stack Overflow questions, the answer appears to be "Not only no, but also you're wrong for asking" :-p
xml schema list of incremental element name
Can't design xsd schema - because of a variable element name
XML variable tag names
The better solution is to restructure the XSD so that the x-n element has a sub-element indicating the value of n, like so:
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="r1">
<xs:complexType><xs:sequence>
<xs:element name="w-s">
<xs:complexType><xs:sequence maxOccurs="4">
<xs:element name="x" minOccurs="0">
<xs:complexType><xs:sequence>
<xs:element name="x-number"/>
<!-- long tedious substructure goes here -->
</xs:sequence></xs:complexType>
</xs:element>
</xs:sequence></xs:complexType>
</xs:element>
</xs:sequence></xs:complexType>
</xs:element>
</xs:schema>
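If the XML format cannot change, one option not covered above is to declare the shared substructure once as a named complex type and reference it from each x-n element. This is only a sketch: the type name x-type is hypothetical, and the fragment assumes that a plain sequence of optional elements is acceptable. Both declarations are meant to sit directly under xs:schema.

```xml
<!-- The substructure is written once, as a named type -->
<xs:complexType name="x-type">
  <xs:sequence>
    <!-- long tedious substructure goes here -->
  </xs:sequence>
</xs:complexType>

<xs:element name="w-s">
  <xs:complexType>
    <xs:sequence>
      <!-- Each numbered element only references the shared type -->
      <xs:element name="x-1" type="x-type" minOccurs="0"/>
      <xs:element name="x-2" type="x-type" minOccurs="0"/>
      <xs:element name="x-3" type="x-type" minOccurs="0"/>
      <xs:element name="x-4" type="x-type" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

The element names are still fixed in the schema, so this removes the duplication but does not make the names dynamic.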

DFDL Schema for parsing delimited text message

I need a little help with DFDL. I need to parse the message below into something like an XML/tree structure. The elements are not fixed; they are dynamic, and other elements will sometimes appear.
The expected XML/tree output is something like this:
<root>
<CLIENT_ID>DESKTOPCLIENT</CLIENT_ID>
<LOCALE>en-US</LOCALE>
<ENCODE/>
</root>
Something like this is a possible solution, tested in Daffodil:
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
<xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format
ref="GeneralFormat"
lengthKind="delimited"
/>
</xs:appinfo>
</xs:annotation>
<xs:element name="root" dfdl:initiator="%ESC;" dfdl:terminator="%SUB;">
<xs:complexType>
<xs:sequence dfdl:separator="%CAN;" dfdl:separatorPosition="prefix" dfdl:sequenceKind="unordered">
<xs:element name="CLIENT_ID" type="xs:string" dfdl:initiator="CLIENT_ID%NAK;" />
<xs:element name="LOCALE" type="xs:string" dfdl:initiator="LOCALE%NAK;" />
<xs:element name="ENCODE" type="xs:string" dfdl:initiator="ENCODE%NAK;" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Note that this assumes fixed names for the individual elements, and that they all exist, though the order does not matter. If you know the fixed names, but they may or may not exist, you can add minOccurs="0" to the elements in the unordered sequence.
However, DFDL does not allow for dynamic element names, so if you don't know the names, you need a slightly different schema. Instead, you need to describe the data as an unbounded number of name/value pairs, where the name and value are separated by %NAK;, for example:
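For instance, making ENCODE optional might look like the fragment below. This is a sketch only: whether other properties (such as dfdl:occursCountKind) also need attention depends on the format you inherit, which is an assumption here.

```xml
<!-- minOccurs="0" lets the parser accept messages without an ENCODE field -->
<xs:element name="ENCODE" type="xs:string" minOccurs="0" dfdl:initiator="ENCODE%NAK;" />
```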
<xs:element name="root" dfdl:initiator="%ESC;" dfdl:terminator="%SUB;">
<xs:complexType>
<xs:sequence dfdl:separator="%CAN;" dfdl:separatorPosition="prefix">
<xs:element name="element" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence dfdl:separator="%NAK;" dfdl:separatorPosition="infix">
<xs:element name="name" type="xs:string" />
<xs:element name="value" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
This results in an infoset that looks something like this:
<root>
<element>
<name>CLIENT_ID</name>
<value>DESKTOPCLIENT</value>
</element>
<element>
<name>LOCALE</name>
<value>en-US</value>
</element>
<element>
<name>ENCODE</name>
<value></value>
</element>
</root>
If you need the XML tags to match the name fields like in your question, you would then need to transform the infoset. XSLT can do this kind of transformation without much trouble.
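A minimal XSLT 1.0 sketch of that transformation. It assumes the infoset has no namespace and that every name value is a legal XML element name; neither is checked here.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">
    <root>
      <!-- Turn each name/value pair into an element named after the name field -->
      <xsl:for-each select="element">
        <xsl:element name="{name}">
          <xsl:value-of select="value"/>
        </xsl:element>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```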
Edit: There seems to be an issue where IBM DFDL does not like the above solution. I'm not sure why, but it works with Apache Daffodil. Something about value being the empty string causes an issue. After some trial and error, I've found that IBM DFDL (and Apache Daffodil too) are okay with it if you specify that empty value elements should be treated as nil. So changing the value element to this works:
<xs:element name="value" type="xs:string" nillable="true"
dfdl:nilKind="literalValue" dfdl:nilValue="%ES;"
dfdl:useNilForDefault="no"/>
In that case, the infoset ends up with something like this:
<element>
<name>ENCODE</name>
<value xsi:nil="true"></value>
</element>
Edit2: The nillable properties are required because otherwise IBM DFDL treats an empty string value as absent rather than as present with an empty value, and the absence is what produces the error. Newer versions of the DFDL spec add a property, emptyElementParsePolicy, which controls whether empty strings are treated as absent or as empty strings. Daffodil implements this property as an extension and defaults to the treat-as-empty behavior, whereas IBM DFDL has the treat-as-absent behavior. Daffodil behaves similarly to IBM DFDL when this property is set to treat as absent.

xml schema maxOccurs = unbounded within xs:all

Is it possible to have a combination of xs:all and xs:sequence?
I have an XML structure with an element probenode, which consists of the elements name, id, url, tags, priority, status_raw, and active, plus a combination of device and group.
device and group can occur zero or more times.
The solution below doesn't work, because maxOccurs="unbounded" is not allowed on an element within an all group.
<xs:complexType name="probenodetype">
<xs:all>
<xs:element name="name" type="xs:string" />
<xs:element name="id" type="xs:unsignedInt" />
<xs:element name="url" type="xs:string" />
<xs:element name="tags" />
<xs:element name="priority" type="xs:unsignedInt" />
<xs:element name="status_raw" type="xs:unsignedInt" />
<xs:element name="active" type="xs:boolean" />
<xs:element name="device" type="devicetype" minOccurs="0" maxOccurs="unbounded">
<!-- zie devicetype -->
</xs:element>
<xs:element name="group" type="grouptype" minOccurs="0" maxOccurs="unbounded">
<!-- zie grouptype -->
</xs:element>
</xs:all>
<xs:attribute name="noaccess" type="xs:integer" use="optional" />
</xs:complexType>
In XSD 1.0, the children of xs:all must have maxOccurs set to 1.
In XSD 1.1 this constraint is lifted.
So your alternatives appear to be:
Use an XSD 1.1 processor (Saxon or Xerces-J).
Use XSD 1.0 and impose an order on the children of probenodetype. This is a problem if the order in which the children appear carries information (so id followed by url is different from url followed by id ...).
In some simple cases it's feasible to write a content model that accepts precisely what you suggest you want, using only choice and sequence, but with seven required elements the resulting content model is likely to be too long and complex to be useful.
At this point some users give up and write a complex type with a repeatable OR-group and move the responsibility for checking that name, id, url, etc. all occur at least once and at most once into the application; that allows the generator of the XML not to have to worry about a fixed order (and opens a side channel for information leakage, which matters to some people) but also renders the schema somewhat less useful as documentation of the contract between data provider and data consumer.
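As a sketch of the second alternative (XSD 1.0 with an imposed order), the seven required children can go in a fixed sequence, followed by a repeatable choice so that device and group can still be mixed freely. The particular order chosen is an assumption; devicetype and grouptype are the types from the question.

```xml
<xs:complexType name="probenodetype">
  <xs:sequence>
    <!-- Required children, each exactly once, in a fixed order -->
    <xs:element name="name" type="xs:string"/>
    <xs:element name="id" type="xs:unsignedInt"/>
    <xs:element name="url" type="xs:string"/>
    <xs:element name="tags"/>
    <xs:element name="priority" type="xs:unsignedInt"/>
    <xs:element name="status_raw" type="xs:unsignedInt"/>
    <xs:element name="active" type="xs:boolean"/>
    <!-- Zero or more device/group elements, interleaved in any order -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="device" type="devicetype"/>
      <xs:element name="group" type="grouptype"/>
    </xs:choice>
  </xs:sequence>
  <xs:attribute name="noaccess" type="xs:integer" use="optional"/>
</xs:complexType>
```

The repeatable OR-group variant described above would instead move all nine elements into a single xs:choice with minOccurs="0" and maxOccurs="unbounded", leaving the once-only checks to the application.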

How to make a schema for an unordered list where some occur once, some many times

This is a similar question to How to create a schema for an unordered list of XML nodes, with occurrence constraints, but actually slightly simpler. However I am having great trouble understanding the logic behind sequences and choices (and especially when they are nested into sequences of choices or choices of sequences), and although I've studied it for a long time I can't understand how the example above works.
What I need is schema for a list of nodes, ChildA and ChildB, whereby ChildA can occur 0-n times, but ChildB only 0-1 times. (Actually I need several nodes of each type, but if I can do it for ChildA and ChildB, extending it to ChildX etc. and ChildY etc. should be simple). There should be no order constraint. I'd appreciate any help. Any links that explain schema indicators in depth would also be helpful.
This would be the simplest solution that quickly came to my mind. The key point here is to use another sequence inside the "main" sequence. The schema is kept deterministic by setting the inner sequence to start with <ChildB> and <ChildB> is kept optional by setting the cardinality of that sequence to 0-1.
This is an XMLSchema 1.0 solution.
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Schema for elements ChildA and ChildB
The requirements are as follows:
* ChildA and ChildB may occur in any order.
* ChildA is optional and may occur multiple times.
* ChildB is optional and may occur once only.
-->
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" name="AB-container" type="AB-type" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="AB-type">
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" name="ChildA" type="xs:string" />
<xs:sequence minOccurs="0">
<xs:element name="ChildB" type="xs:string" />
<xs:element minOccurs="0" maxOccurs="unbounded" name="ChildA" type="xs:string" />
</xs:sequence>
</xs:sequence>
</xs:complexType>
</xs:schema>
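To make the content model concrete, here is an instance document that the schema above should accept: ChildA elements appear on both sides of the single ChildB. The text values are placeholders.

```xml
<root>
  <AB-container>
    <ChildA>first</ChildA>
    <ChildB>only one of these</ChildB>
    <ChildA>second</ChildA>
    <ChildA>third</ChildA>
  </AB-container>
</root>
```

Adding a second ChildB to the same AB-container should be rejected, because the inner sequence that introduces ChildB can occur at most once.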
The short answer is that it cannot be done in XSD 1.0; in XSD 1.1 you could use an xsd:all compositor (since the XSD 1.0 restriction of allowing only maxOccurs = 1 has been removed). However, XSD 1.1 has its own problems: i) as far as I know, and at this time, it is freely available only as a beta Xerces version; ii) there is a Saxon edition supporting it, but the last time I saw references to it you had to pay for it; and iii) you would have a hard time interoperating with other people, since most of them are still on XSD 1.0.
If you can use Schematron (definitely more accessible than XSD 1.1, since it is just XSLT 1.0/2.0), then it should be easy to write a rule checking that the count of particular elements meets a specified criterion. It would augment an XSD where the compositor is a repeating xsd:choice whose options are the elements from your allowed set.
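A minimal Schematron sketch of such a count rule. It assumes the parent element is called root and has no namespace; the repeating xsd:choice in the XSD would still control which elements are allowed at all.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <sch:rule context="root">
      <!-- ChildA is unconstrained; ChildB may appear at most once -->
      <sch:assert test="count(ChildB) &lt;= 1">ChildB may occur at most once.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```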
Some people try to explain XSD compositors by making a parallel with constructs from regular expressions. If you are familiar with that, then xsd:all in XSD 1.0 is similar to square brackets (much simpler though, no concept of a range or negation), xsd:choice is like | (pipe, alternation) and xsd:sequence is the rest (where it matters the order in which you write your stuff).
I see that other people on SO recommend W3Schools. I didn't try it myself, hence me passing this on to you with the disclaimer.
@Dave, is adding a dummy attribute to ChildB okay? Since your requirement on childB is 0-1, we can achieve the desired result by adding a fixed attribute to childB and enforcing a unique constraint on that attribute.
<complexType name="childAType">
<simpleContent>
<extension base="string"></extension>
</simpleContent>
</complexType>
<complexType name="childBType">
<simpleContent>
<extension base="string">
<attribute name="value" type="string" fixed="123"></attribute>
</extension>
</simpleContent>
</complexType>
<element name="root">
<complexType>
<choice minOccurs="0" maxOccurs="unbounded">
<element name="childA" type="tns:childAType" minOccurs="0" maxOccurs="unbounded"></element>
<element name="childB" type="tns:childBType" minOccurs="0" maxOccurs="unbounded"></element>
</choice>
</complexType>
<unique name="childB.max.once">
<selector xpath="./tns:childB"></selector>
<field xpath="@value"></field>
</unique>
</element>
Below is one valid XML document (the position of B doesn't matter, and B can be omitted):
<tns:root xmlns:tns=" ">
<tns:childA></tns:childA>
<tns:childB></tns:childB>
<tns:childA></tns:childA>
<tns:childA></tns:childA>
</tns:root>
However, the one below is invalid:
<tns:root xmlns:tns=" ">
<tns:childB></tns:childB>
<tns:childA></tns:childA>
<tns:childB></tns:childB>
<tns:childA></tns:childA>
<tns:childA></tns:childA>
</tns:root>

minOccurs/maxOccurs in XML Schema

Given this XML Schema snippet:
<xs:element name="data">
<xs:complexType>
<xs:sequence>
<xs:element name="param" type="param" minOccurs="0" maxOccurs="unbounded" />
<xs:element name="format" type="format" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="name" type="xs:string" />
</xs:complexType>
</xs:element>
The intended result is that valid <data> elements may contain 0 or more <param> elements followed by 0 or more <format> elements. Have I added the minOccurs/maxOccurs attributes correctly, or should they be applied to the containing <xs:sequence>?
Correct or not, what would be the result of going one way or the other?
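You have placed them correctly for the intended result. Note that xs:sequence does accept minOccurs/maxOccurs as well, but there they would repeat the whole sequence as a group instead of the individual elements. Using an XML editor that supports XML Schema can help you check your assumptions when you are in doubt. XMLFox is a good freeware option.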
