Difference between group and sequence in XML Schema? - xsd

What is the difference between an xs:group and an xs:sequence in XML Schema? When would you use one or the other?

xs:sequence, like xs:choice and xs:all, is a compositor: it defines which child elements are valid in the target XML (for xs:sequence, in the declared order). For example, the schema for this XML:
<mainElement>
<firstSubElement/>
<subElementA/>
<subElementB/>
</mainElement>
is something like:
<xs:element name='mainElement'>
<xs:complexType>
<xs:sequence>
<xs:element name="firstSubElement"/>
<xs:element name="subElementA"/>
<xs:element name="subElementB"/>
</xs:sequence>
</xs:complexType>
</xs:element>
xs:group defines a named, reusable content model (an xs:sequence, xs:choice or xs:all of elements) that can then be referenced from different parts of the schema. For example, if the XML is:
<root>
<mainElementA>
<firstSubElement/>
<subElementA/>
<subElementB/>
</mainElementA>
<mainElementB>
<otherSubElement/>
<subElementA/>
<subElementB/>
</mainElementB>
</root>
you can define a group for the common sub-elements:
<xs:group name="subElements">
<xs:sequence>
<xs:element name="subElementA"/>
<xs:element name="subElementB"/>
</xs:sequence>
</xs:group>
and then use it:
<xs:element name="mainElementA">
<xs:complexType>
<xs:sequence>
<xs:element name="firstSubElement"/>
<xs:group ref="subElements"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="mainElementB">
<xs:complexType>
<xs:sequence>
<xs:element name="otherSubElement"/>
<xs:group ref="subElements"/>
</xs:sequence>
</xs:complexType>
</xs:element>
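Putting the pieces together, a minimal self-contained schema might look like the sketch below. The declaration of root, and the choice to reference the two main elements with ref, are assumptions, since the answer does not show them. The group is declared as a direct child of xs:schema, which is what allows it to be referenced by name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named group: declared at the top level so it can be referenced -->
  <xs:group name="subElements">
    <xs:sequence>
      <xs:element name="subElementA"/>
      <xs:element name="subElementB"/>
    </xs:sequence>
  </xs:group>

  <!-- Assumed root declaration; not part of the original answer -->
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="mainElementA"/>
        <xs:element ref="mainElementB"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="mainElementA">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstSubElement"/>
        <!-- Behaves as if the group's sequence were written here -->
        <xs:group ref="subElements"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="mainElementB">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="otherSubElement"/>
        <xs:group ref="subElements"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

In short: xs:sequence describes the order of children inside one content model, while xs:group gives a content model a name so that several types can share it.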

Related

Using XSD in PySpark

I am building a data warehouse in Azure Synapse. One of the sources consists of about 20 different types of XML files, each with its own XSD schema, plus one base schema.
I want to extract all XML elements and store them in files (one per type) in my data lake. For that I need a unique name for each element, for example its whole path. I tried defining dicts per type with all the element names, but that is a lot of work. To automate this (the XSDs are updated yearly), I tried to code it in Excel and VBA, but the XSDs are quite complex, with nested complex types and so on.
Below is a snippet of the baseschema.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.website.org/typ/1/baseschema/schema" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:iwmo="http://www.website.org/typ/1/baseschema/schema">
<xs:complexType name="Complex_Address">
...
<xs:sequence>
<xs:element name="Home" type="iwmo:Complex_House" minOccurs="0">
...
</xs:element>
<xs:element name="Postalcode" type="iwmo:Simple_Postalcode" minOccurs="0">
...
</xs:element>
<xs:element name="Streetname" type="iwmo:Simple_Streetname" minOccurs="0">
...
</xs:element>
<xs:element name="Areaname" type="iwmo:Simple_Areaname" minOccurs="0">
...
</xs:element>
<xs:element name="CountryCode" type="iwmo:Simple_CountryCode" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_House">
...
<xs:sequence>
<xs:element name="Housenumber" type="iwmo:Simple_Housenumber">
...
</xs:element>
<xs:element name="Houseletter" type="iwmo:Simple_Houseletter" minOccurs="0">
...
</xs:element>
<xs:element name="HousenumberAddition" type="iwmo:Simple_HousenumberAddition" minOccurs="0">
...
</xs:element>
<xs:element name="IndicationAddress" type="iwmo:Simple_IndicationAddress" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_MessageIdentification">
...
<xs:sequence>
<xs:element name="Identification" type="iwmo:Simple_IdentificationMessage">
...
</xs:element>
<xs:element name="Date" type="iwmo:Simple_Date">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_Product">
...
<xs:sequence>
<xs:element name="Categorie" type="iwmo:Simple_ProductCategory">
...
</xs:element>
<xs:element name="Code" type="iwmo:Simple_ProductCode" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_XsdVersion">
<xs:sequence>
<xs:element name="BaseschemaXsdVersion" type="iwmo:Simple_Version">
</xs:element>
<xs:element name="MessageXsdVersion" type="iwmo:Simple_Version">
</xs:element>
</xs:sequence>
</xs:complexType>
And here is a snippet of the XSD of one of the message types:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:typ="http://www.website.org/typ/1/baseschema/schema" xmlns:type1="http://www.website.org/typ/1/type1/schema" targetNamespace="http://www.website.org/typ/1/type1/schema" elementFormDefault="qualified">
<xs:import namespace="http://www.website.org/typ/1/baseschema/schema" schemaLocation="baseschema.xsd"></xs:import>
<xs:element name="Message" type="type1:Root"></xs:element>
<xs:complexType name="Root">
...
<xs:sequence>
<xs:element name="Header" type="type1:Header"></xs:element>
<xs:element name="Client" type="type1:Client"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Header">
<xs:sequence>
<xs:element name="Person" type="typ:Simple_SpecialCode">
...
</xs:element>
<xs:element name="MessageIdentification" type="typ:Complex_MessageIdentification">
...
</xs:element>
<xs:element name="XsdVersion" type="typ:Complex_XsdVersion">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Client">
...
<xs:sequence>
<xs:element name="AssignedProducts" type="type1:AssignedProducts"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="AssignedProducts">
<xs:sequence>
<xs:element name="AssignedProduct" type="type1:AssignedProduct" maxOccurs="unbounded"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="AssignedProduct">
...
<xs:sequence>
<xs:element name="ToewijzingNummer" type="typ:Simple_Nummer">
...
</xs:element>
<xs:element name="Product" type="typ:Complex_Product" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:schema>
Then this would be the desired output:
Header_Person
Header_MessageIdentification_Identification
Header_MessageIdentification_Date
Header_XsdVersion_BaseschemaXsdVersion
Header_XsdVersion_MessageXsdVersion
Client_AssignedProduct_ToewijzingNummer
Client_AssignedProduct_Product_Category
Client_AssignedProduct_Product_Code
In the base schema I also included a nested complex type to show the complexity.
Is there a Python package that can help me achieve this? A tool that simply writes this list of elements to a text file would also be great; I could then easily copy it into a variable.
I'm not sure whether my requirements are clear, or whether this is posted in the right group with the right tags, but I hope someone can point me to a good solution.
Ronald
I found a workaround after all: I put all the fields from the XSDs into variables. It's not ideal, but any other way would be too complex.

XSD: Divide schema using a choice of sequences

A part of my xsd looks as follows:
<xs:element name="my_element" minOccurs="1" maxOccurs="unbounded">
<xs:complexType>
<xs:choice>
<xs:sequence>
<xs:element name="sequence_1" type="xs:string"/>
<xs:element name="ID1" type="xs:string"/>
<xs:element name="TYPE1" type="xs:string"/>
</xs:sequence>
<xs:sequence>
<xs:element name="sequence_2" type="xs:string"/>
<xs:element name="ID2" type="xs:string"/>
<xs:element name="TYPE2" type="xs:string"/>
</xs:sequence>
</xs:choice>
</xs:complexType>
</xs:element>
The first element of each sequence determines which nodes follow.
With many different sequences, each containing several elements, my XSD becomes hard to read.
Is it possible to define the sequences separately (as I can with complexType)?
You can use xs:group:
<xs:group name="seqGroup_x">
<xs:sequence>
<xs:element name="sequence_x" type="xs:string"/>
<xs:element name="ID" type="xs:string"/>
...
</xs:sequence>
</xs:group>
<xs:complexType name="yourType">
<xs:group ref="seqGroup_x"/>
<xs:attribute name="anotherattr" type="xs:string"/>
</xs:complexType>
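Applied to the two sequences from the question, this could look like the sketch below. The group names seqGroup_1 and seqGroup_2 and the type name myElementType are hypothetical; each group is a top-level declaration, and the choice then only lists references.

```xml
<!-- Hypothetical group names; one top-level group per alternative -->
<xs:group name="seqGroup_1">
  <xs:sequence>
    <xs:element name="sequence_1" type="xs:string"/>
    <xs:element name="ID1" type="xs:string"/>
    <xs:element name="TYPE1" type="xs:string"/>
  </xs:sequence>
</xs:group>

<xs:group name="seqGroup_2">
  <xs:sequence>
    <xs:element name="sequence_2" type="xs:string"/>
    <xs:element name="ID2" type="xs:string"/>
    <xs:element name="TYPE2" type="xs:string"/>
  </xs:sequence>
</xs:group>

<!-- Hypothetical named type replacing the inline choice of sequences -->
<xs:complexType name="myElementType">
  <xs:choice>
    <xs:group ref="seqGroup_1"/>
    <xs:group ref="seqGroup_2"/>
  </xs:choice>
</xs:complexType>
```

The my_element declaration would then refer to the type with type="myElementType" and keep its own minOccurs and maxOccurs, so adding another alternative means adding one group and one reference.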

xsd: define an element that can be repeated an even number of times

With XSD, is there a way to define an element that is repeated an even number of times? As far as I know, this is not possible using the minOccurs and maxOccurs attributes.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:element name="A">
<xs:complexType>
<xs:sequence>
<xs:element name="B" maxOccurs="?" minOccurs="?"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
How about this:
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:element name="B" maxOccurs="2" minOccurs="2"/>
</xs:sequence>
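Dropped into the schema from the question, the suggestion would look roughly like this. It is a sketch that has not been run against a specific validator: the inner particle requires exactly two B elements, and the enclosing sequence lets that pair repeat any number of times, including zero.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="A">
    <xs:complexType>
      <!-- The pair (B, B) may occur 0, 1, 2, ... times,
           so A contains 0, 2, 4, ... B elements -->
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element name="B" minOccurs="2" maxOccurs="2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Expected to be valid:   <A/>  and  <A><B/><B/></A>  -->
<!-- Expected to be invalid: <A><B/></A>  and  <A><B/><B/><B/></A> -->
```

If zero occurrences should not be allowed, setting minOccurs="1" on the xs:sequence would require at least one pair.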

XSD Schema - how to ensure that two simple elements either have values or do not have values together

This is what I want to achieve when validating my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<worker>
<name>dingo</name>
<ssn>12345</ssn>
</worker>
I want to ensure that the two simple elements 'name' and 'ssn' either both have values or both have none. Neither may appear with a value on its own.
I have to use an XSD schema, so I cannot use the alternatives that are sometimes suggested, such as RELAX NG.
I looked into creating a group for the elements 'name' and 'ssn', but I could not find out how to put a restriction on the group that expresses this condition.
My current XSD file:
<xs:complexType name="worker">
<xs:sequence>
<xs:element name="name" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="ssn" type="xs:positiveInteger" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
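Make the sequence itself optional instead of the individual elements, so that name and ssn appear together or not at all:
<xs:complexType name="worker">
<xs:sequence minOccurs="0">
<xs:element name="name" type="xs:string"/>
<xs:element name="ssn" type="xs:positiveInteger"/>
</xs:sequence>
</xs:complexType>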
Put both elements in a named group and make the group reference optional:
<xs:complexType name="worker">
<xs:group ref="workerGrp" minOccurs="0"/>
</xs:complexType>
<xs:group name="workerGrp">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="ssn" type="xs:positiveInteger"/>
</xs:sequence>
</xs:group>
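To illustrate the effect, the sketch below adds an element declaration for worker (an assumption, since the question only shows the type) and lists instances that should pass or fail. Note that this constrains the presence of the elements: xs:positiveInteger already rejects an empty ssn, but xs:string accepts an empty name, so a hypothetical nonEmptyString type is shown that could be used for name instead if an empty value must be rejected too.

```xml
<!-- Assumed element declaration; the question only shows the type -->
<xs:element name="worker" type="worker"/>

<!-- Hypothetical simple type that rejects an empty <name/> -->
<xs:simpleType name="nonEmptyString">
  <xs:restriction base="xs:string">
    <xs:minLength value="1"/>
  </xs:restriction>
</xs:simpleType>

<!-- Expected to be valid:   <worker/>                                           -->
<!-- Expected to be valid:   <worker><name>dingo</name><ssn>12345</ssn></worker> -->
<!-- Expected to be invalid: <worker><name>dingo</name></worker>                 -->
<!-- Expected to be invalid: <worker><ssn>12345</ssn></worker>                   -->
```

To use the stricter type, the name element in the group would be declared with type="nonEmptyString" instead of type="xs:string".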

JAXB customize bindings - skip generated classes from schema

I have the following schema:
<xs:element name="Invoice">
<xs:complexType>
<xs:sequence>
.....
<xs:element name="InvoiceLines" type="InvoiceLinesType">
</xs:element>
.....
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="InvoiceLinesType">
<xs:sequence>
<xs:element maxOccurs="unbounded" name="InvoiceLine" type="InvoiceLineType">
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="InvoiceLineType">
<xs:sequence>
.....
</xs:sequence>
</xs:complexType>
The problem is that it generates these classes:
Invoice - which contains a member of type InvoiceLinesType
InvoiceLinesType - which contains a collection of InvoiceLineType
InvoiceLineType
So there is one unnecessary class (InvoiceLinesType), and I would prefer the following:
Invoice - which contains a collection of InvoiceLineType
InvoiceLineType
Does anyone know how to tell the compiler not to generate this class (InvoiceLinesType)?
My current external binding file is:
<jxb:bindings schemaLocation="invoice.xsd" node="/xs:schema">
<jxb:globalBindings>
.....
<xjc:simple/>
.....
</jxb:globalBindings>
</jxb:bindings>
Thank you for your response.
You would have to modify your schema: drop InvoiceLinesType and declare InvoiceLine as an unbounded element directly in Invoice.
<xs:element name="Invoice">
<xs:complexType>
<xs:sequence>
.....
<xs:element maxOccurs="unbounded" name="InvoiceLine" type="InvoiceLineType">
</xs:element>
.....
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="InvoiceLineType">
<xs:sequence>
.....
</xs:sequence>
</xs:complexType>
