I am building a datawarehouse in Azure Synapse where one of the sources are about 20 different types of XML files (with a different XSD scheme) and 1 base scheme.
What I am looking for is to get all XML elements and store them in files (1 per type) in my data lake. For that I need to have unique names per element, for example the whole path as a name. I tried to define dicts per type with all element names, but this is quite some work. To automate this (XSDs are updated yearly), I tried to code this out in Excel and VBA, but the XSDs are quite complex with nested complex types etc.
Below is a snippet of the baseschema.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.website.org/typ/1/baseschema/schema" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:iwmo="http://www.website.org/typ/1/baseschema/schema">
<xs:complexType name="Complex_Address">
...
<xs:sequence>
<xs:element name="Home" type="iwmo:Complex_House" minOccurs="0">
...
</xs:element>
<xs:element name="Postalcode" type="iwmo:Simple_Postalcode" minOccurs="0">
...
</xs:element>
<xs:element name="Streetname" type="iwmo:Simple_Streetname" minOccurs="0">
...
</xs:element>
<xs:element name="Areaname" type="iwmo:Simple_Areaname" minOccurs="0">
...
</xs:element>
<xs:element name="CountryCode" type="iwmo:Simple_CountryCode" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_House">
...
<xs:sequence>
<xs:element name="Housenumber" type="iwmo:Simple_Housenumber">
...
</xs:element>
<xs:element name="Houseletter" type="iwmo:Simple_Houseletter" minOccurs="0">
...
</xs:element>
<xs:element name="HousenumberAddition" type="iwmo:Simple_HousenumberAddition" minOccurs="0">
...
</xs:element>
<xs:element name="IndicationAddress" type="iwmo:Simple_IndicationAddress" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_MessageIdentification">
...
<xs:sequence>
<xs:element name="Identification" type="iwmo:Simple_IdentificationMessage">
...
</xs:element>
<xs:element name="Date" type="iwmo:Simple_Date">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_Product">
...
<xs:sequence>
<xs:element name="Categorie" type="iwmo:Simple_ProductCategory">
...
</xs:element>
<xs:element name="Code" type="iwmo:Simple_ProductCode" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Complex_XsdVersion">
<xs:sequence>
<xs:element name="BaseschemaXsdVersion" type="iwmo:Simple_Version">
</xs:element>
<xs:element name="MessageXsdVersion" type="iwmo:Simple_Version">
</xs:element>
</xs:sequence>
</xs:complexType>
And here a snippet of the xsd of 1 of the message types:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:typ="http://www.website.org/typ/1/baseschema/schema" xmlns:type1="http://www.website.org/typ/1/type1/schema" targetNamespace="http://www.website.org/typ/1/type1/schema" elementFormDefault="qualified">
<xs:import namespace="http://www.website.org/typ/1/baseschema/schema" schemaLocation="baseschema.xsd"></xs:import>
<xs:element name="Message" type="type1:Root"></xs:element>
<xs:complexType name="Root">
...
<xs:sequence>
<xs:element name="Header" type="type1:Header"></xs:element>
<xs:element name="Client" type="type1:Client"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Header">
<xs:sequence>
<xs:element name="Person" type="typ:Simple_SpecialCode">
...
</xs:element>
<xs:element name="MessageIdentification" type="typ:Complex_MessageIdentification">
...
</xs:element>
<xs:element name="XsdVersion" type="typ:Complex_XsdVersion">
...
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Client">
...
<xs:sequence>
<xs:element name="AssignedProducts" type="type1:AssignedProducts"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="AssignedProducts">
<xs:sequence>
<xs:element name="AssignedProduct" type="type1:AssignedProduct" maxOccurs="unbounded"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="AssignedProduct">
...
<xs:sequence>
<xs:element name="ToewijzingNummer" type="typ:Simple_Nummer">
...
</xs:element>
<xs:element name="Product" type="typ:Complex_Product" minOccurs="0">
...
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:schema>
Then this would be the desired output:
Header_Person
Header_MessageIdentification_Identification
Header_MessageIdentification_Date
Header_XsdVersion_BaseschemaXsdVersion
Header_XsdVersion_MessageXsdVersion
Client_AssignedProduct_ToewijzingNummer
Client_AssignedProduct_Product_Category
Client_AssignedProduct_Product_Code
In the baseschema I also added a nested complex type, to show the complexity.
Is there some kind of package or something in Python that can help me achieve this? Also a tool that can just create this list of elements in a text file would be great, I then can easily copy that into a variable.
I'm not sure if I'm clear about my requirements, if this is posted in the correct group with the correct tags, but I hope someone can point me into a good solution.
Ronald
I found a workaround after all where I put all fields from the xsds in variables. It's not ideal, but any other way would be too complex.
Related
All, I have an XML doc which I don't control for which I need to create an xsd to validate. The XML doc has multiple transaction types, some of which are required a specific number of times, and some aren't. the parent element is simply <transaction>, the child element can be either a <ControlTransaction> or a <RetailTransaction>. The issue is that I need to require a <transaction> to exists with a <ControlTransaction> with a <ReasonCode> element having a value of "Register Open" and another with a value of "Register Close" as follows:
<?xml version="1.0" encoding="UTF-8"?>
<RegisterDay xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:cp="urn:register">
<Transaction>
<SequenceNumber>1</SequenceNumber>
<ControlTransaction>
<ReasonCode>Register Open</ReasonCode>
</ControlTransaction>
</Transaction>
<Transaction>
<SequenceNumber>2</SequenceNumber>
<RetailTransaction>
...stuff..
<Total>9.99</Total>
</RetailTransaction>
</Transaction>
<Transaction>
<SequenceNumber>3</SequenceNumber>
<ControlTransaction>
<ReasonCode>Register Close</ReasonCode>
</ControlTransaction>
</Transaction>
</RegisterDay>
My best attempt is to use types in my schema, but get "Elements with the same name and same scope must have the same type". I don't know how to get around this.
<?xml version="1.0"?>
<xs:schema
xmlns:cp="urn:register"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="unqualified"
elementFormDefault="qualified">
<xs:element name="RegisterDay">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="1" maxOccurs="1" name="Transaction" type="TransactionRegisterOpen_type"/>
<xs:element minOccurs="1" maxOccurs="unbounded" name="Transaction" type="RetailTransaction_type"/>
<xs:element minOccurs="1" maxOccurs="1" name="Transaction" type="TransactionRegisterClose_type"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:simpleType name="RegisterOpen_type">
<xs:restriction base="xs:string">
<xs:pattern value="Register Open"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="RegisterClose_type">
<xs:restriction base="xs:string">
<xs:pattern value="Register Close"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="TransactionRegisterOpen_type">
<xs:sequence>
<xs:element name="SequenceNumber" type="xs:unsignedShort"/>
<xs:element name="ControlTransaction">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="1" name="ReasonCode" type="RegisterOpen_type"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="TransactionRegisterClose_type">
<xs:sequence>
<xs:element name="SequenceNumber" type="xs:unsignedShort"/>
<xs:element name="ControlTransaction">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="1" name="ReasonCode" type="RegisterClose_type"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="RetailTransaction_type">
<xs:sequence>
<xs:element name="SequenceNumber" type="xs:unsignedShort"/>
<xs:element name="ControlTransaction">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="1" name="Total" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:schema>
Has anyone run into this and/or have any suggestions? I'm pretty much stumped.
Perhaps with enumeration ?
<?xml version="1.0"?>
<xs:schema
xmlns:cp="urn:register"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="unqualified"
elementFormDefault="qualified"
targetNamespace="urn:register">
<xs:element name="RegisterDay">
<xs:complexType>
<xs:sequence>
<xs:element
minOccurs="1"
maxOccurs="unbounded"
name="Transaction"
type="cp:TypeTransaction"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="TypeTransaction">
<xs:sequence>
<xs:element name="SequenceNumber" type="xs:unsignedShort"/>
<xs:choice>
<xs:element name="RetailTransaction"/>
<xs:element name="ControlTransaction">
<xs:complexType>
<xs:sequence>
<xs:element name="ReasonCode">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Register Open"/>
<xs:enumeration value="Register Close"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:schema>
What is the difference between an xs:group and an xs:sequence in XML Schema? When would you use one or the other?
xs:sequence - together with xs:choice and xs:all - is used to define the valid sequences of XML element in the target XML. E.g. the schema for this XML:
<mainElement>
<firstSubElement/>
<subElementA/>
<subElementB/>
</mainElement>
is something like:
<xs:element name='mainElement'>
<xs:complexType>
<xs:sequence>
<xs:element name="firstSubElement"/>
<xs:element name="subElementA"/>
<xs:element name="subElementB"/>
</xs:sequence>
</xs:complexType>
</xs:element>
xs:group is used to define a named group of XML element following certain rules that can then be referenced in different parts of the schema. For example if the XML is:
<root>
<mainElementA>
<firstSubElement/>
<subElementA/>
<subElementB/>
</mainElementA>
<mainElementB>
<otherSubElement/>
<subElementA/>
<subElementB/>
</mainElementB>
</root>
you can define a group for the common sub-elements:
<xs:group name="subElements">
<xs:sequence>
<xs:element name="subElementA"/>
<xs:element name="subElementB"/>
</xs:sequence>
</xs:group>
and then use it:
<xs:element name="mainElementA">
<xs:complexType>
<xs:sequence>
<xs:element name="firstSubElement"/>
<xs:group ref="subElements"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="mainElementB">
<xs:complexType>
<xs:sequence>
<xs:element name="otherSubElement"/>
<xs:group ref="subElements"/>
</xs:sequence>
</xs:complexType>
</xs:element>
I it possible to make a choice scenario, like (A or B or Both). If yes, how can this be done with the following elements?
<xs:element name="a" type="typeA" />
<xs:element name="b" type="typeB" />
Hope you can help.
Regards,
Nima
You can see XSD "one or both" choice construct leads to ambiguous content model
<xs:schema xmlns:xs="...">
<xs:element name="a" type="typeA" />
<xs:element name="b" type="typeB" />
<xs:element name="...">
<xs:complexType>
<xs:sequence>
<xs:choice>
<xs:sequence>
<xs:element ref="a"/>
<xs:element ref="b" minOccurs="0"/>
</xs:sequence>
<xs:element ref="b"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
I have a following schema:
<xs:element name="Invoice">
<xs:complexType>
<xs:sequence>
.....
<xs:element name="InvoiceLines" type="InvoiceLinesType">
</xs:element>
.....
</xs:complexType>
</xs:element>
<xs:complexType name="InvoiceLinesType">
<xs:sequence>
<xs:element maxOccurs="unbounded" name="InvoiceLine" type="InvoiceLineType">
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="InvoiceLineType">
<xs:sequence>
.....
</xs:sequence>
</xs:complexType>
The problem is, that it generate classes:
Invoice - which contain member of InvoiceLinesType
InvoiceLinesType - which contain a collection of InvoiceLineType
InvoiceLineType
So there is one unnecessary class (InvoiceLinesType) and i prefer the following
Invoice - which contain a collection of InvoiceLineType
InvoiceLineType
Does anyone know how to tell the compiler not to generate this package (InvoiceLinesType).
My current external binding file is there
<jxb:bindings schemaLocation="invoice.xsd" node="/xs:schema">
<jxb:globalBindings>
.....
<xjc:simple/>
.....
</jxb:globalBindings>
</jxb:bindings>
Thank You for response.
You would have to modify your schema - drop InvoiceLinesType and have InvoiceLineType as unbounded element in Invoice.
<xs:element name="Invoice">
<xs:complexType>
<xs:sequence>
.....
<xs:element maxOccurs="unbounded" name="InvoiceLine" type="InvoiceLineType">
</xs:element>
.....
</xs:complexType>
</xs:element>
<xs:complexType name="InvoiceLineType">
<xs:sequence>
.....
</xs:sequence>
</xs:complexType>
What's wrong with this xml schema? It doesn't parse correctly, and I can't realize a hierarchy between cluster(element)->host(element)->Load(element).
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="cluster">
<xs:complexType>
<xs:sequence>
<xs:element ref="host"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="host">
<xs:complexType>
<xs:element ref="Load"/>
<xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="Load">
<xs:complexType>
<xs:attribute name="usedPhisicalMemory" type="xs:integer"/>
</xs:complexType>
</xs:element>
</xs:schema>
Thank you, Emilio
To allow something like this (I corrected the typo in "usedPhysicalMemory"):
<cluster>
<host name="foo">
<Load usedPhysicalMemory="500" />
</host>
<host name="bar">
<Load usedPhysicalMemory="500" />
</host>
</cluster>
This schema would do it:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="cluster">
<xs:complexType>
<xs:sequence>
<xs:element ref="host" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="host">
<xs:complexType>
<xs:sequence>
<xs:element ref="Load" />
</xs:sequence>
<xs:attribute name="name" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="Load">
<xs:complexType>
<xs:attribute name="usedPhysicalMemory" type="xs:integer" />
</xs:complexType>
</xs:element>
</xs:schema>
From the MSDN on <xs:complexType> (because the spec makes my brain hurt):
If group, sequence, choice, or all is specified, the elements must
appear in the following order:
group | sequence | choice | all
attribute | attributeGroup
anyAttribute
Maybe someone else can point out the relevant section in the spec.
In the host element, the load element cannot be a child of complexType, you must have a sequence, etc. in between.