lxml confused by empty xmlns in XSD (XML schema) - xsd

I have an XSD file that includes an inner type and then a reference back to it from an outer type. At the top level, it also includes an empty xmlns attribute. Like this:
<?xml version="1.0"?>
<xs:schema id="foo" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="inner">
<xs:complexType>
<xs:sequence>
<xs:element name="a" type="xs:double" minOccurs="1" />
<xs:element name="b" type="xs:double" minOccurs="1" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="outer">
<xs:complexType>
<xs:sequence>
<xs:element ref="inner" minOccurs="0" maxOccurs="unbounded" />
<xs:element name="c" type="xs:double" minOccurs="1" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
However if I try to validate some XML against it with the lxml library:
from io import StringIO
from lxml import etree
schema = etree.XMLSchema(file=StringIO(xsd))
xml = "<outer> <inner><a>1</a><b>2</b></inner> <c>3</c> </outer>"
print(schema.validate(etree.parse(StringIO(xml))))
Then I get a strange error:
Traceback (most recent call last):
File "foo.py", line 57, in <module>
schema = etree.XMLSchema(file=StringIO(xsd))
File "src\lxml\xmlschema.pxi", line 88, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': References from this schema to components in the namespace '' are not allowed, since not indicated by an import statement., line 14
If I remove the xmlns="" part then the error goes away, so that seems to be the problematic part. I don't know too much about namespaces in XML/XSD, but from reading other StackOverflow answers it seems likely that this part is there due to the XSD being generated from C#.
Presumably the original author of the XSD (or the code that generates it) doesn't have this problem when validating XML against it in C#. And I've also managed to validate against it in Python using the xmlschema library - I didn't do anything special, it just seemed to work by default. So it seems to just be lxml that doesn't like it.
I don't want to modify the XSD files - as mentioned above, they come from another source, and in fact there are quite a lot of them, so it would be a maintenance burden if they were to change in future. Why is lxml having a problem with it? Is there a way to fix / avoid the error (without modifying the XSD)?
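One possible workaround, based only on the observation above that the error disappears when xmlns="" is removed, is to drop that attribute in memory before the schema text reaches lxml, leaving the files on disk untouched. This is a minimal sketch, not a confirmed fix: the helper name strip_empty_default_xmlns is hypothetical, and it assumes the empty declaration sits on the root xs:schema element, so only the first occurrence is removed.

```python
import re
import xml.etree.ElementTree as ET

# Matches an empty default-namespace declaration such as  xmlns=""  or  xmlns=''
# (it does not match prefixed declarations like xmlns:xs="...").
_EMPTY_XMLNS = re.compile(r"""\s+xmlns=(?:""|'')""")


def strip_empty_default_xmlns(xsd_text: str) -> str:
    """Return xsd_text with the first xmlns="" declaration removed.

    Assumption: the first occurrence is the one on the root xs:schema element.
    Removing xmlns="" from a nested element could change its meaning, so only
    one occurrence is touched.
    """
    return _EMPTY_XMLNS.sub("", xsd_text, count=1)


if __name__ == "__main__":
    xsd = (
        '<?xml version="1.0"?>\n'
        '<xs:schema id="foo" xmlns="" '
        'xmlns:xs="http://www.w3.org/2001/XMLSchema">\n'
        '<xs:element name="inner" type="xs:double"/>\n'
        "</xs:schema>"
    )
    cleaned = strip_empty_default_xmlns(xsd)
    # The cleaned text is still well-formed XML.
    ET.fromstring(cleaned)
    print(cleaned)
```

The cleaned string can then be wrapped in StringIO and passed to etree.XMLSchema exactly as in the snippet above.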

Related

Use of the schema-element node test in XSD 1.1's assert test

I am trying to design an XML schema where a certain element may hold either a single element belonging to a substitution group or a collection of certain elements which I want to be free-order (as in "all").
Due to the limitations on the "all" type of groups I cannot nest it into a "choice", so I tried a design which is similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" attributeFormDefault="unqualified" vc:minVersion="1.1">
<xs:element name="X" abstract="true"/>
<xs:element name="X1" substitutionGroup="X"/>
<xs:element name="X2" substitutionGroup="X"/>
<xs:element name="Y">
<xs:complexType>
<xs:all>
<xs:element ref="X" minOccurs="0"/>
<xs:element name="Z1" minOccurs="0" type="xs:string"/>
<xs:element name="Z2" minOccurs="0" type="xs:string"/>
<xs:element name="Z3" minOccurs="0" type="xs:string"/>
</xs:all>
<xs:assert test="not(schema-element(X)) or not(Z1 or Z2 or Z3)"/>
</xs:complexType>
</xs:element>
</xs:schema>
When the schema file is validated, I get the following error:
File C:\whereabouts\xsd-assertion-problem.xsd is not valid.
Assertion 'not(schema-element(X)) or not(Z)' is no valid XPath 2.0 expression.
Error location: xs:schema / xs:element / xs:complexType / xs:assert
Details
XPST0008: Element name not found in static context's in-scope element declarations
as-props-correct.2: Assertion 'not(schema-element(X)) or not(Z)' is no valid XPath 2.0 expression.
The question is: what is wrong here, and how can I fix it? When I read the description of the XPath error condition (https://www.w3.org/TR/xpath-31/#ERRXPST0008), it explicitly excludes "an ElementName in an ElementTest", which should be the case here, so static analysis should not fail. Or am I wrong?
Note that the substitution group for X is open for extension and finding all locations where references to X are made may be difficult, that's why I strongly prefer to use a schema-element-based test.
On the other hand, while writing Z1 or Z2 or Z3 is also cumbersome, these elements are local, so this solution is more or less acceptable. Of course, if there are better ideas, they are welcome!
Just in case, I rely on the Altova engine.

PyXB: generating class names in Unicode

Can somebody point me in the right direction? I'm unable to generate binding classes with PyXB when element names are non-ASCII.
The minimal reproducible example:
<?xml version="1.0" encoding="utf8"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="Country" type="xs:string" />
<xs:element name="Street" type="xs:string" />
<xs:element name="Town" type="xs:string" />
<xs:element name="Дом" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
(Look for <xs:element name="Дом" type="xs:string" />, where I use Cyrillic.)
The encoding of the file is utf8.
However, when I try:
pyxbgen -u example.xsd -m example
I get the error:
Traceback (most recent call last):
File "/home/sergey/anaconda3/lib/python3.5/xml/sax/expatreader.py", line 210, in feed
self._parser.Parse(data, isFinal)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 9, column 26
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sergey/anaconda3/bin/pyxbgen", line 52, in <module>
generator.resolveExternalSchema()
.......
which points to the cyrillic name of the element. What am I missing?
UTF8 is spelled "utf-8" in XML and in Python.
lilith[33]$ head -1 /tmp/cyr.xsd
<?xml version="1.0" encoding="utf-8"?>
lilith[34]$ pyxbgen -u /tmp/cyr.xsd -m cyr
WARNING:pyxb.binding.generate:Element use None.Дом renamed to emptyString
Python for AbsentNamespace0 requires 1 modules
That PyXB generates an element named emptyString instead of one named Дом is a problem, though. PyXB was designed long before Python 3 and Unicode support, and it goes to great effort to convert text to valid Python 2 identifiers.
Since you're using Python 3, it should be possible to bypass that conversion, but it's not quite trivial. Track issue 67, or, if there's a Cyrillic transliteration you prefer, the technique demonstrated here for Japanese might work.
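If there are many schema files with the encoding="utf8" spelling, the declaration fix above could be applied programmatically before running pyxbgen. The following is a rough sketch using only the standard library. The function name fix_utf8_declaration is hypothetical, and it only rewrites the spelling inside the XML declaration. It does not re-encode the file.

```python
import re

# Captures the part of the XML declaration before and after the encoding name,
# so that only the spelling "utf8" itself is replaced.
_DECL = re.compile(rb"""(<\?xml[^>]*encoding=["'])utf8(["'])""", re.IGNORECASE)


def fix_utf8_declaration(xml_bytes: bytes) -> bytes:
    """Rewrite encoding="utf8" to encoding="utf-8" in the XML declaration.

    Only the first match is changed; documents that already say "utf-8"
    (or use another encoding) are returned unchanged.
    """
    return _DECL.sub(rb"\1utf-8\2", xml_bytes, count=1)


if __name__ == "__main__":
    original = '<?xml version="1.0" encoding="utf8"?>\n<a name="Дом"/>'.encode("utf-8")
    print(fix_utf8_declaration(original).decode("utf-8"))
```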

Xerces/SAX2 reports wrong element

One of the things I've written down for the airing-of-grievances for Festivus this year is how Xerces/SAX2 reports parsing errors.
Take this bit of XSD:
<xs:sequence>
<xs:element ref="element1" />
<xs:element ref="element2" />
<xs:element ref="element3" />
<xs:element ref="element4" minOccurs="0" />
<xs:element ref="element5" />
<xs:element ref="element6" minOccurs="1" />
<xs:element ref="element7" minOccurs="0" />
<xs:element ref="element8" minOccurs="0" />
<xs:choice minOccurs="0">
<xs:element ref="choiceElement1" />
<xs:element ref="choiceElement2" />
</xs:choice>
<xs:element ref="element9" minOccurs="0" />
</xs:sequence>
and sample XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<xmldocument xmlns="http://www.somewebsite.com/xsd/xmldocument" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.somewebsite.com/xsd/xmldocument xmldocument.xsd">
<transaction msgId="MESSAGE-ID">
<element1>KS0003</element1>
<element2>2016-05-09</element2>
<element3>10:20:50</element3>
<element5>99433</element5>
<element8>jesse</element8>
</transaction>
</xmldocument>
I get this error:
RAW SAX2 ERROR: Error at file "/tmp/QACXV0Z346", line=10, column=17,
XML element=element8, Element 'element8' is not valid for content
model:
'((element1,element2,element3,element4,element5,element6,element7,element8,(choiceElement1|choiceElement1)),element9)'
Seems to me the problem here isn't element8, it's element6, which is set to required but is the one actually missing in the XML.
I have some code that attempts to parse out this string and figure out what the real problem is, but the error string doesn't contain any information about optional elements, etc. I may not be setting things up correctly - maybe. I have a problem in general with SAXException - it's nearly useless - so what I need is more information from something that tells me what the real problem is.
We're using Xerces 2.6 or 2.8 because we're running on an IBM i and they don't provide updates to stuff like this unless you upgrade the OS.
Xerces error messages are actually quite good.
You might argue that in this particular case, it would be better to say something along the lines of
Encountered element8 but element6 was expected.
That's fine for this simple case, but realize that in the general case, there can be an arbitrarily complex expression covering what possibly could have been expected. Be prepared to introduce a whole lot of complexity to concisely explain what all would be allowed at a given point where parsing goes awry. Citing the first point of contradiction along with the violated parent content model requirement is not a bad diagnostic in general.
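For the simple case discussed here, a message of the "encountered X but Y was expected" kind can be derived by walking the content model alongside the child elements. The sketch below is an illustration only and has nothing to do with how Xerces works internally. It assumes a flat sequence in which every particle occurs at most once and is either required or optional, and it leaves out the xs:choice group. The function name first_mismatch is hypothetical.

```python
def first_mismatch(model, children):
    """Return a diagnostic for the first point where children violate model.

    model    -- list of (element_name, required) pairs, in sequence order
    children -- list of child element names as they appear in the document
    Returns None when the children satisfy the (simplified) model.
    """
    pos = 0
    for child in children:
        # Skip optional particles until one matches the current child.
        while pos < len(model) and model[pos][0] != child:
            name, required = model[pos]
            if required:
                return f"Encountered {child} but {name} was expected"
            pos += 1
        if pos == len(model):
            return f"Encountered {child} but no further elements were allowed"
        pos += 1  # consume the particle that matched
    # Children are exhausted: any required particle left over is missing.
    for name, required in model[pos:]:
        if required:
            return f"Content ended but {name} was expected"
    return None


if __name__ == "__main__":
    # The sequence from the question, with the xs:choice group left out.
    model = [
        ("element1", True), ("element2", True), ("element3", True),
        ("element4", False), ("element5", True), ("element6", True),
        ("element7", False), ("element8", False), ("element9", False),
    ]
    children = ["element1", "element2", "element3", "element5", "element8"]
    print(first_mismatch(model, children))
```

Once nested groups, choices and maxOccurs values come into play, the set of elements that could have been expected at a given point is no longer a single name, which is the complexity described above.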

How can I define an XML schema element that allows either base64 content or an xop:Include element?

I have an XML schema that defines an element that may be either base64 text or an xop:Include element. Currently, this is defined as a base64Binary type:
<xs:element name="PackageBinary" type="xs:base64Binary" minOccurs="1" maxOccurs="1"/>
When I insert the xop:Include element instead, it looks like this:
<PackageBinary>
<xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="http://google.com/data.bin" />
</PackageBinary>
But this gives an XML validation error (I'm using .NET validator):
The element 'mds:xml-schema:soap11:PackageBinary' cannot contain child
element 'http://www.w3.org/2004/08/xop/include:Include' because the
parent element's content model is text only.
This makes sense because it's not base64 content, but I thought this was common practice...? Is there any way to support this in the schema? (We have an existing product that supports this syntax, but we are adding validation now.)
The best I could come up with was to create a complex type that allowed any tags but was also tagged as "mixed" so it allowed text. This doesn't explicitly declare the content as base64, but it does let it pass validation.
<xs:complexType name="PackageBinaryInner" mixed="true">
<xs:sequence>
<xs:any minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:element name="PackageBinary" type="PackageBinaryInner" minOccurs="1" maxOccurs="1"/>
The solution I've found is like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://example.org"
elementFormDefault="qualified"
xmlns="http://example.org"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xop="http://www.w3.org/2004/08/xop/include">
<xs:import namespace="http://www.w3.org/2004/08/xop/include"
schemaLocation="http://www.w3.org/2004/08/xop/include"/>
<xs:complexType name="PackageBinary" mixed="true">
<xs:all>
<xs:element ref="xop:Include"/>
</xs:all>
</xs:complexType>
I saw this in an xml document that appeared to allow validation - basically the attribute xmlns:xop="..." did the trick:
<SomeElement xmlns:xop="http://www.w3.org/2004/08/xop/include/" id="465390" type="html">
<SomeElementSummaryURL>https://file.someurl.com/SomeImage.html</SomeElementSummaryURL>
<xop:Include href="cid:1111111#someurl.com"/>
</SomeElement >

XSD. Difference between xsd:element and xs:element?

I am reading articles about XSD on w3schools, and there are many examples there. For example, this one:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
But when I tried to feed this .xsd file to xjc, I got an error log, something like this:
The prefix "xs" for element "xs:schema" is not bound...
But everything works correctly when I change the xs prefix to xsd.
So, can somebody clarify for me what the difference between xs and xsd is?
Maybe one prefix is for an old version and the other for a new version...
xs and xsd are XML prefixes used with qualified names; each prefix must be associated with a namespace. The association is done with an attribute that looks like xmlns:xs="...". xs and xsd are most common for XML Schema documents.
Should you choose s or ns1, it shouldn't make any difference to any tool for your scenario.
The error is not caused by your XML Schema file. I suspect there might be something else in your setup, maybe a custom binding file. Please check that or post additional information.
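To see that the prefix itself carries no meaning, here is a small sketch using Python's standard xml.etree.ElementTree (chosen only for illustration, it is not part of xjc). The helper names schema_with_prefix and expanded_tags are hypothetical. The parser expands every prefixed name to {namespace}localname, so the same schema written with xs, xsd or ns1 yields identical element names, while a prefix with no xmlns:... declaration is rejected as unbound.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def schema_with_prefix(prefix: str) -> str:
    """Build the same tiny schema document using the given prefix."""
    return (
        f'<{prefix}:schema xmlns:{prefix}="{XSD_NS}">'
        f'<{prefix}:element name="note" type="{prefix}:string"/>'
        f"</{prefix}:schema>"
    )


def expanded_tags(xml_text: str) -> list:
    """Return the namespace-expanded tag of every element in the document."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    tags = {p: expanded_tags(schema_with_prefix(p)) for p in ("xs", "xsd", "ns1")}
    # All three spellings expand to the same names.
    print(tags["xs"] == tags["xsd"] == tags["ns1"])
    print(tags["xs"][0])

    # A prefix without a matching xmlns:... declaration is an error.
    try:
        ET.fromstring('<xs:schema><xs:element name="note"/></xs:schema>')
    except ET.ParseError as exc:
        print("parse error:", exc)
```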
