Xerces/SAX2 reports wrong element - xsd

One of the things I've written down for the airing-of-grievances for Festivus this year is how Xerces/SAX2 reports parsing errors.
Take this bit of XSD:
<xs:sequence>
  <xs:element ref="element1" />
  <xs:element ref="element2" />
  <xs:element ref="element3" />
  <xs:element ref="element4" minOccurs="0" />
  <xs:element ref="element5" />
  <xs:element ref="element6" minOccurs="1" />
  <xs:element ref="element7" minOccurs="0" />
  <xs:element ref="element8" minOccurs="0" />
  <xs:choice minOccurs="0">
    <xs:element ref="choiceElement1" />
    <xs:element ref="choiceElement2" />
  </xs:choice>
  <xs:element ref="element9" minOccurs="0" />
</xs:sequence>
and this sample XML:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<xmldocument xmlns="http://www.somewebsite.com/xsd/xmldocument"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.somewebsite.com/xsd/xmldocument xmldocument.xsd">
  <transaction msgId="MESSAGE-ID">
    <element1>KS0003</element1>
    <element2>2016-05-09</element2>
    <element3>10:20:50</element3>
    <element5>99433</element5>
    <element8>jesse</element8>
  </transaction>
</xmldocument>
I get this error:
RAW SAX2 ERROR: Error at file "/tmp/QACXV0Z346", line=10, column=17,
XML element=element8, Element 'element8' is not valid for content
model:
'((element1,element2,element3,element4,element5,element6,element7,element8,(choiceElement1|choiceElement1)),element9)'
Seems to me the problem here isn't element8, it's element6, which is set to required but is the one actually missing in the XML.
I have some code that attempts to parse this string and work out what the real problem is, but the error string doesn't contain any information about which elements are optional. Maybe I'm not setting things up correctly. In general I find SAXException nearly useless, so what I need is something that gives me more information about what the real problem is.
We're using Xerces 2.6 or 2.8 because we're running on an IBM i and they don't provide updates to stuff like this unless you upgrade the OS.

Xerces error messages are actually quite good.
You might argue that in this particular case, it would be better to say something along the lines of
Encountered element8 but element6 was expected.
That's fine for this simple case, but realize that in the general case, there can be an arbitrarily complex expression covering what possibly could have been expected. Be prepared to introduce a whole lot of complexity to concisely explain what all would be allowed at a given point where parsing goes awry. Citing the first point of contradiction along with the violated parent content model requirement is not a bad diagnostic in general.
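To make the "first point of contradiction" concrete for the sample in the question: after element5 the sequence requires element6 next, so element8 is the first element the validator cannot place. Below is a sketch of a transaction that would satisfy the sequence. It assumes element6 holds simple text; its type is not shown in the question, and the value used here is a made-up placeholder.

```xml
<transaction msgId="MESSAGE-ID">
  <element1>KS0003</element1>
  <element2>2016-05-09</element2>
  <element3>10:20:50</element3>
  <!-- element4 is optional (minOccurs="0"), so it may be omitted -->
  <element5>99433</element5>
  <!-- element6 is required; this placeholder value is hypothetical -->
  <element6>PLACEHOLDER</element6>
  <!-- element7 is optional; element8 is now in a legal position -->
  <element8>jesse</element8>
</transaction>
```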

Related

Use of the schema-element node test in XSD 1.1's assert test

I am trying to design an XML schema where a certain element may alternatively hold either a single element belonging to a substitution group or a collection of certain elements which I want to be free-order (like in "all").
Due to the limitations on the "all" type of groups I cannot nest it into a "choice", so I tried a design which is similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified"
           vc:minVersion="1.1">
  <xs:element name="X" abstract="true"/>
  <xs:element name="X1" substitutionGroup="X"/>
  <xs:element name="X2" substitutionGroup="X"/>
  <xs:element name="Y">
    <xs:complexType>
      <xs:all>
        <xs:element ref="X" minOccurs="0"/>
        <xs:element name="Z1" minOccurs="0" type="xs:string"/>
        <xs:element name="Z2" minOccurs="0" type="xs:string"/>
        <xs:element name="Z3" minOccurs="0" type="xs:string"/>
      </xs:all>
      <xs:assert test="not(schema-element(X)) or not(Z1 or Z2 or Z3)"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
When the schema file is validated, I get the following error:
File C:\whereabouts\xsd-assertion-problem.xsd is not valid.
Assertion 'not(schema-element(X)) or not(Z)' is no valid XPath 2.0 expression.
Error location: xs:schema / xs:element / xs:complexType / xs:assert
Details
XPST0008: Element name not found in static context's in-scope element declarations
as-props-correct.2: Assertion 'not(schema-element(X)) or not(Z)' is no valid XPath 2.0 expression.
The question is: what is wrong here, and how can I fix it? When I read the description of the XPath error condition (https://www.w3.org/TR/xpath-31/#ERRXPST0008), it explicitly excludes "an ElementName in an ElementTest", which should be the case here, so static analysis should not fail. Or am I wrong?
Note that the substitution group for X is open for extension, and finding all locations where references to X are made may be difficult; that's why I strongly prefer a schema-element-based test.
On the other hand, while writing Z1 or Z2 or Z3 is also cumbersome, these elements are local, so this solution is more or less acceptable. Of course, if there are better ideas, they are welcome!
Just in case, I rely on the Altova engine.

DFDL Schema for parsing delimited text message

I need a little help with DFDL. I need to parse the message below into something like an XML/tree structure. The elements are not fixed; they are dynamic, and sometimes other elements will appear.
The expected XML/tree output is something like this:
<root>
  <CLIENT_ID>DESKTOPCLIENT</CLIENT_ID>
  <LOCALE>en-US</LOCALE>
  <ENCODE/>
</root>
Something like this is a possible solution, tested in Daffodil:
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format
          ref="GeneralFormat"
          lengthKind="delimited"
      />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="root" dfdl:initiator="%ESC;" dfdl:terminator="%SUB;">
    <xs:complexType>
      <xs:sequence dfdl:separator="%CAN;" dfdl:separatorPosition="prefix" dfdl:sequenceKind="unordered">
        <xs:element name="CLIENT_ID" type="xs:string" dfdl:initiator="CLIENT_ID%NAK;" />
        <xs:element name="LOCALE" type="xs:string" dfdl:initiator="LOCALE%NAK;" />
        <xs:element name="ENCODE" type="xs:string" dfdl:initiator="ENCODE%NAK;" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Note that this assumes fixed names for the individual elements, and that they all exist, though the order does not matter. If you know the fixed names, but they may or may not exist, you can add minOccurs="0" to the elements in the unordered sequence.
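As a rough sketch of that optional-element variant (untested, and assuming the occurrence-related defaults inherited from GeneralFormat are suitable for optional elements), the sequence might look like this, with the LOCALE element named to match the expected output:

```xml
<xs:sequence dfdl:separator="%CAN;" dfdl:separatorPosition="prefix" dfdl:sequenceKind="unordered">
  <!-- minOccurs="0" lets each field be absent from the message -->
  <xs:element name="CLIENT_ID" type="xs:string" minOccurs="0" dfdl:initiator="CLIENT_ID%NAK;" />
  <xs:element name="LOCALE" type="xs:string" minOccurs="0" dfdl:initiator="LOCALE%NAK;" />
  <xs:element name="ENCODE" type="xs:string" minOccurs="0" dfdl:initiator="ENCODE%NAK;" />
</xs:sequence>
```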
However, DFDL does not allow for dynamic element names, so if you don't know the names, you need a slightly different schema. Instead, you need to describe the data as an unbounded number of name/value pairs, where the name and value are separated by %NAK;, for example:
<xs:element name="root" dfdl:initiator="%ESC;" dfdl:terminator="%SUB;">
  <xs:complexType>
    <xs:sequence dfdl:separator="%CAN;" dfdl:separatorPosition="prefix">
      <xs:element name="element" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence dfdl:separator="%NAK;" dfdl:separatorPosition="infix">
            <xs:element name="name" type="xs:string" />
            <xs:element name="value" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
This results in an infoset that looks something like this:
<root>
  <element>
    <name>CLIENT_ID</name>
    <value>DESKTOPCLIENT</value>
  </element>
  <element>
    <name>LOCALE</name>
    <value>en-US</value>
  </element>
  <element>
    <name>ENCODE</name>
    <value></value>
  </element>
</root>
If you need the XML tags to match the name fields like in your question, you would then need to transform the infoset. XSLT can do this kind of transformation without much trouble.
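A minimal XSLT 1.0 sketch of that transformation is shown here. It assumes the infoset is in no namespace and that every name field holds a string that is legal as an XML element name; neither assumption is guaranteed by the schema above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Rebuild the root wrapper and process each name/value pair. -->
  <xsl:template match="/root">
    <root>
      <xsl:apply-templates select="element"/>
    </root>
  </xsl:template>

  <!-- Turn <element><name>N</name><value>V</value></element> into <N>V</N>. -->
  <xsl:template match="element">
    <xsl:element name="{normalize-space(name)}">
      <xsl:value-of select="value"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the infoset above, this should produce the root/CLIENT_ID/LOCALE/ENCODE shape asked for in the question, with an empty ENCODE element when the value is empty.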
Edit: There seems to be an issue where IBM DFDL does not like the above solution. I'm not sure why, but it works with Apache Daffodil. Something about value being the empty string causes an issue. After some trial and error, I've found that IBM DFDL (and Apache Daffodil too) are okay with it if you specify that empty value elements should be treated as nil. So changing the value element to this works:
<xs:element name="value" type="xs:string" nillable="true"
            dfdl:nilKind="literalValue" dfdl:nilValue="%ES;"
            dfdl:useNilForDefault="no"/>
In that case, the infoset ends up with something like this:
<element>
  <name>ENCODE</name>
  <value xsi:nil="true"></value>
</element>
Edit2: The nillable properties are required because otherwise IBM DFDL treats an empty string value as absent rather than as present with an empty value, and an absent value results in the error. Newer versions of the DFDL spec add a property, emptyElementParsePolicy, which controls whether empty strings are treated as absent or simply as empty strings. Daffodil implements this property as an extension but defaults to the treat-as-empty behavior, whereas IBM DFDL has the treat-as-absent behavior. Daffodil behaves similarly to IBM DFDL when this property is set to treat as absent.

XSD with no order and selective restriction

I need to validate an XML document that contains elements in random order; some of them must exist, and some of them only once. Also, some elements can be nested recursively.
For example, there is a room that should contain one door and any number of boxes and elements. Boxes can contain other boxes and/or elements.
Example XML:
<Room>
  <Element />
  <Box>
    <Box>
      <Element />
      <Box></Box>
      <Element />
    </Box>
    <Element />
  </Box>
  <Door />
  <Element />
</Room>
This example is very simple, but in my case there are a lot of elements that can be in <Room>. Recursion is not a problem. The problem is making <Door> required while allowing it to appear in any order among siblings that are not required.
UPD: the question is about XSD 1.0, because I use .NET and there is no free library for XSD 1.1.
From what I'm reading, I think you might need to use schema (XSD) indicators.
Check the following link for more information: Schema indicators
Random Order
From your question:
I need to validate an XML that contains elements in random order
Possible answer:
Using the all indicator (see link), you can specify that the elements may appear in any order.
All Indicator
The all indicator specifies that the child elements can appear in any order, and that each child element must occur only once.
Occurrences
From your question:
some of them must exist and some of them only once
Possible answer:
If I'm understanding it correctly, you want to specify the number of times an element must or may occur. This is called occurrence, and it is also covered at the link above. You'll have to set minOccurs and maxOccurs according to your requirements.
Occurrence Indicators
Occurrence indicators are used to define how often an element can occur.
The "maxOccurs" indicator specifies the maximum number of times an element can occur.
The "minOccurs" indicator specifies the minimum number of times an element can occur.
Everything, including examples, can be found on the XSD/Schema indicators page.
Your XSD (XML Schema) will probably look something like this:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Room" type="Room_T"/>
  <xs:complexType name="Room_T">
    <xs:all>
      <xs:element name="Element" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Box" type="Box_T" minOccurs="0" maxOccurs="unbounded"/>
      <!-- Door is required exactly once -->
      <xs:element name="Door" type="xs:string" minOccurs="1" maxOccurs="1"/>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="Box_T">
    <xs:all>
      <!-- a Box may be empty or hold any number of Elements and Boxes -->
      <xs:element name="Element" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Box" type="Box_T" minOccurs="0" maxOccurs="unbounded"/>
    </xs:all>
  </xs:complexType>
</xs:schema>
I haven't checked whether the schema above is valid, but I think it should get you started!
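One thing to watch with the xs:all approach: XSD 1.0 only allows maxOccurs of 1 (and minOccurs of 0 or 1) on elements inside xs:all, so unbounded occurrences will not validate there. A pattern often used in XSD 1.0 instead is "any number of optional children, then the required element, then any number of optional children". The sketch below applies it to the Room example; the group name Optional_G is made up for illustration, and the xs:string types are carried over from the schema above rather than taken from the question.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Room" type="Room_T"/>

  <!-- Optional, repeatable children shared by Room and Box -->
  <xs:group name="Optional_G">
    <xs:choice>
      <xs:element name="Element" type="xs:string"/>
      <xs:element name="Box" type="Box_T"/>
    </xs:choice>
  </xs:group>

  <xs:complexType name="Room_T">
    <xs:sequence>
      <xs:group ref="Optional_G" minOccurs="0" maxOccurs="unbounded"/>
      <!-- exactly one Door, anywhere among its optional siblings -->
      <xs:element name="Door" type="xs:string"/>
      <xs:group ref="Optional_G" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- A Box may be empty or contain Elements and nested Boxes in any order -->
  <xs:complexType name="Box_T">
    <xs:sequence>
      <xs:group ref="Optional_G" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

This stays manageable with a single required element. With several required elements in arbitrary order, the number of orderings to spell out grows quickly, which is where XSD 1.1 becomes attractive.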

How to put "required" a field in XSD creation?

I'm writing some XSD files for web service communication between an application and SharePoint. I'm trying to make my parameters required, but even if I set minOccurs to 1, they can still be left unspecified.
How can I resolve this problem? Here's one of my XSDs:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="RemoveGroup"
    targetNamespace="http://tempuri.org/RemoveGroup.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/RemoveGroup.xsd"
    xmlns:mstns="http://tempuri.org/RemoveGroup.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
  <xs:element name="RemoveGroup">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="tt_group_id" type="xs:long" />
        <xs:element name="tt_network_id" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
I hope there's a way to avoid writing hundreds of "if (input.Parameter != null)" checks...
Using minOccurs="1" at either the <element/> or <sequence/> level is the correct thing to do. What specific error are you getting?
UPDATE
Actually, within a <sequence/>, parsers should expect exactly one instance of each element by default, because minOccurs and maxOccurs both default to 1.
UPDATE
Your parser may be emitting errors as events which you need to handle in order to capture the errors - many common parsers have this behaviour.
Something that could cause an error is an empty value in the xs:long simple type, which does not allow blanks. If you want to indicate that nulls are allowed, declare the element with nillable="true" and use xsi:nil="true" in the instance, where xsi is bound to the namespace http://www.w3.org/2001/XMLSchema-instance.
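As a sketch of the nil approach applied to the schema in the question (assuming tt_group_id is the field that may legitimately be null), the declaration would gain nillable="true":

```xml
<xs:element name="RemoveGroup">
  <xs:complexType>
    <xs:sequence>
      <!-- nillable="true" permits xsi:nil="true" on this element in the instance -->
      <xs:element name="tt_group_id" type="xs:long" nillable="true" />
      <xs:element name="tt_network_id" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

An instance can then keep the element present, which the sequence still requires, while marking it as having no value. The network id below is a made-up placeholder:

```xml
<RemoveGroup xmlns="http://tempuri.org/RemoveGroup.xsd"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- present, so the sequence is satisfied, but explicitly nil -->
  <tt_group_id xsi:nil="true" />
  <tt_network_id>example-network</tt_network_id>
</RemoveGroup>
```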

Is mixed inherited when a complexType is extended?

I have the following in a schema:
<xs:element name="td">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="cell.type"/>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
<xs:complexType name="cell.type" mixed="true">
  <xs:sequence minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="p"/>
  </xs:sequence>
</xs:complexType>
Some parsers allow PCDATA directly in the element, while others don't. There's something in the XSD recommendation (3.4.2) that says when a complex type has complex content, and neither has a mixed attribute, the effective mixed is false. That means the only way mixed content could be in effect is if the extension of cell.type causes the mixed="true" to be inherited.
Could someone more familiar with schemas comment on the correct interpretation?
(BTW: if I had control of the schema I would move the mixed="true" to the element definition, but that's not my call.)
Anyone reading my question might want to read this thread also (by Damien). It seems my answer isn't entirely right: parsers/validators don't handle mixed attribute declarations on base/derived elements the same way.
Concerning extended complex types, sub-section 1.4.3.2.2.1 of section 3.4.6 in part 1 of W3C's XML Schema specification says that
Both [derived and base] {content type}s must be mixed or both must be element-only.
So yes, it is inherited (or more like you cannot overwrite it—same thing in the end).
Basically, what you've described is the desired (and, as far as I'm concerned, the most logical) behavior.
I've created a simple schema to run a little test with Eclipse's XML tools.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="c">
    <xs:complexType>
      <xs:complexContent mixed="false">
        <xs:extension base="a"/>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="a" mixed="true">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="b"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
The above schema is valid, in the sense that neither Eclipse's nor W3C's "official" XML Schema validator reports any issues with it.
The following XML passes validation against the aforementioned schema.
<?xml version="1.0" encoding="UTF-8"?>
<c xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="test.xsd">
  x
  <b/>
  y
</c>
So basically you cannot overwrite the mixedness of a complex base type. To support this statement further, try swapping the base and derived types' mixedness. In that case the XML fails to validate, because the derived type won't be mixed as it (yet again) cannot overwrite the base's mixedness.
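For reference, the swapped variant of the test schema would look roughly like this, with only the two mixed attributes exchanged. Given the rule quoted earlier, some validators may reject the schema itself rather than the instance, so results can differ between tools.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="c">
    <xs:complexType>
      <!-- the derived type now asks for mixed content -->
      <xs:complexContent mixed="true">
        <xs:extension base="a"/>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
  <!-- the base type is now element-only -->
  <xs:complexType name="a" mixed="false">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="b"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```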
You've also said that
Some parsers allow PCDATA directly in the element, while others don't
It couldn't hurt to clarify which parsers you are talking about. A good parser shouldn't fail when it encounters mixed content. A validating parser, given the proper schema, will fail if it encounters mixed content when the schema does not allow it.

Resources