Is it possible to define a XSD element with an arbitrary name but with specific attributes - xsd

I would like to validate a custom XML document against a schema.
This document would include a structure with any number of elements, each having a specific attribute. Something like this:
<Root xmlns="http://tns">
<Records>
<FirstRecord attr='whatever'>content for first record</FirstRecord>
<SecondRecord attr='whatever'>content for first record</SecondRecord>
...
<LastRecord attr='whatever'>content for first record</LastRecord>
</Records>
</Root>
The author of the XML document can include any number of records, each with an arbitrary name of his or her choosing. How can I validate this against an XML Schema?
I have tried to specify the appropriate structure type in a schema, but I do not know how to reference it in the appropriate location:
<xs:schema xmlns="http://tns" targetNamespace="http://tns" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="RecordType"> <!-- This is my record type -->
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="attr" type="xs:string" use="required" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:element name="Root">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="1" maxOccurs="1" name="Records">
<!-- This is where records should go -->
<xs:complexType />
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

What you describe is possible in XSD 1.1, and something very similar is possible in XSD 1.0, which is not to say it's an advisable design.
In XML vocabularies, the element type normally conveys relevant information about the type of information, and it is the name of the element type that is used to drive validation in most XML schema languages; the design you describe is (some would say) a little bit like asking if I can define an object class in Java, or a struct in C, which obeys the constraints that the members can have arbitrary names, as long as one of them is an integer with the value 42. That, or something like it, may well be possible, but most experienced designers will feel strongly that this is almost certainly not the right way to go about solving any normal problem.
On the other hand, doing unusual and awkward things with a system can sometimes help in learning how to use the system effectively. (You never know a system well, said a friend of mine once, until you have thoroughly abused it.) So my answer has two parts: how to come as close as possible to the design you specify in XSD, and alternatives you might consider instead.
The simplest way to specify the language you seem to want in XSD 1.1 is to define an assertion on the Records element which says (1) that every child of Records has an 'attr' attribute and (2) that no child of Records has any children. You'll have something like this:
...
<xs:element minOccurs="1" maxOccurs="1" name="Records">
<xs:complexType>
<xs:sequence>
<xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
</xs:sequence>
<xs:assert
test="every $child in * satisfies $child/@attr"/>
<xs:assert
test="not(*/*)"/>
</xs:complexType>
</xs:element>
...
As you can see, this is very similar to what InfantPro'Aravind' has described; it avoids the problems identified by InfantPro'Aravind' by using assertion, not type assignment, to impose the constraints you want.
In XSD 1.0, assertion is not available, and the only way I can think of to come close to the design you describe is to define an abstract element, which I'll call Record, as the child of Records, and to require that the elements which actually occur as children of Records be declared as being substitutable for this abstract element (which in turn requires that their types be derived from type RecordType). Your schema might say something like this:
<xs:element name="Root">
<xs:complexType>
<xs:sequence>
<xs:element name="Records">
<xs:complexType>
<xs:sequence>
<xs:element ref="Record" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="RecordType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="attr"
type="xs:string"
use="required" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:element name="Record"
type="RecordType"
abstract="true"/>
Elsewhere in the schema (possibly in a separate schema document) you will need to declare FirstRecord, etc., and specify that they are substitutable for Record, thus:
<xs:element name="FirstRecord" substitutionGroup="Record"/>
<xs:element name="SecondRecord" substitutionGroup="Record"/>
<xs:element name="ThirdRecord" substitutionGroup="Record"/>
...
At some level, this matches your description, though I suspect you did not want to have to declare FirstRecord, SecondRecord, etc.
Having described ways in which XSD can do what you describe, I should also say that I wouldn't recommend either of these approaches. Instead, I'd design the XML vocabulary differently to work more naturally with XSD.
In the design as you specify it, every record appears to have the same type, but in addition to the content of the element they are allowed to convey a certain additional quantity of information by having a different name (FirstRecord, SecondRecord, etc.). This additional information could just as easily be conveyed in an attribute, which would allow you to specify Record as a concrete element, rather than an abstract element, giving it an extra "alternate-name" attribute. Your sample data would then take a form like this:
<Root xmlns="http://tns">
<Records>
<Record
alternate-name="FirstRecord"
attr='whatever'>content for first record</Record>
<Record
alternate-name="SecondRecord"
attr='whatever'>content for first record</Record>
...
<Record
alternate-name="LastRecord"
attr='whatever'>content for first record</Record>
</Records>
</Root>
This will be more or less acceptable depending on whether you or your data providers or tools in your tool chain attach some mystic or other significance to having the string "FirstRecord" be an element type name instead of an attribute value.
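A schema for this alternative design could look roughly like the following. This is only a sketch: the xs:string type chosen for "alternate-name", the occurrence bounds on Record, and elementFormDefault="qualified" (needed so that Records and Record are validated in the http://tns namespace used by the sample data) are assumptions, not part of the original question.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://tns"
           targetNamespace="http://tns"
           elementFormDefault="qualified">
  <!-- Same record type as in the question, plus the extra naming attribute -->
  <xs:complexType name="RecordType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="attr" type="xs:string" use="required"/>
        <!-- Assumed to be a plain string; a stricter type could be used -->
        <xs:attribute name="alternate-name" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Records">
          <xs:complexType>
            <xs:sequence>
              <!-- Record is now concrete and may repeat any number of times -->
              <xs:element name="Record" type="RecordType"
                          minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```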
Alternatively, one could say that the point of the design is to allow Records to contain an arbitrary sequence of elements of arbitrary structure (on this account, the restriction to xs:string is just an artifact of your example and is not really desired in reality) as long as we have, for each record, the information recorded in the 'attr' attribute. Easy enough to specify this: define 'Record' as a concrete element with an 'attr' attribute, accepting one child which can be any XML element:
<xs:element name="Record">
<xs:complexType>
<xs:sequence>
<xs:any processContents="lax"/>
</xs:sequence>
<xs:attribute name="attr"
use="required"
type="xs:string"/>
</xs:complexType>
</xs:element>
The value of the 'processContents' attribute can be changed to 'strict' or 'skip' or kept at 'lax', depending on whether you want FirstRecord, SecondRecord, etc. to be validated (and declared) or not.
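With that declaration, and assuming Records is declared to contain a repeatable Record element (which the fragment above does not show), an instance might look like the following sketch. With processContents="lax", FirstRecord and SecondRecord do not need to be declared anywhere.

```xml
<Root xmlns="http://tns">
  <Records>
    <!-- Each Record carries the required attr and wraps one arbitrary element -->
    <Record attr="whatever">
      <FirstRecord>content for first record</FirstRecord>
    </Record>
    <Record attr="whatever">
      <SecondRecord>content for second record</SecondRecord>
    </Record>
  </Records>
</Root>
```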

I guess this is not possible with XSD alone!
When you say
any number of records, each with an arbitrary name of his or her choosing.
That forces us to use the <xs:any/> element, but declaring an element as any doesn't allow you to validate the attributes under it.
So the answer is NO!

Related

In an XSD file, Is it legal to use two Order Indicators for a given element?

I am creating an XSD for the following XML structure:
<BaseNode>
<ParentNode1>
<childnode/>
</ParentNode1>
<ParentNode2>
<childnode/>
</ParentNode2>
<ParentNodeA>
<childnode/>
</ParentNodeA>
<ParentNodeB>
<childnode/>
</ParentNodeB>
</BaseNode>
Where: ParentNodes 1 and 2 must appear and in order, and A and B are optional (and will only appear once each, if present), but must appear after 1 and 2 if present.
What I 'think' will work is the following, but is it valid? (specifically, the presence of both, sequence and all Order Indicators)
<xs:element name="BaseNode">
<xs:complexType>
<xs:sequence>
<xs:element name="ParentNode1">
....
</xs:element>
<xs:element name="ParentNode2">
....
</xs:element>
</xs:sequence>
<xs:all>
<xs:element name="ParentNodeA">
....
</xs:element>
<xs:element name="ParentNodeB">
....
</xs:element>
</xs:all>
</xs:complexType>
</xs:element>
I couldn't find any reference (in w3schools.com or elsewhere) to compound use of order indicators, and don't have a validator readily available.
Thank you in advance.
I found the answer at http://www.w3.org/TR/xmlschema-0/#groups
XML Schema stipulates that an all group must
appear as the sole child at the top of a content model.
example provided at the link.
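Since an all group cannot follow a sequence in XSD 1.0, one possible workaround is to spell out the permitted orders of the optional elements with a choice. The sketch below assumes that ParentNodeA and ParentNodeB may appear in either order, and it uses a hypothetical named type, ParentType, to stand in for the content elided as "...." in the question.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical shared type for the parent nodes -->
  <xs:complexType name="ParentType">
    <xs:sequence>
      <xs:element name="childnode"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="BaseNode">
    <xs:complexType>
      <xs:sequence>
        <!-- Required, in this order -->
        <xs:element name="ParentNode1" type="ParentType"/>
        <xs:element name="ParentNode2" type="ParentType"/>
        <!-- Optional tail: A, B, A then B, or B then A; each at most once -->
        <xs:choice minOccurs="0">
          <xs:sequence>
            <xs:element name="ParentNodeA" type="ParentType"/>
            <xs:element name="ParentNodeB" type="ParentType" minOccurs="0"/>
          </xs:sequence>
          <xs:sequence>
            <xs:element name="ParentNodeB" type="ParentType"/>
            <xs:element name="ParentNodeA" type="ParentType" minOccurs="0"/>
          </xs:sequence>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

If the optional elements must always appear as A before B, the choice can be replaced by two elements with minOccurs="0" in the outer sequence.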

Is mixed inherited when a complexType is extended?

I have the following in a schema:
<xs:element name="td">
<xs:complexType>
<xs:complexContent>
<xs:extension base="cell.type"/>
</xs:complexContent>
</xs:complexType>
</xs:element>
<xs:complexType name="cell.type" mixed="true">
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:element ref="p"/>
</xs:sequence>
</xs:complexType>
Some parsers allow PCDATA directly in the element, while others don't. There's something in the XSD recommendation (3.4.2) that says when a complex type has complex content, and neither has a mixed attribute, the effective mixed is false. That means the only way mixed content could be in effect is if the extension of cell.type causes the mixed="true" to be inherited.
Could someone more familiar with schemas comment on the correct interpretation?
(BTW: if I had control of the schema I would move the mixed="true" to the element definition, but that's not my call.)
Anyone reading my question might want to read this thread also (by Damien). It seems my answer isn't entirely right: parsers/validators don't handle mixed attribute declarations on base/derived elements the same way.
Concerning extended complex types, sub-section 1.4.3.2.2.1 of section 3.4.6 in part 1 of W3C's XML Schema specification says that
Both [derived and base] {content type}s must be mixed or both must be element-only.
So yes, it is inherited (or more like you cannot overwrite it—same thing in the end).
Basically, what you've described is the desired (and, as far as I'm concerned, the most logical) behavior.
I've created a simple schema to run a little test with Eclipse's XML tools.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="c">
<xs:complexType>
<xs:complexContent mixed="false">
<xs:extension base="a"/>
</xs:complexContent>
</xs:complexType>
</xs:element>
<xs:complexType name="a" mixed="true">
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:element name="b"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
The above schema is valid, in the sense that neither Eclipse's nor W3C's "official" XML Schema validator notices any issues with it.
The following XML passes validation against the aforementioned schema.
<?xml version="1.0" encoding="UTF-8"?>
<c xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="test.xsd">
x
<b/>
y
</c>
So basically you cannot overwrite the mixedness of a complex base type. To support this statement further, try swapping the base and derived types' mixedness. In that case the XML fails to validate, because the derived type won't be mixed as it (yet again) cannot overwrite the base's mixedness.
You've also said that
Some parsers allow PCDATA directly in the element, while others don't
It couldn't hurt to clarify which parsers you are talking about. A good parser shouldn't fail when it encounters mixed content. A validating parser, given the proper schema, will fail if it encounters mixed content when the schema does not allow it.

xml scheme attribute or complextype conditional

I would like to allow for an element to either contain an attribute OR define a more complex type.
Something like
<myElement someAttr="..."/>
or
<myElement>
<...>
</myElement>
That is, if someAttr exists then I do not want to allow sub elements and if it doesn't then I want to.
The reason for this is I want to have an "include" feature where I include a file which is essentially inserted into the element. But I don't want both. You can either include additional external xml code into the element or add your own BUT not both. (or also to have it inserted from a separate part of the xml)
This is mainly for simplifying a complex xml so that the structure is easily understood.
I doubt you will be able to express something like that in XML schema at this point.
You can make an attribute optional, i.e. it can be present or not. But with the current means you cannot express something like "if the attribute is not present, then include other complex content".
You'll have to either check this programmatically yourself, or maybe investigate if other XML description languages like RelaxNG or Schematron might be able to help.
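For processors that support XSD 1.1, an assertion is one way to express the either/or rule inside the schema itself. The following is a minimal sketch, assuming an XSD 1.1 validator and assuming that any child elements are acceptable when someAttr is absent; it will not work with an XSD 1.0 processor.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="myElement">
    <xs:complexType>
      <xs:sequence>
        <!-- Arbitrary child elements, validated only if declarations exist -->
        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      </xs:sequence>
      <xs:attribute name="someAttr" type="xs:string"/>
      <!-- XSD 1.1 only: reject elements that have both someAttr and children -->
      <xs:assert test="not(@someAttr and *)"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```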
Perhaps with a static choice and you change the myElement name ?
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="root">
<xs:complexType>
<xs:choice>
<xs:element name="myElementWithAttrs">
<xs:complexType>
<xs:attribute name="someAttrs" type="xs:string"/>
</xs:complexType>
</xs:element>
<xs:element name="myElementWithoutAttrs">
<xs:complexType>
<xs:sequence>
<xs:any processContents="skip"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>

XSD Formatting <element><complexType> vs <complexType/><element/>

This XSD portion was obtained from: http://www.iana.org/assignments/xml-registry/schema/netconf.xsd
<xs:complexType name="rpcType">
<xs:sequence>
<xs:element ref="rpcOperation"/>
</xs:sequence>
<xs:attribute name="message-id" type="messageIdType" use="required"/>
<xs:anyAttribute processContents="lax"/>
</xs:complexType>
<xs:element name="rpc" type="rpcType"/>
This is the core of function calls in NETCONF, where rpc is a node of the XML document. I am curious as to why it is not something like:
<xs:element name="rpcType">
<xs:complexType>
<xs:sequence>
<xs:element ref="rpcOperation"/>
</xs:sequence>
<xs:attribute name="message-id" type="messageIdType" use="required"/>
<xs:anyAttribute processContents="lax"/>
</xs:complexType>
</xs:element>
The reasoning is that in #1, when trying to marshal a bean (in JAXB 2), I get the exception:
[com.sun.istack.SAXException2: unable to marshal type "netconf.RpcType" as an element because it is missing an @XmlRootElement annotation]
I have been reading this article over and over again, and really can't get a hold of the difference, and why it would be #1 vs #2...
It's not obvious, I'll grant you. It comes down to the type vs element decision.
When you have something like
<xs:element name="rpcType">
<xs:complexType>
This is essentially an "anonymous type", and is a type which can never occur anywhere other than inside the element rpcType. Because of this certainty, XJC knows that that type will always have the name rpcType, and so generates an @XmlRootElement annotation for it, with the rpcType name.
On the other hand, when you have
<xs:complexType name="rpcType">
then this defines a re-usable type which could potentially be referred to by several different elements. The fact that in your schema it is only referred to by one element is irrelevant. Because of this uncertainty, XJC hedges its bets and does not generate an @XmlRootElement.
The JAXB Reference Implementation has a proprietary XJC flag called "simple binding mode" which, among other things, assumes that the schema you're compiling will never be extended or combined with another. This allows it to make certain assumptions, so if it sees a named complexType only being used by one element, then it will often generate @XmlRootElement for it.
The reality is rather more subtle and complex than that, but in 90% of cases, this is a sufficient explanation.
Quite an involved question. There are many reasons to design schemas using types rather than elements (this approach is called the "venetian blind" approach versus "salami slice" for using global elements). One of the reasons is that types can be sub-typed, and another that it may be useful to only have elements global that can be root elements.
See this article for some more details on the schema side.
Now, as for the JAXB question in particular. The problem is that you created a class corresponding to a type and tried to serialise it. That means JAXB knows its content model, but not what the element name should be. You need to attach your RpcType to an element (JAXBElement), for example:
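// Wrap the type in a JAXBElement via the generated factory, then marshal it to an output target
marshaller.marshal(new ObjectFactory().createRpc(myRpcType), System.out);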
The ObjectFactory was placed into the package created by JAXB for you.
Advantages of Named Types
The advantage of a schema using global/named types is that child/sub types can be created that extend the parent type.
<xs:complexType name="rpcType">
<xs:sequence>
<xs:element ref="rpcOperation"/>
</xs:sequence>
<xs:attribute name="message-id" type="messageIdType" use="required"/>
<xs:anyAttribute processContents="lax"/>
</xs:complexType>
<xs:element name="rpc" type="rpcType"/>
The above fragment would allow the following child type to be created:
<xs:complexType name="myRPCType">
<xs:complexContent>
<xs:extension base="rpcType">
<xs:sequence>
<xs:element name="childProperty" type="xs:string"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Impact on JAXB
Another aspect of named types is that they may be used by multiple elements:
<xs:element name="FOO" type="rpcType"/>
<xs:element name="BAR" type="rpcType"/>
This means that the schema to Java compiler cannot simply pick one of the possible elements to be the @XmlRootElement for the class corresponding to "rpcType".

Schema Issue: Can define element type OR add element attribute, but not both. I want both!

I've inherited the task of creating a schema for some XML which already exists - and IMHO is not the best that could have been done. The section giving me problems is the 'spectrum' element at the end of the 'scan-result' element.
The best I'm hoping for with regard to the data in the 'spectrum' element is to treat it as type="xs:string". I'll programmatically divide up the numeric pairs that constitute the data in the string later. (Even though this step would not be needed had the data been properly structured in the first place.)
Here's a similar piece of XML data to what I have to work with...
<scan-result>
<spectrum-index>0</spectrum-index>
<scan-index>2</scan-index>
<time-stamp>5609</time-stamp>
<tic>55510</tic>
<start-mass>22.0</start-mass>
<stop-mass>71.0</stop-mass>
<spectrum count="5">30,11352;31,360;32,16634;45,1161;46,26003</spectrum>
</scan-result>
The problem is, I can't seem to get a working definition for the 'spectrum' element that has the 'count' attribute and allows me to define the 'spectrum' element type as "xs:string".
What I would like is something like the following:
<xs:complexType name="ctypScanResult">
<xs:sequence>
<xs:element name="spectrum-index" type="xs:integer"/>
<xs:element name="scan-index" type="xs:integer"/>
<xs:element name="time-stamp" type="xs:integer"/>
<xs:element name="tic" type="xs:integer"/>
<xs:element name="start-mass" type="xs:float"/>
<xs:element name="stop-mass" type="xs:float"/>
<xs:element name="spectrum" type="xs:string">
<xs:complexType>
<xs:attribute name="count" type="xs:integer"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="count" type="xs:integer"/>
</xs:complexType>
The problem is that I can define the type of the 'spectrum' element as "xs:string" XOR I can define the anonymous 'xs:complexType' in the 'spectrum' element, which allows me to insert the 'count' attribute. But I need to be able to express both.
Given that I'm kind of stuck with the XML as it was handed to me, is there a schema definition that will allow me to describe this data?
Sorry this is long, but thanks to any and all who respond,
AlarmTripper
Followup: I know why the error occurs...
Quoted from W3C:
3.3.3 Constraints on XML Representations of Element Declarations
Schema Representation Constraint: Element Declaration Representation OK
In addition to the conditions imposed on <element> element information items by the schema for schemas: all of the following must be true:
1 default and fixed must not both be present.
2 If the item's parent is not <schema>, then all of the following must be true:
2.1 One of ref or name must be present, but not both.
2.2 If ref is present, then all of <complexType>, <simpleType>, <key>, <keyref>, <unique>, nillable, default, fixed, form, block and type must be absent, i.e. only minOccurs, maxOccurs, id are allowed in addition to ref, along with <annotation>.
3 type and either <simpleType> or <complexType> are mutually exclusive.
4 The corresponding particle and/or element declarations must satisfy the conditions set out in Constraints on Element Declaration Schema Components (§3.3.6) and Constraints on Particle Schema Components (§3.9.6).
But I'm still in the same fix I was before... How can I actually accomplish something that resembles my goal?
Thanks,
AlarmTripper
Let a tool do it for you! Try xsd.exe.
Or, if you must define by hand, at least check your hand-written-definition with an automatically generated one.
Here's what XSD.exe gave me for your input. I trimmed out some MS-NS cruft.
<xs:element name="spectrum">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="count" type="xs:string" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
You need to set the attribute mixed="true" on complexType:
<xs:element name="spectrum">
<xs:complexType mixed="true">
<xs:attribute name="count" type="xs:integer" />
</xs:complexType>
</xs:element>
EDIT: Okay, just read your comment, sorry. I believe the following should work instead:
<xs:element name="spectrum">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="count" type="xs:integer" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
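Placed in the context of the question's ctypScanResult type, the simpleContent approach might look like the sketch below. The type name ctypSpectrum and the global scan-result element are illustrative assumptions, and the extra 'count' attribute that the question declared on ctypScanResult itself is left out because the sample data does not use it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- String content plus a 'count' attribute (hypothetical type name) -->
  <xs:complexType name="ctypSpectrum">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="count" type="xs:integer"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="ctypScanResult">
    <xs:sequence>
      <xs:element name="spectrum-index" type="xs:integer"/>
      <xs:element name="scan-index" type="xs:integer"/>
      <xs:element name="time-stamp" type="xs:integer"/>
      <xs:element name="tic" type="xs:integer"/>
      <xs:element name="start-mass" type="xs:float"/>
      <xs:element name="stop-mass" type="xs:float"/>
      <!-- Reference the named type instead of combining type= with an inline complexType -->
      <xs:element name="spectrum" type="ctypSpectrum"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="scan-result" type="ctypScanResult"/>
</xs:schema>
```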
<xs:element name="spectrum" type="xs:string">
<xs:complexType>
<!-- ADD THIS NEXT LINE -->
<xs:complexContent mixed="true"/>
<xs:attribute name="count" type="xs:integer"/>
</xs:complexType>
</xs:element>

Resources