Difference between mixed="true" and xs:extension in XML Schema

What is the practical difference between these two:
<xs:element name="A">
<xs:complexType mixed="true">
<xs:attribute name="att" type="xs:boolean"/>
</xs:complexType>
</xs:element>
<xs:element name="B">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="att" type="xs:boolean"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>

The two are formally different. Your first example uses mixed="true", which denotes mixed content, i.e. character data interleaved with child elements, whereas your second example restricts the element content to the xs:string type. Both declare an optional att attribute.
With your example, both are practically the same, because A declares no child elements and therefore accepts only character data. However, if you do not plan on having mixed content, i.e. you do not plan to add child elements, the second version is much clearer.
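To make the distinction concrete, here is a small sketch. The first two instance elements should be valid against the A and B declarations from the question. The variant declaration underneath adds a child element named child (a hypothetical name, not taken from the question), which is the situation mixed="true" is meant for and which the xs:string extension cannot express.

```xml
<!-- Valid against both declarations from the question: text only, optional attribute -->
<A att="true">some text</A>
<B att="true">some text</B>

<!-- Hypothetical variant of A: a mixed type that also declares a child element -->
<xs:element name="A">
  <xs:complexType mixed="true">
    <xs:sequence>
      <!-- "child" is an illustrative name, not part of the original schema -->
      <xs:element name="child" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="att" type="xs:boolean"/>
  </xs:complexType>
</xs:element>

<!-- Valid against the variant only: character data interleaved with child elements -->
<A att="false">text before <child>nested</child> text after</A>
```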

Related

defining different sets of child nodes by attribute value

I'm trying to define a schema for some xml-based database exchange like this:
<table name="foo">
<row>
<fooid>15</fooid>
<fooname>some entry</fooname>
</row>
<row>
<fooid>28</fooid>
<fooname>something else</fooname>
</row>
</table>
<table name="bar">
<row>
<barid>19</barid>
<barcounter>93</barcounter>
</row>
</table>
So I have several of these tables, and within each table there should be only the fields that exist in that table. For example, barid should not appear in table foo.
Is there any way to define this?
Yes, there are two ways. One is simple (and relies on some human intuition and documentation), and the other is more expressive (but inevitably also a bit more complicated.)
The simple way is to replace the names 'table' and 'row' with names that indicate what table we are talking about:
<table-foo>
<row-foo>
<fooid>28</fooid>
<fooname>something</fooname>
</row-foo>
...
</table-foo>
<table-bar>
<row-bar>
<barid>19</barid>
<barcounter>93</barcounter>
</row-bar>
...
</table-bar>
XSD validation (like validation using DTDs and Relax NG) is based principally on the element names used. If you want two different kinds of row to contain different things, give them two different names. So table-foo and its descendants can be declared thus:
<xs:element name="table-foo" substitutionGroup="tns:table">
<xs:complexType>
<xs:sequence>
<xs:element ref="tns:row-foo" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="row-foo" substitutionGroup="tns:row">
<xs:complexType>
<xs:sequence>
<xs:element ref="tns:fooid"/>
<xs:element ref="tns:fooname"/>
</xs:sequence>
</xs:complexType>
</xs:element>
And similarly for table-bar and row-bar.
Sometimes, however, we absolutely must, or really want to, capture the fact that both 'row-foo' and 'row-bar' have something crucial in common. They are both 'rows' in some abstract ontology, and that may matter to us. In such cases, you can use abstract elements to capture the regularity.
For example, here is a simple abstraction for tables, rows, and cells:
<xs:element name="table"
abstract="true"
type="tns:table"/>
<xs:element name="row"
abstract="true"
type="tns:row"/>
<xs:element name="cell"
abstract="true"
type="xs:anySimpleType"/>
The types for table and row are straightforward:
<xs:complexType name="table">
<xs:sequence>
<xs:element ref="tns:row" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="row">
<xs:sequence>
<xs:element ref="tns:cell" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
Now, the declarations for table-foo etc. become slightly more complicated, because for each declaration we have to establish a relation to the abstraction we have just defined. Element table-foo is an instantiation of the table abstraction, and its type is a restriction of the abstract table type:
<xs:element name="table-foo"
substitutionGroup="tns:table">
<xs:complexType>
<xs:complexContent>
<xs:restriction base="tns:table">
<xs:sequence>
<xs:element ref="tns:row-foo" maxOccurs="unbounded"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
Element row-foo is similar: we specify that it's a "row" by using the substitutionGroup attribute, and we derive its complex type by restriction from the abstract row type:
<xs:element name="row-foo" substitutionGroup="tns:row">
<xs:complexType>
<xs:complexContent>
<xs:restriction base="tns:row">
<xs:sequence>
<xs:element ref="tns:fooid"/>
<xs:element ref="tns:fooname"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
Note that we don't allow arbitrary cells to appear here, just the two cell types we want for rows from table foo. And to close off the pattern, we declare that the elements fooid and fooname are cells, using (again) substitutionGroup.
<xs:element name="fooid" type="xs:integer"
substitutionGroup="tns:cell"/>
<xs:element name="fooname" type="xs:string"
substitutionGroup="tns:cell"/>
The same patterns can be used to declare a different set of legal cells for table bar:
<xs:element name="barid" type="xs:positiveInteger"
substitutionGroup="tns:cell"/>
<xs:element name="barcounter" type="xs:double"
substitutionGroup="tns:cell"/>
<xs:element name="table-bar" substitutionGroup="tns:table">
<xs:complexType>
<xs:complexContent>
<xs:restriction base="tns:table">
<xs:sequence>
<xs:element ref="tns:row-bar" maxOccurs="unbounded"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
<xs:element name="row-bar" substitutionGroup="tns:row">
<xs:complexType>
<xs:complexContent>
<xs:restriction base="tns:row">
<xs:sequence>
<xs:element ref="tns:barid"/>
<xs:element ref="tns:barcounter"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
The situation you describe is one of the use cases for which abstract elements and substitution groups were designed. Other techniques which could also be used here (but which I won't illustrate in detail) include:
Declared subtypes, use of xsi:type (declare foo-table and bar-table as restrictions or extensions of type table, use <table xsi:type="tns:foo-table">...</table> or <table xsi:type="tns:bar-table">...</table> to guide validation)
Assertions (declare foo-table and bar-table types which extend the generic table type by adding assertions about the grandchildren -- this is an XSD 1.1 feature not available in 1.0).
Conditional type assignment (declare that table gets one type if it has name="foo" and a different type if it has name="bar" -- also an XSD 1.1 feature not available in 1.0).
There may be other ways to do it, too.
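As a rough illustration of the conditional type assignment option, the sketch below keeps the original element name table and picks a type from the name attribute. It requires an XSD 1.1 processor and has not been run through one. The type names any-table, foo-table and foo-row are hypothetical, and tns is assumed to be bound to the schema's target namespace as in the declarations above.

```xml
<!-- XSD 1.1 only. All type names here are illustrative assumptions. -->
<xs:complexType name="any-table">
  <xs:sequence>
    <xs:element name="row" type="xs:anyType" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>

<!-- Rows of table foo: exactly fooid followed by fooname -->
<xs:complexType name="foo-row">
  <xs:sequence>
    <xs:element name="fooid" type="xs:integer"/>
    <xs:element name="fooname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- Restriction of the generic table type that only admits foo rows -->
<xs:complexType name="foo-table">
  <xs:complexContent>
    <xs:restriction base="tns:any-table">
      <xs:sequence>
        <xs:element name="row" type="tns:foo-row" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

<xs:element name="table" type="tns:any-table">
  <!-- The first alternative whose test is true determines the governing type -->
  <xs:alternative test="@name = 'foo'" type="tns:foo-table"/>
  <!-- A second alternative for name="bar" would follow the same pattern -->
  <!-- Default alternative: reject tables with any other name -->
  <xs:alternative type="xs:error"/>
</xs:element>
```

Each alternative type has to be derived from the declared type of the element (or be xs:error), which is why foo-table is written as a restriction of any-table.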

Reference another XML parent complexType?

How should I reference another complexType in xml, as element or as attribute over my own defined Key? What is the correct approach to model the following self-reference? Is the first approach even possible, or does it lead to infinite self-referencing?
<xs:complexType name="Category">
<xs:sequence>
<xs:element name="ParentCategory" type="Category" minOccurs="1" maxOccurs="1"></xs:element>
<xs:element name="ChildCategory" type="Category" minOccurs="0" maxOccurs="unbounded"></xs:element>
</xs:sequence>
<xs:attribute name="CategoryName" type="xs:string"></xs:attribute>
</xs:complexType>
or
<xs:complexType name="Category">
<xs:sequence>
<xs:element name="ChildCategory" type="Category" minOccurs="0" maxOccurs="unbounded"></xs:element>
</xs:sequence>
<xs:attribute name="CategoryName" type="xs:string"></xs:attribute>
<xs:attribute name="ParentCategory" type="xs:string"></xs:attribute>
</xs:complexType>
I'm a bit confused, since I want to be object oriented but am not sure what this would look like in XML. Wouldn't the reference of ParentCategory as a Category type require me to again write a Category type in XML that itself has a ParentCategory child element, and so on, leading to infinite type referencing?
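There's no issue referencing an element of the same type as part of the type definition, so recursion in itself is not a problem in your first example. What is a problem there is that ParentCategory is required (minOccurs="1"): every Category would have to contain another Category, so no finite document could satisfy the type. Trying to reference the parent is a bit odd anyway; you shouldn't really need to do this, since XML is hierarchical after all.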
<xs:complexType name="Category">
<xs:sequence>
<xs:element maxOccurs="unbounded" minOccurs="0" name="ChildCategory" type="Category"/>
</xs:sequence>
<xs:attribute name="CategoryName" type="xs:string"/>
</xs:complexType>
The Category type references itself recursively, allowing for 0 or more ChildCategory elements. This should do what you need (there's nothing wrong with recursive type referencing in the XML Schema).
If you need to refer to the parent Category in your document, it's easy enough to chain to the parent node in any DOM implementation or with XPath.
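For illustration, an instance document for the recursive type could look like the sketch below. It assumes a global declaration such as <xs:element name="Category" type="Category"/>, which is not shown in the question, and the category names are made up.

```xml
<!-- Assumes a root element "Category" of type Category (hypothetical declaration) -->
<Category CategoryName="Electronics">
  <ChildCategory CategoryName="Computers">
    <!-- Nesting can go as deep as needed; a leaf simply has no ChildCategory -->
    <ChildCategory CategoryName="Laptops"/>
  </ChildCategory>
  <ChildCategory CategoryName="Phones"/>
</Category>
```

From any ChildCategory node, the XPath expression ../@CategoryName selects the name of the parent category, so the parent does not have to be stored in the document a second time.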

xml schema: restricting occurrences to sibling element sequence cardinality

Given:
<xs:complexType name="SymbolsList" final="">
<xs:sequence>
<xs:element name="symbol" maxOccurs="unbounded">
<xs:complexType>
<xs:attribute name="name" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ComboList">
<xs:sequence>
<xs:element name="combo" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="symbol" maxOccurs="unbounded">
<xs:complexType>
<xs:attribute name="name" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="comboName" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:element name="symbolsList" type="SymbolsList">
<xs:unique name="uniqueSymbol">
<xs:selector xpath="./symbol" />
<xs:field xpath="@name" />
</xs:unique>
</xs:element>
<xs:element name="combosList" type="ComboList">
<xs:unique name="uniqueCombo">
<xs:selector xpath="./combo" />
<xs:field xpath="@comboName" />
</xs:unique>
</xs:element>
I believe this defines a list of symbols and a list of combinations of those symbols.
Each entry in the list of symbols must have a unique name, and each entry in the list of combos must have a unique comboName.
What I'd like to know is if there is a way for me to restrict the number of allowed occurrences in the combosList sequence to at least the number of symbols defined in the symbol list.
I guess I'm asking whether or not a cardinality restriction can be variable and, if so, how to associate its limitation?
I also want to make it so that the comboList elements (a single combo) can only use names of symbols defined in the symbolList element.
I think I can pull off that last part. I can't find anything anywhere that talks about limiting cardinal sizes of disparate element sequences to greater than or equal to one or the other.
Perhaps it's not possible.
XSD requires cardinality constraints to be specified literally in the declaration; the kind of dynamic calculation you have in mind is not in XSD's design space.
In XSD 1.1 you can add an assertion to some common ancestor of symbolsList and combosList that requires
count(combosList/combo) ge count(symbolsList/symbol)
XSD 1.1 is supported by Saxon EE and by Xerces J (in the latter case you have to look for the 1.1 distribution, or did last I looked). (One caveat: Note that Xerces J does not support all of XPath 2.0 in assertions, and I haven't actually checked to see whether this assertion is covered by the minimal subset of XPath XSD requires of conforming 1.1 implementations. Investigate further before sinking a lot of time here.)
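A minimal sketch of where such an assertion could live is shown below. The wrapper element root is a hypothetical common parent (the question does not show one), the element names follow the declarations in the question, and the fragment needs an XSD 1.1 processor. It has not been checked against Xerces J's XPath subset.

```xml
<!-- XSD 1.1 only. "root" is an assumed common parent of the two lists. -->
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="symbolsList"/>
      <xs:element ref="combosList"/>
    </xs:sequence>
    <!-- Evaluated with the root element as context node -->
    <xs:assert test="count(combosList/combo) ge count(symbolsList/symbol)"/>
  </xs:complexType>
</xs:element>
```

The assertion has to sit on an ancestor because an assertion can only see the element it is attached to and that element's descendants.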

XSD and plain text

I have a rest/xml service that gives me the following...
<verse-unit unit-id="38009001">
<marker class="begin-verse" mid="v38009001"/>
<begin-chapter num="9"/><heading>Judgment on Israel&apos;s Enemies</heading>
<begin-block-indent/>
<begin-paragraph class="line-group"/>
<begin-line/><verse-num begin-chapter="9">1</verse-num>The burden of the word of the <span class="divine-name">Lord</span> is against the land of Hadrach<end-line class="br"/>
<begin-line class="indent"/>and Damascus is its resting place.<end-line class="br"/>
<begin-line/>For the <span class="divine-name">Lord</span> has an eye on mankind<end-line class="br"/>
<begin-line class="indent"/>and on all the tribes of Israel,<footnote id="f1">
A slight emendation yields <i>
For to the <span class="divine-name">Lord</span> belongs the capital of Syria and all the tribes of Israel
</i>
</footnote><end-line class="br"/>
</verse-unit>
I used visual studio to generate a schema from this and used XSD.EXE to generate classes that I can use to deserialize this mess into programmable stuff.
I got everything to work and it is deserialized perfectly (almost).
The problem I have is with the random text mixed throughout the child nodes. The generated verse-unit object gives me a list of objects (begin-line, begin-block-indent, etc.), and also another list of string objects that represent the bits of string throughout the xml.
Here is my schema
<xs:element maxOccurs="unbounded" name="verse-unit">
<xs:complexType mixed="true">
<xs:sequence>
<xs:choice maxOccurs="unbounded">
<xs:element name="marker">
<xs:complexType>
<xs:attribute name="class" type="xs:string" use="required" />
<xs:attribute name="mid" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="begin-chapter">
<xs:complexType>
<xs:attribute name="num" type="xs:unsignedByte" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="heading">
<xs:complexType mixed="true">
<xs:sequence minOccurs="0">
<xs:element name="span">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="class" type="xs:string" use="required" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="begin-block-indent" />
<xs:element name="begin-paragraph">
<xs:complexType>
<xs:attribute name="class" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="begin-line">
<xs:complexType>
<xs:attribute name="class" type="xs:string" use="optional" />
</xs:complexType>
</xs:element>
<xs:element name="verse-num">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:unsignedByte">
<xs:attribute name="begin-chapter" type="xs:unsignedByte" use="optional" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="end-line">
<xs:complexType>
<xs:attribute name="class" type="xs:string" use="optional" />
</xs:complexType>
</xs:element>
<xs:element name="end-paragraph" />
<xs:element name="end-block-indent" />
<xs:element name="end-chapter" />
</xs:choice>
</xs:sequence>
<xs:attribute name="unit-id" type="xs:unsignedInt" use="required" />
</xs:complexType>
</xs:element>
WHAT I NEED IS THIS. I need the random text that is NOT surrounded by an xml node to be represented by an object so I know the order that everything is in.
I know this is complicated, so let me try to simplify it.
<field name="test_field_0">
Some text I'm sure you don't want.
<subfield>Some text.</subfield>
More text you don't want.
</field>
I need the xsd to generate a field object with items that can be either a text object or a subfield object. I need to know where the random text is within the child nodes.
You can try Xml Schema Mixed Content, which is well explained here: http://www.w3schools.com/schema/schema_complex_mixed.asp
I don't know much about the .net side. But this somewhat older article says that mixed mode is basically supported by xsd.exe: http://msdn.microsoft.com/en-us/magazine/cc164135.aspx
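Applied to the simplified field example from the question, a mixed content declaration might look like the sketch below. It only states that text may appear between the subfield elements. How a particular code generator exposes the interleaving order is tool-specific and is not covered here.

```xml
<!-- Sketch for the simplified <field> example; element and attribute names are from the question -->
<xs:element name="field">
  <!-- mixed="true" allows character data before, between and after the child elements -->
  <xs:complexType mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="subfield" type="xs:string"/>
    </xs:choice>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
```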
Well your problem starts here:
<xs:element name="begin-line">
<xs:complexType>
<xs:attribute name="class" type="xs:string" use="optional" />
</xs:complexType>
</xs:element>
What this means is that a begin-line element has an optional attribute called class, so the tag can look like this: <begin-line class="lineclass"/>. However, the attribute is simply of type xs:string and the element itself has no content, which means that all you get from it is a string.
I also don't know if this is an option, but the XML could be changed so that each line is a proper element with an opening and a closing tag. For instance, instead of this line:
<begin-line class="indent"/>and Damascus is its resting place.<end-line class="br"/>
the XML would look like this (a closing tag cannot carry attributes):
<begin-line class="indent">and Damascus is its resting place.</begin-line>
I believe that if all the "line" tags were closed properly then the XSD generator might have a better time trying to derive what is inside the "begin-line" XML tag. Indeed if this is possible then you could rename begin-line to line and begin-chapter to chapter which should make your XML much more readable.
If it's not possible to update your code, then you are going to have to try your best with the string itself. I'm not sure if verses contain pure HTML, but if so you could parse the string inside the begin-line element as XML itself, using the library to jump between values and nodes (you might have to wrap a pair of tags around the string before trying to parse it though).

Schema Issue: Can define element type OR add element attribute, but not both. I want both!

I've inherited the task of creating a schema for some XML which already exists - and IMHO is not the best that could have been done. The section giving me problems is the 'spectrum' element at the end of the 'scan-result' element.
The best I'm hoping for with regard to the data in the 'spectrum' element is to treat it as type="xs:string". I'll programmatically divide up the numeric pairs that constitute the data in the string later. (Even though this step would not be needed had the data been properly structured in the first place.)
Here's a similar piece of XML data to what I have to work with...
<scan-result>
<spectrum-index>0</spectrum-index>
<scan-index>2</scan-index>
<time-stamp>5609</time-stamp>
<tic>55510</tic>
<start-mass>22.0</start-mass>
<stop-mass>71.0</stop-mass>
<spectrum count="5">30,11352;31,360;32,16634;45,1161;46,26003</spectrum>
</scan-result>
The problem is, I can't seem to get a working definition for the 'spectrum' element that has the 'count' attribute and allows me to define the 'spectrum' element type as "xs:string".
What I would like is something like the following:
<xs:complexType name="ctypScanResult">
<xs:sequence>
<xs:element name="spectrum-index" type="xs:integer"/>
<xs:element name="scan-index" type="xs:integer"/>
<xs:element name="time-stamp" type="xs:integer"/>
<xs:element name="tic" type="xs:integer"/>
<xs:element name="start-mass" type="xs:float"/>
<xs:element name="stop-mass" type="xs:float"/>
<xs:element name="spectrum" type="xs:string">
<xs:complexType>
<xs:attribute name="count" type="xs:integer"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="count" type="xs:integer"/>
</xs:complexType>
The problem is that I can define the type of the 'spectrum' element as "xs:string" XOR I can define the anonymous 'xs:complexType' in the 'spectrum' element, which allows me to insert the 'count' attribute. But I need to be able to express both.
Given that I'm kind of stuck with the XML as it was handed to me, is there a schema definition that will allow me to describe this data?
Sorry this is long, but thanks to any and all who respond,
AlarmTripper
Followup: I know why the error occurs...
Quoted from W3C:
3.3.3 Constraints on XML Representations of Element Declarations
Schema Representation Constraint: Element Declaration Representation OK
In addition to the conditions imposed on element information items by the schema for schemas: all of the following must be true:
1 default and fixed must not both be present.
2 If the item's parent is not <schema>, then all of the following must be true:
2.1 One of ref or name must be present, but not both.
2.2 If ref is present, then all of <complexType>, <simpleType>, <key>, <keyref>, <unique>, nillable, default, fixed, form, block and type must be absent, i.e. only minOccurs, maxOccurs, id are allowed in addition to ref, along with <annotation>.
3 type and either <simpleType> or <complexType> are mutually exclusive.
4 The corresponding particle and/or element declarations must satisfy the conditions set out in Constraints on Element Declaration Schema Components (§3.3.6) and Constraints on Particle Schema Components (§3.9.6).
But I'm still in the same fix I was before... How can I actually accomplish something that resembles my goal?
Thanks,
AlarmTripper
Let a tool do it for you! Try xsd.exe.
Or, if you must define by hand, at least check your hand-written-definition with an automatically generated one.
Here's what XSD.exe gave me for your input. I trimmed out some MS-NS cruft.
<xs:element name="spectrum">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="count" type="xs:string" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
You need to set the attribute mixed="true" on complexType:
<xs:element name="spectrum">
<xs:complexType mixed="true">
<xs:attribute name="count" type="xs:integer" />
</xs:complexType>
</xs:element>
EDIT: Okay, just read your comment, sorry. I believe the following should work instead:
<xs:element name="spectrum">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="count" type="xs:integer" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
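Dropped into the type from the question, the simpleContent extension could be used as in the sketch below. It assumes the count attribute belongs only on spectrum, as in the sample data, so the type-level count attribute from the question is left out.

```xml
<xs:complexType name="ctypScanResult">
  <xs:sequence>
    <xs:element name="spectrum-index" type="xs:integer"/>
    <xs:element name="scan-index" type="xs:integer"/>
    <xs:element name="time-stamp" type="xs:integer"/>
    <xs:element name="tic" type="xs:integer"/>
    <xs:element name="start-mass" type="xs:float"/>
    <xs:element name="stop-mass" type="xs:float"/>
    <!-- No type attribute here: the anonymous complex type supplies both the
         xs:string content and the count attribute -->
    <xs:element name="spectrum">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xs:attribute name="count" type="xs:integer"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
```

This respects the constraint quoted in the question: the element either names a type or contains an anonymous type definition, never both.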
<xs:element name="spectrum" type="xs:string">
<xs:complexType>
<!-- ADD THIS NEXT LINE -->
<xs:complexContent mixed="true"/>
<xs:attribute name="count" type="xs:integer"/>
</xs:complexType>
</xs:element>
