XML Schema Types - xsd

If I have an element, animal:
<animal name="dog"/>
Which can take the following as values to the name attribute:
dog
cat
bird
$(ANY_STRING)
Where $(ANY_STRING) is a simple name, value substitution that some software will perform and validate later (ANY_STRING being literally any string). What would the XML Schema for this element look like? Restricting on the three known names is easy enough:
<xs:simpleType name="AnimalNames">
  <xs:restriction base="xs:string">
    <xs:enumeration value="dog"/>
    <xs:enumeration value="cat"/>
    <xs:enumeration value="bird"/>
  </xs:restriction>
</xs:simpleType>
Restricting on $(ANY_STRING) is similarly easy on its own (using xs:pattern to restrict). But since attributes may only be simple types, is it possible to specify that the attribute may be in the list of enumerations or the $(ANY_STRING) value?
Another option I've considered is restricting on the below pattern:
<xs:pattern value="dog|cat|bird|$\(.*\)"/>
Although that gets pretty nasty as the list of possible values grows.
Of course, the simplest option is to just declare a string type, but I'd like to be more restrictive than that.

One way to define semi-closed lists like this, with a set of well-known values explicitly specified for documentation, and then with an escape-hatch to allow other strings as well, is to define your attribute with a union of your AnimalNames type and xs:NMTOKEN or xs:Name (or whatever built-in or user-defined type best captures your constraints on the other names not enumerated).
As guidot points out in his comment, such a union accepts the same set of values as its most inclusive member type, so for schemas whose sole purpose is gatekeeping it's pointless. The technique is useful for documentation and for type-driven dispatch (if the validating member type is AnimalNames do X else do Y).
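The union described above might be sketched as follows. The type names SubstitutionName and AnimalNameOrSubstitution are hypothetical, and a user-defined pattern type stands in for xs:NMTOKEN or xs:Name because a value such as $(X) contains characters those built-in types do not allow. XSD regular expressions are implicitly anchored and do not treat $ as special, but parentheses are grouping operators, so they are escaped here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- The well-known names, as in the question -->
  <xs:simpleType name="AnimalNames">
    <xs:restriction base="xs:string">
      <xs:enumeration value="dog"/>
      <xs:enumeration value="cat"/>
      <xs:enumeration value="bird"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical escape-hatch type: a literal "$(" ... ")" wrapper -->
  <xs:simpleType name="SubstitutionName">
    <xs:restriction base="xs:string">
      <xs:pattern value="$\(.*\)"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A value is valid if it is valid against either member type -->
  <xs:simpleType name="AnimalNameOrSubstitution">
    <xs:union memberTypes="AnimalNames SubstitutionName"/>
  </xs:simpleType>

  <xs:element name="animal">
    <xs:complexType>
      <xs:attribute name="name" type="AnimalNameOrSubstitution" use="required"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Because the second member type here is narrower than xs:string, this particular union does restrict the value space: name="hamster" would be rejected, while name="$(PET)" would be accepted.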
In order to accommodate schema validators that don't provide information about which member type validated a particular value of a union type, some vocabulary designers do as guidot suggests, and provide two element types, one for known / predicted / expected animal names and one for other names. Or they specify the type AnimalNames as accepting the strings dog, cat, bird, and other, define the attribute name as having type AnimalNames, and define another attribute (call it other-name) which is defined as having meaning if and only if name="other". So dogs are described using
<animal name="dog"/>
and hamsters with
<animal name="other" other-name="hamster"/>
That makes it fairly simple to handle the well known names specially while still accepting other names.
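A minimal sketch of the two-attribute design, assuming other-name is typed as xs:NMTOKEN (any suitable type would do). XSD 1.0 cannot express the "if and only if" rule between the two attributes, so it is shown here as an XSD 1.1 assertion; with an XSD 1.0 processor the xs:assert line would have to be dropped and the rule enforced by the application.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Known names plus the sentinel value "other" -->
  <xs:simpleType name="AnimalNames">
    <xs:restriction base="xs:string">
      <xs:enumeration value="dog"/>
      <xs:enumeration value="cat"/>
      <xs:enumeration value="bird"/>
      <xs:enumeration value="other"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="animal">
    <xs:complexType>
      <xs:attribute name="name" type="AnimalNames" use="required"/>
      <xs:attribute name="other-name" type="xs:NMTOKEN"/>
      <!-- XSD 1.1 only: other-name is present exactly when name="other" -->
      <xs:assert test="(@name = 'other') = exists(@other-name)"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```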

Related

OWL: only one property among many properties exists

I want to express the xs:choice element from XSD in OWL:
XML Schema choice element allows only one of the elements contained in the declaration to be present within the containing element.
I think maybe I should first define a property group in OWL, and then specify that only one of the properties in the group is allowed to exist. Any help?
There's no notion of "property group" in OWL, but you could get a similar effect using subproperties and disjoint properties. For instance, you could have a property hierarchy like this:
hasVehicleChoice
    hasCar
    hasTruck
Then, you can declare that hasCar and hasTruck are disjoint. That means that an individual can't have the same value for both properties, so you can't say:
x hasCar vehicle72
x hasTruck vehicle72
That's not enough to say that they can't have different values though. You could still have
x hasCar vehicle72
x hasTruck vehicle75
To avoid that, you could make hasVehicleChoice be a functional property (meaning each individual has 0 or 1 values for it, but no more), or use a subclass axiom with a restriction, like
Person subClassOf (hasVehicleChoice exactly 1)
Then, each person would have exactly one vehicle choice, and since hasCar and hasTruck are disjoint, the person can't have both.
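As a rough illustration, the property hierarchy, the disjointness axiom, and the functional-property variant could be written in RDF/XML like this. The base IRI http://example.com/vehicles is made up for the example, and the properties are assumed to be object properties.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.com/vehicles">

  <!-- Parent property; functional means at most one value per individual -->
  <owl:ObjectProperty rdf:about="#hasVehicleChoice">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
  </owl:ObjectProperty>

  <!-- hasCar and hasTruck are subproperties and mutually disjoint -->
  <owl:ObjectProperty rdf:about="#hasCar">
    <rdfs:subPropertyOf rdf:resource="#hasVehicleChoice"/>
    <owl:propertyDisjointWith rdf:resource="#hasTruck"/>
  </owl:ObjectProperty>

  <owl:ObjectProperty rdf:about="#hasTruck">
    <rdfs:subPropertyOf rdf:resource="#hasVehicleChoice"/>
  </owl:ObjectProperty>

</rdf:RDF>
```

Note that OWL makes no unique name assumption: given a hasCar value and a hasTruck value for the same individual, a reasoner first infers from the functional property that the two values are the same individual, and only then does the disjointness axiom make the ontology inconsistent.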
All that said, this isn't a common pattern in OWL ontologies, and there's not a particularly convenient way of encoding it. If you don't need it all that often, you might be better off just using the subclass axioms and property restrictions directly. E.g.,
Person subClassOf ((hasCar exactly 1) and (hasTruck exactly 0)) or ((hasCar exactly 0) and (hasTruck exactly 1))

XML Schema for choice between element and #PCDATA

I have a preexisting XML document type that has an element that can have two content types: some elements, or just text. Modeling this as mixed content is overkill, and JAXB's XJC creates a very ugly binding as a result.
<bars><bar .../><bar .../></bars>
versus
<bars>Just a bunch of #PCDATA</bars>
xs:choice seems structured only for complex types (not simple types like xs:string). Is there a way to express this choice, between elements or text, using XML schema? In DTD this would be something like
<!ELEMENT bars (#PCDATA | bar*)>
The language you want to define (either a sequence of characters or a sequence of bar elements, but not a mixture) cannot be defined in XSD 1.0 (or in XML DTDs, either; your DTD notation would make sense but is not legal in XML DTDs).
In XSD 1.1, you can use an assertion to ensure that if any bar elements are present as children, no text nodes occur (or only text nodes that contain only whitespace).
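One way such an assertion might look, assuming a mixed content model and leaving the type of bar unspecified. This needs an XSD 1.1 processor; the XPath expression is only one possible formulation.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="bars">
    <!-- mixed="true" lets character data appear alongside child elements -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="bar" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Either there are no bar children, or no text node contains
           anything other than whitespace -->
      <xs:assert test="not(bar) or not(text()[normalize-space()])"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```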
A simple way to achieve roughly the same effect is to say that the bars element contains either a sequence of bar elements or a single stringvalue element (call it whatever you like), where the stringvalue element contains -- as its name suggests -- just a string of characters.
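A sketch of that wrapper-element alternative, which works in XSD 1.0. The name stringvalue comes from the paragraph above; the type of bar is left unspecified because the question does not show it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="bars">
    <xs:complexType>
      <xs:choice>
        <!-- Either any number of bar elements ... -->
        <xs:element name="bar" minOccurs="0" maxOccurs="unbounded"/>
        <!-- ... or a single element holding the character data -->
        <xs:element name="stringvalue" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The cost is that existing documents of the form <bars>text</bars> would have to change to <bars><stringvalue>text</stringvalue></bars>, which may not be acceptable for a preexisting document type.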

Declaring an attribute for a different namespace in XML Schema

I've been using an XML format that is a mix of different existing formats and some custom elements and attributes, and I thought I should write a schema for those custom bits.
One thing I do is use custom attributes on elements in existing formats, like this:
<ns1:something attA="b" attB="a" ns2:extraAtt="c"/>
I understand that doing this is allowed but I cannot think how to declare my "extraAtt" in XML Schema or, worse, in a DTD.
I have tried reading the specification, but it could just as well be written in Chinese as far as I am concerned. Most tutorials talk only about "name", "type", and "use".
Each schema document defines components (pieces of a schema) for one namespace. So to define your attribute ns2:extraAtt you want a schema document something like this one:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/my-ns2">
  <xs:attribute name="extraAtt" type="xs:anySimpleType"/>
</xs:schema>
The declaration of element ns1:something will need to allow for this attribute somehow, either with an attribute reference (<xs:attribute ref="ns2:extraAtt"/>) or with an attribute wildcard (<xs:anyAttribute namespace="http://example.com/my-ns2"/> or similar).
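For the attribute-reference route, the schema document for the ns1 namespace would also need to import the ns2 namespace. A minimal sketch, assuming the ns1 namespace URI is http://example.com/my-ns1, that the attribute schema is stored as ns2.xsd, and that attA and attB are plain strings (all three are placeholders, not taken from the question):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ns2="http://example.com/my-ns2"
           targetNamespace="http://example.com/my-ns1"
           elementFormDefault="qualified">

  <!-- Makes the components of the ns2 namespace available for reference -->
  <xs:import namespace="http://example.com/my-ns2" schemaLocation="ns2.xsd"/>

  <xs:element name="something">
    <xs:complexType>
      <xs:attribute name="attA" type="xs:string"/>
      <xs:attribute name="attB" type="xs:string"/>
      <!-- Points at the global attribute declared in the ns2 schema -->
      <xs:attribute ref="ns2:extraAtt"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

If the ns1 schema belongs to someone else and cannot be edited, this only works when that schema already provides an attribute wildcard on the element.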
Sorry about the legibility of the spec; it's a long story, but essentially some members of the WG did not think people like you exist ("no one except implementors reads the spec, and as long as they don't complain it's readable enough" -- at least, that was what they said before some implementors did complain, loudly and bitterly; then they just changed the subject).
To declare just the attribute you can use this XSD:
<xs:schema
    targetNamespace="theNamespaceUri"
    elementFormDefault="qualified"
    attributeFormDefault="qualified"
    xmlns="theNamespaceUri"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attribute name="extraAtt" type="xs:string">
  </xs:attribute>
</xs:schema>
(assuming extraAtt is a simple string - you can use any type, or restrict an existing type etc.)

An XSD attribute to capture "source data field"

I have a domain model which is intended to generalise several source systems. As such, in certain cases the decision was made to overload data into a new generic field rather than to create several specific fields.
To account for this, when the source systems' data is mapped onto the new domain model, I was hoping to record the source field name as an attribute, e.g.:
<Event>
  <Description sourceField="subject">...</Description>
  <Description sourceField="description">...</Description>
  <Description sourceField="issue">...</Description>
  <...>
</Event>
What would be the appropriate way to add such an attribute into the XSD? Would I need to specifically attach it to every such overloaded field, or is there a general way to allow an attribute across all elements?
Please don't point out that I should just add the extra fields into the domain model if I need to distinguish between the different data - the decision has been made, I just need to work around it!
Thanks in advance.
Not really.
If all your element declarations extend from a common base type definition, then you can add the attribute to the base.
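A sketch of the common-base-type approach, assuming the overloaded fields hold plain text. The type name SourcedText and the choice of xs:NMTOKEN for the attribute are illustrative only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical base type: text content plus an optional sourceField -->
  <xs:complexType name="SourcedText">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="sourceField" type="xs:NMTOKEN"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="Event">
    <xs:complexType>
      <xs:sequence>
        <!-- Every overloaded field uses (or extends) the base type -->
        <xs:element name="Description" type="SourcedText"
                    maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```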
If all your element declarations include an anyAttribute wildcard, you can make a global attribute declaration for sourceField. The validator would then at least allow your attribute, but not require it. And if the wildcard's processContents is strict or lax, the validator will also check that the attribute's value is valid against that declaration.
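The wildcard approach might look like this in a schema without a target namespace, so that the global attribute is unqualified and matches namespace="##local". The xs:NMTOKEN type is an assumption.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global declaration that the wildcard below can resolve to -->
  <xs:attribute name="sourceField" type="xs:NMTOKEN"/>

  <xs:element name="Description">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <!-- Allows any unqualified attribute that has a global
               declaration; "strict" means its value is validated -->
          <xs:anyAttribute namespace="##local" processContents="strict"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With processContents="lax" instead, undeclared unqualified attributes would also be accepted, and only those with a matching global declaration would be validated.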

referential integrity in XML files without globally unique IDs

Maybe I'm not seeing the forest for the trees, but here it goes:
I'm "designing" an XML document and have so far come up with something like the following:
<element key="root">
  <data>...</data>
  <elements>
    <element key="foo">
      <data>...</data>
    </element>
    <element key="bar">
      <data>...</data>
    </element>
  </elements>
</element>
So it's a simple hierarchical structure. What I want to do now is have references from one element to any other element anywhere in the hierarchy. That would be trivial if each element had a unique ID, but they don't. So far I only plan on guaranteeing that each element's key is unique within its level (much like file names in a directory structure).
In other words, if I had fully qualified keys such as root.foo, guaranteeing referential integrity would be simple. But then I'd be storing redundant information (I already know that foo is a sub element of root, why store that information twice?).
I realize that this is essentially a cosmetic problem. One of the simplest solutions is probably to just auto-assign IDs and be done with it. But this is fairly inelegant (and error-prone unless you have a nice front end for editing the file), so I was hoping for a nicer way to do it.
Is there a way to implement this in XML Schema?
Use <xs:key> and <xs:keyref>.
Keys are unique within a specified context, so they don't need to be globally unique like IDs. <xs:key> contains an <xs:selector> element that specifies the scope or context of the key (key values must be unique across this set) and an <xs:field> element that defines the key nodes. A key can have multiple fields, in which case their combination must be unique. <xs:key> and <xs:keyref> are used within an <xs:element> declaration.
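A minimal sketch for the document in the question, assuming no target namespace. The key makes each key attribute unique among the children of one element (one "level"). The sibling attribute and the siblingRef constraint are hypothetical, added only to show the shape of a keyref; they cover references within the same level, not references to arbitrary elements elsewhere in the hierarchy.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="element">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="data" type="xs:string"/>
        <xs:element name="elements" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <!-- Recursion: children reuse this same declaration -->
              <xs:element ref="element" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="key" type="xs:NMTOKEN" use="required"/>
      <!-- Hypothetical attribute naming another element on the same level -->
      <xs:attribute name="sibling" type="xs:NMTOKEN"/>
    </xs:complexType>

    <!-- Scope: the direct children of this element, checked per instance -->
    <xs:key name="childKey">
      <xs:selector xpath="elements/element"/>
      <xs:field xpath="@key"/>
    </xs:key>

    <!-- Each sibling value must match a key in the same scope -->
    <xs:keyref name="siblingRef" refer="childKey">
      <xs:selector xpath="elements/element"/>
      <xs:field xpath="@sibling"/>
    </xs:keyref>
  </xs:element>

</xs:schema>
```

Because the constraints are attached to the recursive element declaration, they are evaluated separately for every element in the tree, which gives the "unique within its level" behaviour without requiring document-wide IDs.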

Resources