XML Validation: XSD or Schematron?

I need to tighten validation on a moderately complex schema (SAML Metadata) comprising approx. 10 XSD files giving 1000 lines of schema definition. The validation should primarily require optional elements and attributes, restrict node and attribute contents to certain values, issue warnings on recommended or deprecated elements and attributes, and refuse anything that is not part of the schema, including stuff in extension elements of the original schema.
What is the best approach for this? Plain XSD, XSD with assertions, Schematron, or parsing the document in a general-purpose language?

Related

How to count values in a flat file based on the spaces? [duplicate]

Since we can query the XML file from C# (.NET), why do we need an XSD file? I know it is a metadata file for a particular XML file. We can specify the relationships in the XSD, but what does it actually do then?
XML
<?xml version="1.0" encoding="utf-8" ?>
<Root>
<Customers>
<Customer CustomerID="GREAL">
<CompanyName>Great Lakes Food Market</CompanyName>
<ContactName>Howard Snyder</ContactName>
<ContactTitle>Marketing Manager</ContactTitle>
<Phone>(503) 555-7555</Phone>
<FullAddress>
<Address>2732 Baker Blvd.</Address>
<City>Eugene</City>
<Region>OR</Region>
<PostalCode>97403</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
</Customers>
<Orders>
<Order>
<CustomerID>GREAL</CustomerID>
<EmployeeID>6</EmployeeID>
<OrderDate>1997-05-06T00:00:00</OrderDate>
<RequiredDate>1997-05-20T00:00:00</RequiredDate>
<ShipInfo ShippedDate="1997-05-09T00:00:00">
<ShipVia>2</ShipVia>
<Freight>3.35</Freight>
<ShipName>Great Lakes Food Market</ShipName>
<ShipAddress>2732 Baker Blvd.</ShipAddress>
<ShipCity>Eugene</ShipCity>
<ShipRegion>OR</ShipRegion>
<ShipPostalCode>97403</ShipPostalCode>
<ShipCountry>USA</ShipCountry>
</ShipInfo>
</Order>
<Order>
<CustomerID>GREAL</CustomerID>
<EmployeeID>8</EmployeeID>
<OrderDate>1997-07-04T00:00:00</OrderDate>
<RequiredDate>1997-08-01T00:00:00</RequiredDate>
<ShipInfo ShippedDate="1997-07-14T00:00:00">
<ShipVia>2</ShipVia>
<Freight>4.42</Freight>
<ShipName>Great Lakes Food Market</ShipName>
<ShipAddress>2732 Baker Blvd.</ShipAddress>
<ShipCity>Eugene</ShipCity>
<ShipRegion>OR</ShipRegion>
<ShipPostalCode>97403</ShipPostalCode>
<ShipCountry>USA</ShipCountry>
</ShipInfo>
</Order>
</Orders>
</Root>
I want to get data from the Order elements according to a provided CustomerID.
Also: What is the purpose of giving the relationships in XSD?
XSD files are used to validate that XML files conform to a certain format.
In that respect they are similar to DTDs that existed before them.
The main difference between XSD and DTD is that XSD is written in XML and is considered easier to read and understand.
Without XML Schema (XSD file) an XML file is a relatively free set of elements and attributes. The XSD file defines which elements and attributes are permitted and in which order.
In general XML is a metalanguage. XSD files define specific languages within that metalanguage. For example, if your XSD file contains the definition of XHTML 1.0, then your XML file is required to fit XHTML 1.0 rather than some other format.
You mention C# in your question, so it may help to think of an XSD as serving a similar role to a C# interface.
It defines what the XML should 'look like' in a similar way that an interface defines what a class should implement.
XSDs constrain the vocabulary and structure of XML documents.
Without an XSD, an XML document need only follow the rules for being well-formed as given in the W3C XML Recommendation.
With an XSD, an XML document must adhere to additional constraints placed upon the names and values of its elements and attributes in order to be considered valid against the XSD per the W3C XML Schema Recommendation.
XML is all about agreement, and XSDs provide the means for structuring and communicating the agreement beyond the basic definition of XML itself.
The question also asks: what is the purpose of giving the relationships in XSD?
Suppose you want to generate some XML for an external party's tool, or similar. How would you know what structure it must follow to be used correctly by their tool? You write to a schema. Likewise, if you want other people to use your tool, you would write a schema for them to follow. It may also be useful for validating your own XML.
Before explaining XSD (XML Schema Definition), let me explain:
What is a schema?
For example, take emailID: peter@gmail
You can tell that this emailID is not valid because it has no ending such as .com, .net or .org.
We know the email schema: it looks like peter@gmail.com.
Conclusion: a schema does not validate the data itself; it validates the structure.
XSD is one implementation of the schema idea for XML; RELAX NG is another.
We use XSD to validate XML data.
An XSD is a formal contract that specifies how an XML document can be formed. It is often used to validate an XML document, or to generate code from.
An XSD file is an XML Schema Definition and it is used to provide a standard method of checking that a given XML document conforms to what you expect.
An .xsd file is called an XML schema. Via an XML schema, we may require a certain structure in a given XML - which elements in which order, how many times, with which attributes, how they are nested, etc. If we have a schema for our XML input, we can verify that it contains the data we need it to contain, and nothing else, with a few lines invoking a schema validator.
The xsd file is the schema of the xml file - it defines which elements may occur and their restrictions (like amount, order, boundaries, relationships,...)
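For the first part of the question (getting Order data for a given CustomerID), here is a minimal sketch. It is written in Python with the standard library rather than C#, it uses a trimmed copy of the sample document, and the helper name orders_for_customer is made up for illustration. It also shows why no XSD is needed just to query: the code only assumes the document is well-formed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (two orders, fewer fields).
XML_DOC = """<Root>
  <Orders>
    <Order>
      <CustomerID>GREAL</CustomerID>
      <EmployeeID>6</EmployeeID>
      <OrderDate>1997-05-06T00:00:00</OrderDate>
    </Order>
    <Order>
      <CustomerID>GREAL</CustomerID>
      <EmployeeID>8</EmployeeID>
      <OrderDate>1997-07-04T00:00:00</OrderDate>
    </Order>
  </Orders>
</Root>"""


def orders_for_customer(xml_text, customer_id):
    """Return one dict per Order whose CustomerID child equals customer_id."""
    root = ET.fromstring(xml_text)
    matches = []
    for order in root.findall("./Orders/Order"):
        if order.findtext("CustomerID") == customer_id:
            # Map each direct child element's tag to its text.
            matches.append({child.tag: child.text for child in order})
    return matches


greal_orders = orders_for_customer(XML_DOC, "GREAL")
print([order["EmployeeID"] for order in greal_orders])
```

What the query cannot tell you is whether a missing or misspelled element is a data problem or a bug in the code; that is the gap a schema fills.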

Difference between XSD Simple element and XSD Complex element

I Googled this question but I'm still unable to find a clear explanation of the difference between a simple XSD (XML Schema Definition) element and a complex XSD element.
Any guidance would be highly appreciated.
I have no idea, why I answer this. But...
To summarize,
Simple types can only have content directly contained between the element's opening and closing tags. They cannot have attributes or child elements.
Complex types can have attributes, can contain other elements, can contain a mixture of elements and text, and so on.
One is a single value and the other a compound value.

How to include an XML schema (.xsd) in a JSON schema?

I want to define a JSON API response using JSON Schema.
Embedded in part of the API response is a complete, well-formed, schema valid XML string. The XSD of this XML string is a given.
Two part question:
How do I include the XSD in the JSON Schema such that the JSON Schema will also require the XML string to be schema valid in order for the whole API response to be valid?
If this is not possible, does anyone have another suggestion how to include the XSD at least in the specification? I'm working in RAML 0.8.
How do I include the XSD in the JSON Schema such that the JSON Schema
will also require the XML string to be schema valid in order for the
whole API response to be valid?
You cannot. The only thing you can do is validate the JSON and then at a later point extract the XML and validate it separately.
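The two-step approach can be sketched as follows, in Python with only the standard library. The field name "payload" and the function name are hypothetical. The standard library only checks that the embedded XML is well-formed; actual XSD validation would need a separate validator (for example lxml's etree.XMLSchema, which is an assumption about your tooling).

```python
import json
import xml.etree.ElementTree as ET


def extract_embedded_xml(response_text, field="payload"):
    """Parse the JSON response, then parse the XML string stored in `field`."""
    response = json.loads(response_text)  # step 1: the JSON layer
    xml_string = response[field]
    if not isinstance(xml_string, str):
        raise ValueError(f"{field!r} must be a string containing XML")
    # Step 2: the XML layer. ET.fromstring raises ET.ParseError if the
    # string is not well-formed. It does NOT check the XSD; a schema
    # validator would take over from here.
    return ET.fromstring(xml_string)


good_response = json.dumps({"payload": "<note><to>Ann</to></note>"})
root = extract_embedded_xml(good_response)
print(root.tag)
```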
If this is not possible, does anyone have another suggestion how to
include the XSD at least in the specification? I'm working in RAML
0.8.
I've only used Swagger, not RAML. Swagger is also based on JSON Schema.
The only thing you can do here is to include a detailed specification stating that the contained XML must be valid against such-and-such an XSD. You can do this by using the "description" functionality in Swagger (or the equivalent, if it exists in RAML). This allows you to create a description (which supports Markdown) and attach it to any element in the definition.

XML Schema (XSD) - if one element has specific value then another element must be present and vice versa

Can I express this in an XSD?
For example:
One element is a required bool element named EmployedMoreThanThirteenWeeks and if the value is set to false I want the schema to require the existence of another element named EmploymentDate. And the other way around if the value is true then ideally the EmploymentDate element should be denied but I can accept it being optional.
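No, not in XSD 1.0. An XSD 1.0 schema just defines structure and data types, not relations between element values. It is possible to add a key reference between elements, but that won't prevent invalid nodes, just invalid values. (XSD 1.1 assertions, where the validator supports them, can express rules like this.)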
You can create an XSLT file (an XML Stylesheet) which will validate the XML file for you and thus generate a report of errors.
I think that XSD can't do that, because a schema verifies only the structure (the tree) and not the values (though you can check the value format).
You should consider other validation ways.
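One such "other way" is a small check run after schema validation. The sketch below is Python with the standard library; the wrapper element name Employee and the function name are assumptions, while the two element names come from the question. It treats "0" and "1" as booleans as well, since xs:boolean allows them.

```python
import xml.etree.ElementTree as ET


def check_employment(xml_text):
    """Return error messages for the conditional-presence rule (empty if OK)."""
    root = ET.fromstring(xml_text)
    flag = (root.findtext("EmployedMoreThanThirteenWeeks") or "").strip()
    has_date = root.find("EmploymentDate") is not None
    errors = []
    if flag in ("false", "0") and not has_date:
        errors.append("EmploymentDate is required when "
                      "EmployedMoreThanThirteenWeeks is false")
    if flag in ("true", "1") and has_date:
        errors.append("EmploymentDate is not allowed when "
                      "EmployedMoreThanThirteenWeeks is true")
    return errors


missing_date = """<Employee>
  <EmployedMoreThanThirteenWeeks>false</EmployedMoreThanThirteenWeeks>
</Employee>"""
print(check_employment(missing_date))
```

The XSD still does the structural work (the boolean element exists, the date has the right type); the code only adds the cross-element rule.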

How to get an ordered list parsed by XML parser?

I am using an XSD schema file in which I specified an ordered list.
When parsing an XML node of the kind...
<myOrderedList> "element_1" "element_2" "element_3" </myOrderedList>
(which is valid XML syntax)
...all XML parsers I know parse this as a single node element.
Is there a way to get the XML parser parse this list for me (return it as a list or an array or whatever) or do I always have to parse it myself?
Why not make use of XML's ability to structure your data, and put each element in its own XML element? For example:
<myOrderedList>
<element>1</element>
<element>2</element>
<element>3</element>
</myOrderedList>
and so on. Otherwise you have to implement your own parsing (albeit a simple one) on top of the parsing that the XML parser already performs for you.
If you do the above, the parser will return you the ordered list without any further work, and/or you can process it more easily using standard XML tooling like XSLT/XQuery etc.
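As a rough illustration of "without any further work", here is how the element-per-value form reads in Python's standard library (the choice of Python is mine; the question does not name a language):

```python
import xml.etree.ElementTree as ET

doc = """<myOrderedList>
<element>1</element>
<element>2</element>
<element>3</element>
</myOrderedList>"""

root = ET.fromstring(doc)
# findall returns the matching children in document order.
values = [element.text for element in root.findall("element")]
print(values)  # ['1', '2', '3']
```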
Values in a list type are always separated by whitespace, so you can easily pass the text through a tokenizer to get the list of values.
There are also technologies, such as schema-aware XSLT 2.0, that will see the content of such an element as a list of values.
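The tokenizer route is short in practice. A minimal Python sketch using the node from the question; stripping the double quotes is my assumption about what the asker wants, since in a real XSD list type the quotes would simply be part of each token:

```python
import xml.etree.ElementTree as ET

doc = '<myOrderedList> "element_1" "element_2" "element_3" </myOrderedList>'

root = ET.fromstring(doc)
# str.split() with no arguments splits on any run of whitespace and
# drops the leading and trailing blanks.
tokens = root.text.split()
# Remove the surrounding quotes used in the question's example.
values = [token.strip('"') for token in tokens]
print(values)  # ['element_1', 'element_2', 'element_3']
```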
Using elements as proposed in the other answer is also a solution and may ease your processing. In XML child nodes are ordered so you should not worry about that. A possible representation would be:
<myOrderedList>
<value>element_1</value>
<value>element_2</value>
<value>element_3</value>
</myOrderedList>
