BNF/EBNF for XML Schema (XSD)

I'm looking for a BNF/EBNF of XML Schema.
I just found the one for XML (http://www.w3.org/TR/REC-xml or extracted at http://www.jelks.nu/XML/xmlebnf.html).
Well it's a starting point, but I'm curious that I couldn't find a more specific one for XML Schema.

I guess that's because nobody would find it useful, and it would be too complex. If somebody wants to define an XML language such as XML Schema, they would probably describe it in terms of XML primitives like elements and attributes (using XML Schema, Relax NG, DTD, etc.), not characters. One of the reasons XML was invented is to serve as a meta-language for creating other languages.

I think the first step would be to start with the XSD for XML Schema and use Colibri to generate a BNF grammar.
I will check when I'm back home. Colibri's author says:
This is a first draft that we could refine.
But I definitely think it has potential.

What is the disadvantage of manipulating XML files directly as strings?

If I want to change the text or add an element in an XML file, I can just read the file into a string, replace or add elements with string operations, and then write it back as XML.
In which use cases is that approach bad? Why do we need to manipulate XML using libraries such as an XML DOM or XPath?
The disadvantage of manipulating XML via string operators is that achieving a parsing-dependent goal for even one particular XML document is already harder than using a proven XML parser. Achieving the goal for equivalent XML document variations will be nearly impossible, especially for anyone naive enough to be considering such an approach in the first place.
Not convinced?
Scan the table of contents of the Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation 26 November 2008. If you do not understand everything in it, your hand-written, poor imitation of an XML parser will fail, if not on your first test case, then on future variations which you're obligated to handle if you wish to claim your code works with XML. To mention just a few challenges, your program should:
Report if its input XML is not well-formed.
Handle character and entity references.
Handle comments and CDATA sections.
Tempted to parse XML via string operators, including regex? Don't do it.
Use a real XML parser.
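To make a couple of these failure modes concrete, here is a minimal sketch of a deliberately naive string-based extractor. The function name naive_element_text and the sample documents are made up for illustration; nothing here comes from a real library.

```cpp
#include <string>

// Deliberately naive: returns the text between the first literal "<tag>"
// and the next literal "</tag>", or an empty string if either is missing.
// It knows nothing about comments, attributes, CDATA or entity references.
std::string naive_element_text(const std::string& xml, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    const std::size_t start = xml.find(open);
    if (start == std::string::npos) {
        return "";
    }
    const std::size_t begin = start + open.size();
    const std::size_t end = xml.find(close, begin);
    if (end == std::string::npos) {
        return "";
    }
    return xml.substr(begin, end - begin);
}
```

Called with the tag name, it returns Old Ale for <beer><!-- <name>Old Ale</name> --><name>Stout</name></beer> (the commented-out element wins), an empty string for <beer><name lang="en">Stout</name></beer> (the attribute defeats the literal match), and the undecoded Fish &amp; Chips for <beer><name>Fish &amp; Chips</name></beer>. A conforming parser handles all three equivalent variations correctly.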

How to parse an XSD file with RapidXML

Does RapidXML have the capability to validate/parse an XML file with its associated schema, i.e. an XSD file? I was under the assumption that an XML parser would be able to do both together. If not, why is it deemed unnecessary to validate/parse the associated schema? I checked RapidXML's documentation and found no mention of schema or XSD.
I am currently parsing XML files like so:
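#include "rapidxml.hpp"
#include "rapidxml_utils.hpp"  // provides rapidxml::file
rapidxml::file<> xmlFile("BeerLog.xml");  // loads the file into a zero-terminated buffer
rapidxml::xml_document<> doc;
doc.parse<0>(xmlFile.data());  // by default, throws rapidxml::parse_error on malformed input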
The following pseudocode might give you a better idea of what I am looking for:
rapidxml::file<> xmlFile("BeerLog.xml", "BeerLog.xsd");
or even:
rapidxml::file<> xmlFile("BeerLog.xml");
rapidxml::file<> xsdFile("BeerLog.xsd");
rapidxml::xml_document<> doc;
doc.parse_with_schema<0>(xsdFile.data(), xmlFile.data());
Your assumption is wrong: accessing the content of an XML document and validating it against a schema are quite distinct topics, even if the former is useful for the latter. Lightweight, fast parsers in particular tend not to support validation, and a quick glance at the documentation shows this:
W3C Compliance. RapidXml is not a W3C compliant parser, primarily because it ignores DOCTYPE declarations
Also, given that there are several quite different schema languages (XSD, RNG, DTD, ...), support for one of them would not mean it is the one you want.
You also have to take into account that many XML files are merely well-formed and do not conform to any schema; somebody may want to process them nevertheless.

Textual syntax for domain models

We have domain models described in an XML format. Given the domain models, I want to generate tooling that helps testers and domain experts express data as text (and, later, a domain-specific test framework). IDE support is mandatory (IDEA or Eclipse).
Say I have this pseudo-model:
User
  fn string 120 chars mandatory
  ln string 120 chars mandatory
  address not-mandatory
Address
  street mandatory
  city mandatory
A typical usage scenario:
The user opens the IDE.
The user creates a new file.
When content assist is invoked, it should offer options such as 'user' and 'address'.
If I choose user, a further Ctrl-Space should offer 'fn', 'ln' and 'address' as options.
I know this can be done with Xtext, JetBrains MPS, etc. But I want to understand which technology lends itself to the following requirements.
The models are fed to the system at run time (new models, updates, deletes, etc.).
So I cannot have a static set of grammars. How can I structure it so that the model/property assist is resolved at run time, or at least so that the grammar (or maybe a part of it) is generated?
When I am working with one set of 'grammars' and I point my target server to a different version (which may have a different set of models), I want the editor to validate my existing files and flag errors.
I get the data files in XML, as text, or via server lookups.
It is very important for me to be able to transform the models into some other format or interpret them in Java/Groovy.
For example, I may have the following data file:
user {
  fn : Tom
  ln : Jill
  hobby : movies
}
But when I validate this file against a server that does not know the 'hobby' property, I want the editor to mark an error on that property.
I have plans to add much more functionality to this DSL/toolkit.
Any hints as to which technology is more suitable?
Thanks
I know this can be done with Xtext, JetBrains MPS, etc. But I want to understand which technology lends itself to the following requirements.
I think Xtext is good for your requirements under the condition that you have (or can create) an XML schema for your XML domain models.
The models are fed to the system at run time (new models, updates, deletes, etc.). So I cannot have a static set of grammars. How can I structure it so that the model/property assist is resolved at run time, or at least so that the grammar (or maybe a part of it) is generated?
If I understand you correctly, you don't really need specific grammar rules for each XML data model but only cross references to the data model.
EMF has support for generating EMF Java classes from XSD files and Xtext can reference XML files conforming to the XSD schema if you add them to the Xtext index using your custom indexer (Xtext interface IDefaultResourceDescriptionStrategy).
So you can create a normal Xtext project with grammar etc. for your DSL and use cross references that refer to your XML domain model.
When I am working with one set of 'grammars' and I point my target server to a different version (which may have a different set of models), I want the editor to validate my existing files and flag errors.
I get the data files in XML, as text, or via server lookups.
EMF uses URIs to identify resources so if you generate an Ecore model like I described, it should be possible to import the XML domain models using http:// or file:// (or whatever, it's extensible) URIs, or something that you internally resolve to URIs.
It is very important for me to be able to transform the models into some other format or interpret them in Java/Groovy.
Here you have the choice between making an interpreter, an Xbase inferrer or a generator (each of which can be implemented well using Xtend), depending on your requirements.
(Disclaimer: I am an employee at itemis, which is one of the main contributors to Xtext)
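The idea of keeping the grammar fixed and resolving property names against the domain model at run time can be sketched independently of Xtext. The sketch below is a plain C++ illustration under stated assumptions: Model and unknown_properties are hypothetical names, the model is assumed to have been loaded already (from XML or a server lookup), and this is not how Xtext implements cross references.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical run-time model: type name -> property names the target
// server currently knows for that type.
using Model = std::map<std::string, std::set<std::string>>;

// Returns, in input order, the property names that the model does not
// define for `type`. If the type itself is unknown, every property is
// reported, since none of them can be resolved.
std::vector<std::string> unknown_properties(
    const Model& model,
    const std::string& type,
    const std::vector<std::string>& property_names) {
    std::vector<std::string> unknown;
    const auto it = model.find(type);
    for (const std::string& name : property_names) {
        if (it == model.end() || it->second.count(name) == 0) {
            unknown.push_back(name);
        }
    }
    return unknown;
}
```

With a model in which user has fn, ln and address, checking a user record that carries fn, ln and hobby reports only hobby, which is the property an editor would underline. Pointing the editor at a different server version then amounts to swapping the Model value and re-running the check on existing files.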

Missing docbookxi.xsd

Seems that the /docbook-5.0/catalog.xml (XML catalog) found in the DocBook 5.0 zip ...
http://www.docbook.org/xml/5.0/docbook-5.0.zip
references a xsd/docbookxi.xsd schema file that seems to be missing from that archive.
Is this just a placeholder for some functionality that is yet to exist, or is this a legitimate error/bug/oversight in that catalog file?
Doing some google searching for docbookxi.xsd just turns up hundreds of references to this DocBook xml catalog reference, but no reference to the actual docbookxi.xsd file / definition.
Due to limitations in the environment I'm working with, I cannot use the alternative RELAX NG versions of this schema.
Yes, that's a legitimate bug. I don't actually recall if the toolchain that built the (awful) XSD versions was ever able to produce the XInclude version.
I can try to create an XInclude version of the "by hand" XSD files. However, that will be a version 1.1 XML Schema. Is that good enough, or do you need 1.0?
[Addendum]
After some investigation, it appears to me that the UPA (Unique Particle Attribution) rule in XSD makes creating an XInclude version enormously difficult. Simply allowing XInclude at both the division level (part or reference) and the component level (preface, chapter, et al.) violates the UPA rule because a book can start with either a division or a component.
Perhaps http://docbook.org/xsd/5.0b2/docbook-xsd10.xsd or http://docbook.org/xsd/5.0b2/docbook.xsd is what you are looking for? Since the XSD schema documents for DocBook are now maintained by hand, it may well be that there are versions of the normative Relax NG schema for which no corresponding XSD schema document is provided.

All mandatory fields in an XSD file?

Is there a quick way to find all the mandatory fields in an XSD file?
I need to quickly see all the mandatory fields in the schema.
Thanks
Not sure if you're looking to do this through code. If not, Altova XMLSpy, for example, provides an option to "Generate Sample XML File" - with options to generate only mandatory fields.
Otherwise, if you're working with Java, for example, you can use something like the Eclipse XSD project for programmatic access to the XSD. (It even works without Eclipse.) Some additional details are at "Are there any other frameworks that parse XSD other than XSOM?".
Take a look at this post; instead of exporting all fields, there's also an option to get only the mandatory ones. One significant difference compared with the answer you accepted is that you can also generate an Excel or CSV file, in addition to the XML file; not to mention that the sample XML approach is deficient by definition. I would pay attention to the way mandatory choices, abstract typed elements, or abstract elements with substitution groups play out in your case.
