How to parse an XSD file with RapidXML


Does RapidXML have the capability to validate an XML file against its associated schema, i.e. an XSD file? I was under the assumption that an XML parser would be able to do both parsing and validation. If not, why is validating against the associated schema deemed unnecessary? I checked RapidXML's documentation and found no mention of schema or XSD.
I am currently parsing XML files like so:
#include "rapidxml.hpp"
#include "rapidxml_utils.hpp"  // provides rapidxml::file<>
rapidxml::file<> xmlFile("BeerLog.xml");
rapidxml::xml_document<> doc;
doc.parse<0>(xmlFile.data());
The following pseudo-code might give you a better idea of what I am looking for:
rapidxml::file<> xmlFile("BeerLog.xml", "BeerLog.xsd");
or even:
rapidxml::file<> xmlFile("BeerLog.xml");
rapidxml::file<> xsdFile("BeerLog.xsd");
rapidxml::xml_document<> doc;
doc.parse_with_schema<0>(xsdFile.data(), xmlFile.data());

Your impression is wrong: accessing the content of an XML document and validating it against a schema are quite distinct tasks, even if the former is needed for the latter. Lightweight, fast parsers in particular often don't support validation, and a quick glance at the RapidXML documentation confirms this:
W3C Compliance. RapidXml is not a W3C compliant parser, primarily because it ignores DOCTYPE declarations
Given also that there are several quite different schema languages (XSD, RELAX NG, DTD, ...), support for one of them would not mean it is the one you want.
You also have to take into account that many XML files are merely well-formed and do not conform to any schema; somebody may want to process them nevertheless.
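
The distinction between well-formedness and validity can be illustrated outside RapidXML. The following is a minimal sketch using Python's standard library, chosen only because it runs without extra dependencies. `is_well_formed` and `missing_names` are hypothetical helper names, and the hand-written rule in `missing_names` merely stands in for what a real schema validator would check. For actual XSD validation in C++, a validating library such as Xerces-C++ or libxml2 would be needed rather than RapidXML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML at all (no schema involved)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def missing_names(xml_text):
    """Stand-in for one schema rule (an assumption for illustration):
    every <beer> element must carry a 'name' attribute.
    Returns the positions of <beer> elements that break the rule."""
    root = ET.fromstring(xml_text)
    return [
        index
        for index, beer in enumerate(root.iter("beer"))
        if "name" not in beer.attrib
    ]
```

A document such as `<beerlog><beer name="Stout"/><beer/></beerlog>` passes `is_well_formed` but is still reported by `missing_names`, which is the gap a non-validating parser leaves open.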

Related

Documentation or reference for "NETSCAPE-Bookmark-file-1" DOCTYPE

Is there any standard (possibly created after-the-fact) that governs <!DOCTYPE NETSCAPE-Bookmark-file-1> files? If you export bookmarks from either Chrome or Firefox (tried on Windows 10) you get this kind of file, which seems to be HTML of sorts.
I've tried searching the web but found only pragmatic results like parsers in specific programming stacks, or tips and tricks on importing and exporting it.
Is there any standard, RFC, format description, or reference parser, or something similar?

It is not even valid HTML, neither technically nor semantically. Modern browsers seem to interpret the de facto standard loosely when writing such files, but luckily also when importing them.
The best available format description (probably reverse engineered, yes) seems to be this one:
https://learn.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/platform-apis/aa753582(v=vs.85)
And it's by Microsoft of all things...
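
Because the format is loose HTML rather than XML, a tolerant HTML tokenizer is a safer starting point than an XML parser. Below is a minimal sketch with Python's standard `html.parser`; the class name `BookmarkLinks` is hypothetical, and the sketch assumes that bookmarks appear as `<A HREF="...">title</A>` entries, which is how exported files commonly look. Folders, dates, and icons are ignored.

```python
from html.parser import HTMLParser


class BookmarkLinks(HTMLParser):
    """Collect (title, href) pairs from a NETSCAPE-Bookmark-file-1 export."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._title = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._title).strip(), self._href))
            self._href = None
```

Feeding the file contents to `BookmarkLinks().feed(...)` and reading `.links` afterwards gives a flat list of bookmarks, even though the unclosed `<DT>` and `<p>` tags would make an XML parser reject the file.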

Is there any way to sanitize an SVG file in C#? Any libraries?

Is there any way to sanitize an SVG file in C#? Are there any libraries for this?
On the client side we sanitize SVG files while uploading, but the security team is asking for sanitization on the server side too.

I'm primarily a Python developer, but I thought I'd throw some research into the issue for ya. I used to develop for C, so I thought I should at least have a basic understanding of what's going on.
SVG files are XML documents, and when rendered in a browser they can run JavaScript and apply CSS through the DOM. Trying to enumerate and script out every single catch for potential JavaScript-based security issues doesn't seem realistic, so personally, I'd just remove every JavaScript section that does anything more than define simple variables, do math operations, or reference already-defined visual elements from any uploaded SVG file.
Since SVG files are XML-based and human-readable, this could be accomplished by iterating through the file either line by line like a text file or element by element like an XML or HTML file. You'd go through and remove all the scripts that don't meet the above criteria, save the result, and then, since SVG is itself XML, run a standard XML sanitization library on it. I reckon this GitHub library and this Stack Overflow thread could be helpful in that.
I hope my response was helpful!

What your security team says is true: client-side security is not security. It is just user convenience. Never rely on client-side checks. Anyone wanting to do bad things to your application will bypass client-side checks first thing.
Now, the most direct way to use an SVG file in an XSS attack is the <script> tag.
Unfortunately, defusing or securing a script is a very complicated topic, prone to errors and to both false positives and false negatives.
So, I believe your only recourse is to remove scripts altogether. This might not be what you need.
But if it is, then it's very simple to do. The script tag cannot be disguised inside the SVG, or the browser will not recognize it in the first place, making the attack moot. So a simple regex should suffice for plain <script> elements. Something like:
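// Requires: using System.Text.RegularExpressions;
cleanSVGcode = Regex.Replace(
    userSVGcode,
    @"<script.*?script>",   // non-greedy match from opening to closing tag
    "",
    RegexOptions.IgnoreCase | RegexOptions.Singleline
);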
It is possible to sanitize out further sequences. JavaScript calls won't work if they're written incorrectly or in an obfuscated way, so the number of these sequences is limited. For example, replace:
@"javascript:" => @"syntax:error:"
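
As an alternative to regex replacement, the same cleanup can be done on the parsed tree, which also catches namespace-prefixed script elements and inline event handlers. The following is a minimal sketch in Python's standard library, not a complete sanitizer; `strip_active_content` and `local_name` are hypothetical names, and the rule set (drop `script` elements, `on*` attributes, and `javascript:` hrefs) is an assumption about what counts as active content. The same idea carries over to C# with `System.Xml.Linq.XDocument`. Note that the Python documentation warns that its XML modules are not hardened against maliciously constructed input, so a production version would need a hardened parser configuration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def local_name(name):
    """Strip the '{namespace}' part that ElementTree puts before names."""
    return name.rsplit("}", 1)[-1].lower()


def strip_active_content(svg_text):
    """Remove <script> elements, on* handlers and javascript: hrefs."""
    root = ET.fromstring(svg_text)

    # Drop every script element, whatever namespace prefix it was written with.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) == "script":
                parent.remove(child)

    # Drop event-handler attributes and javascript: links (href, xlink:href).
    for elem in root.iter():
        for attr in list(elem.attrib):
            name = local_name(attr)
            value = elem.attrib[attr].strip().lower()
            if name.startswith("on") or (
                name == "href" and value.startswith("javascript:")
            ):
                del elem.attrib[attr]

    # Serialize with SVG as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", SVG_NS)
    return ET.tostring(root, encoding="unicode")
```

Working on the tree means comments, CDATA sections, and unusual whitespace inside tags do not need special handling, which is where pattern-based removal tends to be fragile.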

What is disadvantage of manipulating XML files directly as string?

If I want to change the text or add an element in an XML file, I can just read the file into a string, replace or add elements as strings, then write it back as XML.
In which use cases is that approach bad? Why do we need to manipulate XML using libraries such as XML DOM or XPath?

The disadvantage of manipulating XML via string operators is that achieving a parsing-dependent goal for even one particular XML document is already harder than using a proven XML parser. Achieving the goal for equivalent XML document variations will be nearly impossible, especially for anyone naive enough to be considering such an approach in the first place.
Not convinced?
Scan the table of contents of the Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation 26 November 2008. If you do not understand everything, your hand-written, poor imitation of an XML parser will fail, if not on your first test case, then on future variations that you are obligated to handle if you wish to claim your code works with XML. To mention just a few challenges, your program should:
Report if its input XML is not well-formed.
Handle character and entity references.
Handle comments and CDATA sections.
Tempted to parse XML via string operators, including regex? Don't do it.
Use a real XML parser.
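
A small demonstration of the challenges listed above, as a sketch in Python's standard library. The sample document and the `rename` helper are made up for illustration: the same element content is written three equivalent ways, and a comment happens to contain markup-like text.

```python
import xml.etree.ElementTree as ET

doc = """<items>
  <!-- <name>Stout</name> is the legacy default -->
  <item><name>Stout</name></item>
  <item><name><![CDATA[Stout]]></name></item>
  <item><name>S&#116;out</name></item>
</items>"""

# String approach: rewrites the text inside the comment and misses the
# CDATA and character-reference spellings of the very same content.
naive = doc.replace("<name>Stout</name>", "<name>Porter</name>")


def rename(xml_text, old, new):
    """Parser approach: compare decoded element text, not raw characters."""
    root = ET.fromstring(xml_text)
    count = 0
    for name in root.iter("name"):
        if name.text == old:
            name.text = new
            count += 1
    return count, ET.tostring(root, encoding="unicode")
```

Here the string replacement changes two occurrences, one of them inside the comment, while `rename` updates all three real elements and leaves nothing else touched.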

Missing docbookxi.xsd

Seems that the /docbook-5.0/catalog.xml (XML catalog) found in the DocBook 5.0 zip ...
http://www.docbook.org/xml/5.0/docbook-5.0.zip
references a xsd/docbookxi.xsd schema file that seems to be missing from that archive.
Is this just a placeholder for some functionality that is yet to exist, or is this a legitimate error/bug/oversight in that catalog file?
Searching Google for docbookxi.xsd just turns up hundreds of references to this DocBook XML catalog entry, but no reference to the actual docbookxi.xsd file or definition.
Due to limitations in the environment I'm working with, I cannot use the alternative RELAX NG versions of this schema.

Yes, that's a legitimate bug. I don't actually recall if the toolchain that built the (awful) XSD versions was ever able to produce the XInclude version.
I can try to create an XInclude version of the "by hand" XSD files. However, that will be a version 1.1 XML Schema. Is that good enough, or do you need 1.0?
[Addendum]
After some investigation, it appears to me that the UPA (Unique Particle Attribution) rule in XSD makes creating an XInclude version enormously difficult. Simply allowing XInclude at both the division level (part or reference) and the component level (preface, chapter, et al.) violates the UPA rule because a book can start with either a division or a component.

Perhaps http://docbook.org/xsd/5.0b2/docbook-xsd10.xsd or http://docbook.org/xsd/5.0b2/docbook.xsd is what you are looking for? Since the XSD schema documents for DocBook are now maintained by hand, it may well be that there are versions of the normative RELAX NG schema for which no corresponding XSD schema document is provided.

All mandatory fields in an XSD file?

Is there a quick way to find all the mandatory fields in an XSD file?
I need to quickly see all the mandatory fields in the schema.
Thanks.

Not sure if you're looking to do this through code. If not, Altova XMLSpy, for example, provides an option to "Generate Sample XML File", with options to generate only mandatory fields.
Otherwise, if you're working with Java, for example, you can use something like the Eclipse XSD project for programmatic access to the XSD. (It even works without Eclipse.) Some additional details are in the question "Are there any other frameworks that parse XSD other than XSOM?".

Take a look at this post; instead of exporting all fields, there's also an option to get only the mandatory ones. One significant difference compared with the answer you accepted is that you can also generate an Excel or CSV file, in addition to the XML file; not to mention that the sample XML approach is deficient by definition. I would pay attention to the way mandatory choices, abstract typed elements, or abstract elements with substitution groups play out in your case.
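
For a quick first pass in code, the schema can be read as ordinary XML. The following is a rough sketch in Python's standard library; `mandatory_fields` is a hypothetical helper, and it relies only on two XSD defaults: `minOccurs` defaults to 1 for element particles, and attributes are optional unless `use="required"`. It deliberately ignores the harder cases mentioned above (choices, substitution groups, abstract types) as well as imports and includes, so its output is a starting list rather than a definitive answer.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def mandatory_fields(xsd_text):
    """List (kind, name) pairs for locally required elements and attributes."""
    root = ET.fromstring(xsd_text)
    # Top-level declarations are not particles and carry no minOccurs.
    top_level = {id(child) for child in root}
    required = []

    for elem in root.iter(XS + "element"):
        if id(elem) in top_level:
            continue
        name = elem.get("name") or elem.get("ref")
        # minOccurs defaults to 1, so only an explicit 0 makes it optional.
        if name and elem.get("minOccurs", "1") != "0":
            required.append(("element", name))

    for attr in root.iter(XS + "attribute"):
        if id(attr) in top_level:
            continue
        name = attr.get("name") or attr.get("ref")
        if name and attr.get("use") == "required":
            required.append(("attribute", name))

    return required
```

An element inside an `xs:choice` would still be listed as required by this sketch even though only one branch of the choice has to appear, which is exactly the kind of case that needs a schema-aware tool.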
