Reading DTDs as Reference Texts - reference

Is reading the actual plain text of a DTD, such as the one below, a correct and quick starting point for a human-readable reference of all valid tags and attributes for a specific DOCTYPE?
DTD for XHTML 1.0 Transitional
It seems to me that this makes for a down-to-earth reference, but does it tell me everything that is valid for a DOCTYPE "at run time", and does it tell the browser the same?
After that, what is a better reference? Where are the actual DOCTYPE standards, or are they the same as the general HTML standards? DOM level standards? W3Schools? Some other manual like "The HTML Reference Library 4.0" but for "newer" DOCTYPEs/HTML?

The DTD only describes the syntactic structure of each DOCTYPE: which elements and attributes exist and how they may nest. It doesn't explain the semantics of anything; for example, it doesn't say what <table> does. For that, you should go to the HTML specifications. The HTML 4.01 spec is at http://www.w3.org/TR/html4, and HTML5 is at http://www.w3.org/TR/html5/.
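To make that concrete, here is a minimal Python sketch. It assumes a tiny hand-written DTD fragment (not the real XHTML 1.0 Transitional DTD) and a naive regular expression instead of a real DTD parser; the names SAMPLE_DTD and list_elements are made up for illustration. It shows what can be read out of a DTD: element names and content models, and nothing about what the elements mean.

```python
import re

# A tiny, hand-written DTD fragment used only for illustration;
# it is NOT taken from the real XHTML 1.0 Transitional DTD.
SAMPLE_DTD = """
<!ELEMENT table (caption?, tr+)>
<!ELEMENT caption (#PCDATA)>
<!ELEMENT tr (td+)>
<!ELEMENT td (#PCDATA)>
<!ATTLIST table border CDATA #IMPLIED>
"""

# Naive pattern: an element name followed by its content model.
# Real DTDs lean heavily on parameter entities (%name;), which this
# sketch does not expand.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]+)>")


def list_elements(dtd_text):
    """Return {element name: content model} for each <!ELEMENT ...> declaration."""
    return {name: model.strip() for name, model in ELEMENT_DECL.findall(dtd_text)}


elements = list_elements(SAMPLE_DTD)
for name, model in elements.items():
    print(name, "->", model)
```

The output says that a table may contain an optional caption followed by one or more tr elements, but what a browser should do with a table is only described in the HTML specification.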

Related

DocBook 5.x: bibliography citation with extra text, or completely custom text?

I have a book with bibliography like
<bibliography>
<biblioentry>
<abbrev>A</abbrev>
<title>This is the book title</title>
</biblioentry>
<!-- ... -->
</bibliography>
and can cite individual works with <citation>A</citation>, which will output something like [A] in the resulting HTML.
Now, the citation often includes a precise location within the work (such as a volume/chapter/page/paragraph number, or a range of them). I currently put this part in the text that follows (like <citation>A</citation><phrase>, XX–XXI</phrase>) so I get [A], XX–XXI in the resulting text, but the XX–XXI part is then not semantically tied to the citation.
How can I make either a citation with text affixed to the reference abbreviation (something like <citation>A<loc>XX–XXI</loc></citation> → [A, XX–XXI]), or citation with completely custom text (but hyperlink resolved to the bibliography entry)?
I've been browsing DocBook 5.2: The Definitive Guide and grepping through the xslTNG stylesheets and unit tests, and I am still unsure what to do. Perhaps <link ...> with some xref pointing to the bibliography, or something similar? Pointers appreciated.

Documentation or reference for "NETSCAPE-Bookmark-file-1" DOCTYPE

Is there any standard (possibly created after-the-fact) that governs <!DOCTYPE NETSCAPE-Bookmark-file-1> files? If you export bookmarks from either Chrome or Firefox (tried on Windows 10) you get this kind of file, which seems to be HTML of sorts.
I've tried searching the web but found only pragmatic results like parsers in specific programming stacks, or tips and tricks on importing and exporting it.
Is there any standard, RFC, format description, or reference parser, or something similar?
It is not even valid HTML, neither technically nor semantically. And it seems that modern browsers interpret the de facto standard loosely when writing such files, but luckily also when importing them.
The best available format description (probably reverse engineered, yes) seems to be this one:
https://learn.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/platform-apis/aa753582(v=vs.85)
And it's by Microsoft of all things...
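Since the format is "HTML of sorts", a lenient HTML tokenizer is usually enough to read it. Below is a minimal Python sketch using the standard library's html.parser. The SAMPLE text is an assumption that imitates the layout browsers typically export, not a normative example, and BookmarkParser is a hypothetical helper, not a reference parser.

```python
from html.parser import HTMLParser

# Illustrative sample only: it imitates the layout browsers typically
# export and is not taken from any specification.
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>Folder</H3>
    <DL><p>
        <DT><A HREF="https://example.com/" ADD_DATE="1600000000">Example</A>
        <DT><A HREF="https://example.org/">Another</A>
    </DL><p>
</DL><p>
"""


class BookmarkParser(HTMLParser):
    """Collect (title, url) pairs from <A HREF=...> entries, ignoring folders."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append(("".join(self._text).strip(), self._href))
            self._href = None


parser = BookmarkParser()
parser.feed(SAMPLE)
parser.close()
for title, url in parser.bookmarks:
    print(title, url)
```

The tokenizer does not care about the unclosed <DT> and <p> tags, which is the same loose treatment browsers give the format when importing.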

How to parse an XSD file with RapidXML

Does RapidXML have the capability to validate/parse an XML file with its associated schema, i.e. an XSD file? I was under the assumption that an XML parser would be able to do both together. If not, why is validating against the associated schema deemed unnecessary? I checked RapidXML's documentation and found no mention of schema or XSD.
I am currently parsing XML files like so:
rapidxml::file<> xmlFile("BeerLog.xml");
rapidxml::xml_document<> doc;
doc.parse<0>(xmlFile.data());
The following pseudo-code (a hypothetical API, not something RapidXML provides) might give you a better idea of what I am looking for:
rapidxml::file<> xmlFile("BeerLog.xml", "BeerLog.xsd");
or even:
rapidxml::file<> xmlFile("BeerLog.xml");
rapidxml::file<> xsdFile("BeerLog.xsd");
rapidxml::xml_document<> doc;
doc.parse_with_schema<0>(xsdFile.data(), xmlFile.data());
Your impression is wrong: accessing the content of an XML file and validating it against a schema are quite distinct topics, even if the former is useful for the latter. Lightweight, fast parsers in particular tend not to support validation, and a quick glance at the documentation shows this:
W3C Compliance. RapidXml is not a W3C compliant parser, primarily because it ignores DOCTYPE declarations
Given also that there are quite different schema languages (XSD, RNG, DTD, ...), even support for one of them would not mean it's the one you want.
You will also have to take into account that there are many XML files which are just well-formed and do not conform to any schema; somebody may want to process them nevertheless.
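The distinction between well-formed and valid can be illustrated in Python instead of C++, as a sketch by analogy only: the standard library's xml.etree.ElementTree, like RapidXML, checks well-formedness and does no schema validation. The element names below are invented, since the real contents of BeerLog.xml are not shown.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML. No schema is consulted."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical documents; the element names are invented for illustration.
good = "<beerlog><beer name='Stout'/></beerlog>"
# Well-formed, but presumably not what a BeerLog schema would allow:
odd = "<beerlog><wine/></beerlog>"
# Not well-formed: <beer> is never closed.
broken = "<beerlog><beer name='Stout'></beerlog>"

print(is_well_formed(good))
print(is_well_formed(odd))
print(is_well_formed(broken))
```

A non-validating parser accepts the second document without complaint. Rejecting it takes a separate validation step with a tool that understands the schema language in question.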

Semantically correct way to add a copyright notice into a svg file?

I want to add a copyright notice to my SVG files, and it should only be "hidden" text, not a watermark.
This is no real protection, because anyone who opens an SVG file in a text editor can edit everything and delete the copyright notice. But I think it would be a simple way to show who made the file, and it gives a chance of identifying unlicensed copies: the hidden information is easy to find if you look for it.
My main question is: how should the copyright text be put into the file?
<title> element is for accessibility purposes, some user agents display the title element as a tooltip.
<desc> element generally improves accessibility and you should describe what a user would see.
ugly way: a text element with inline CSS to hide it. Don't even think about this! :)
<!--Copyright info here--> could be also a simple solution.
<metadata>: this would be the best way, but I did not find a detailed definition of it or of which child elements may live inside. Also https://developer.mozilla.org/en-US/DOM/SVGMetadataElement gives a 404.
Under https://www.w3.org/TR/SVG/struct.html#MetadataElement we can find more details. But is RDF really necessary?
I think a <metadata> element is the right place, but which child elements should be used and is just RDF the way to go?
I think the metadata element is the correct choice here. It has to contain XML, but it doesn’t have to be an RDF serialization (e.g., RDF/XML).
But I think it makes sense to use RDF here, because that’s exactly RDF’s job (providing metadata about resources, like SVG documents), and there is probably no other XML-based metadata language that has greater reach / better support.
A simple RDF statement (in RDF/XML) could look like this:
<metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/">
<rdf:Description rdf:about="http://example.com/my-svg-file.svg">
<schema:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/"/>
</rdf:Description>
</rdf:RDF>
</metadata>
The about attribute takes an IRI as value; for a stand-alone SVG document, you could provide an empty value (= the base IRI of the document).
In this example I use the license property from Schema.org:
A license document that applies to this content, typically indicated by URL.
(The vocabulary Schema.org is supported by several big search engines.)
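To check that such a statement can be read back by a program, here is a minimal Python sketch using only the standard library. The surrounding SVG document is made up for illustration, and it uses the empty rdf:about value mentioned above. It is a plain XML lookup, not a real RDF parser.

```python
import xml.etree.ElementTree as ET

# Made-up stand-alone SVG document carrying the RDF/XML statement.
SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:schema="http://schema.org/">
      <rdf:Description rdf:about="">
        <schema:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/"/>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <rect width="10" height="10"/>
</svg>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF_NS, "schema": "http://schema.org/"}

root = ET.fromstring(SVG)
# Find the schema:license element anywhere below the root.
license_el = root.find(".//schema:license", NS)
# Namespaced attributes are keyed as "{namespace}localname" in ElementTree.
license_url = license_el.get("{" + RDF_NS + "}resource")
print(license_url)
```

A tool that needs the full RDF data model would use a proper RDF library instead. The point here is only that the metadata stays machine-readable without affecting rendering.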

specification/implementation behaviour for empty href?

I once read a page a few years ago about the various browsers' differing implementations of behaviour when a link with an empty href is clicked.
some of them linked to the directory (/path/to/file?query → /path/to/)
some of them linked to the exact same URI (/path/to/file?query → /path/to/file?query)
some of them linked to the same page (/path/to/file?query → /path/to/file)
...and various other behaviours.
Is the behaviour defined in a specification?
If so, what is the correct behaviour?
If so, have the latest versions of the big five browsers today fixed their implementations?
Since there's no "specification" for contents of HREF (at least in HTML 4), the browsers can do whatever they damn well please.
UPDATE However, aside from HTML, there's RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Its section 4.4, Same-Document Reference, says:
When a URI reference refers to a URI that is, aside from its fragment
component (if any), identical to the base URI (Section 5.1), that
reference is called a "same-document" reference. The most frequent
examples of same-document references are relative references that are empty ...
I do not necessarily read the above as "an empty URI MUST cause the client to reload the same document's URI", but it does sound like a "best practice" type of wording; so if I was implementing my own browser I'd almost certainly follow such a behavior.
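For comparison, here is a small Python sketch of how one generic URI resolver, the standard library's urllib.parse.urljoin, treats these references. It shows that library's behaviour, not what any particular browser does, and the example.com base URL is made up.

```python
from urllib.parse import urljoin

# Made-up base URL matching the shape used in the question.
base = "http://example.com/path/to/file?query"

# Empty reference: resolves to the base URI itself (same-document reference).
same_document = urljoin(base, "")

# "." refers to the containing directory and drops the query.
directory = urljoin(base, ".")

# A fragment-only reference keeps the base path and query.
fragment_only = urljoin(base, "#top")

print(same_document)
print(directory)
print(fragment_only)
```

So a link to the directory needs an explicit "." reference under this resolver. An empty reference stays on the same URI, query included.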
On a related note, here's a good recent (3/2010) roundup of how browsers treat an empty src attribute on the <img> tag: http://www.nczonline.net/blog/2010/03/16/empty-string-urls-in-html-a-followup/ and http://www.nczonline.net/blog/2010/07/13/empty-string-urls-browser-update/ . Please note that it is a big deal, since having an empty img src would cause the page to endlessly re-load itself in the worst case scenario.
