Semantically correct way to add a copyright notice to an SVG file?

I want to add a copyright notice to my SVG files. It should be "hidden" text only, not a watermark.
This is no real protection, because anyone who opens an SVG file in a text editor can edit everything and delete the copyright. But I think it would be a simple way to show who made the file, and it offers a chance to identify unlicensed graphics: the hidden information is easy to find if you look for it.
My main question is: how should the copyright text be put into the file?
<title> element is for accessibility purposes, some user agents display the title element as a tooltip.
<desc> element generally improves accessibility and you should describe what a user would see.
ugly way: a text element with inline CSS to hide it. Don't even think about this! :)
<!--Copyright info here--> could also be a simple solution.
<metadata>: this would be the best way, but I did not find a detailed definition of it or of which child elements can live inside. Also https://developer.mozilla.org/en-US/DOM/SVGMetadataElement gives a 404.
Under https://www.w3.org/TR/SVG/struct.html#MetadataElement we can find more details. But is RDF really necessary?
I think a <metadata> element is the right place, but which child elements should be used, and is RDF the only way to go?

I think the metadata element is the correct choice here. It has to contain XML, but it doesn’t have to be an RDF serialization (e.g., RDF/XML).
But I think it makes sense to use RDF here, because that’s exactly RDF’s job (providing metadata about resources, like SVG documents), and there is probably no other XML-based metadata language with greater reach or better support.
A simple RDF statement (in RDF/XML) could look like this:
<metadata>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/">
    <rdf:Description rdf:about="http://example.com/my-svg-file.svg">
      <schema:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/"/>
    </rdf:Description>
  </rdf:RDF>
</metadata>
The about attribute takes an IRI as value; for a stand-alone SVG document, you could provide an empty value (= the base IRI of the document).
In this example I use the license property from Schema.org:
A license document that applies to this content, typically indicated by URL.
(The vocabulary Schema.org is supported by several big search engines.)
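As a rough illustration, a statement like the one above could be added to an existing file with Python's standard xml.etree.ElementTree. This is only a sketch: the function name add_license_metadata and the sample SVG are made up for the example, and the empty rdf:about relies on the stand-alone-document case mentioned above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCHEMA_NS = "http://schema.org/"

# Register prefixes so the serializer writes rdf:/schema: instead of ns0:/ns1:.
ET.register_namespace("", SVG_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("schema", SCHEMA_NS)


def add_license_metadata(svg_text: str, license_iri: str) -> str:
    """Return the SVG with a <metadata> child holding an RDF license statement.

    Hypothetical helper for illustration; not part of any library.
    """
    root = ET.fromstring(svg_text)
    metadata = ET.Element(f"{{{SVG_NS}}}metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF_NS}}}RDF")
    # An empty rdf:about refers to the base IRI of the document itself.
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""}
    )
    ET.SubElement(
        description, f"{{{SCHEMA_NS}}}license", {f"{{{RDF_NS}}}resource": license_iri}
    )
    # Put the metadata first so it is easy to spot when reading the file.
    root.insert(0, metadata)
    return ET.tostring(root, encoding="unicode")


# Sample input (assumed for the example).
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<rect width="10" height="10"/></svg>'
)
result = add_license_metadata(svg, "https://creativecommons.org/licenses/by-sa/4.0/")
print(result)
```

Because the edit goes through a parser, the existing drawing elements are kept as they are and only the new first child is added.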

Related

What is the disadvantage of manipulating XML files directly as strings?

If I want to change the text or add an element in an XML file, I can just read the file into a string, replace or add elements as strings, then write it back as XML.
In which use cases is that approach bad? Why do we need to manipulate XML with tools such as a DOM library or XPath?
The disadvantage of manipulating XML via string operators is that achieving a parsing-dependent goal for even one particular XML document is already harder than using a proven XML parser. Achieving the goal for equivalent XML document variations will be nearly impossible, especially for anyone naive enough to be considering such an approach in the first place.
Not convinced?
Scan the table of contents of the Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation 26 November 2008. If you do not understand everything, your hand-written, poor imitation of an XML parser will fail, if not on your first test case, then on future variations that you're obligated to handle if you wish to claim your code works with XML. To mention just a few challenges, your program should
Report if its input XML is not well-formed.
Handle character and entity references.
Handle comments and CDATA sections.
Tempted to parse XML via string operators, including regex? Don't do it.
Use a real XML parser.
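To make the point concrete, here is a small sketch in Python using the standard xml.etree.ElementTree. The sample documents and the helper names (recipient, naive_replace, parsed_replace) are invented for the illustration. Both documents mean the same thing to a parser, but the second spells it with a character reference, a CDATA section, and a comment that happens to contain the search text.

```python
import xml.etree.ElementTree as ET

# Two equivalent spellings of the same content (sample data, assumed).
doc_a = "<note><to>Tom &amp; Ann</to></note>"
doc_b = (
    "<note><!-- <to>Tom &amp; Ann</to> -->"
    "<to>Tom &#38; <![CDATA[Ann]]></to></note>"
)


def recipient(xml_text: str) -> str:
    """Return the text of the first <to> element as a parser sees it."""
    return ET.fromstring(xml_text).findtext("to")


def naive_replace(xml_text: str) -> str:
    """String-based edit: swap the recipient by literal substring match."""
    return xml_text.replace("<to>Tom &amp; Ann</to>", "<to>Bob</to>")


def parsed_replace(xml_text: str, new_name: str) -> str:
    """Parser-based edit: locate the element, then set its text."""
    root = ET.fromstring(xml_text)
    root.find("to").text = new_name
    return ET.tostring(root, encoding="unicode")


for doc in (doc_a, doc_b):
    print(recipient(naive_replace(doc)), "|", recipient(parsed_replace(doc, "Bob")))
```

For doc_a both approaches change the recipient. For doc_b the string replacement only rewrites the text inside the comment and leaves the real element untouched, while the parser-based edit still works.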

Is there any way to hide or obfuscate schema json-ld?

On my webpage I have a standard JSON-LD schema that holds A LOT of data. Is there any way to prevent an average user from reading it in the console, or at least make it harder to read?
Remove spacing and new lines. It has to stay machine readable, which I think means you can't obfuscate the actual text or property names.
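Removing the whitespace could look roughly like this in Python, using only the standard json module. The sample data and the function name minify_json_ld are assumptions for the sketch; the property names and values are left untouched so the result stays machine readable.

```python
import json

# Sample JSON-LD (assumed for the example).
json_ld = """
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe"
}
"""


def minify_json_ld(text: str) -> str:
    """Re-serialize JSON without indentation, newlines, or padding spaces."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


compact = minify_json_ld(json_ld)
print(compact)
# {"@context":"https://schema.org","@type":"Person","name":"Jane Doe"}
```

Anyone can pretty-print the result again in one step, so this only makes the data less convenient to read, not hidden.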
I guess you could have it stored in another obfuscated format and have JavaScript generate the readable version. But then, anyone checking the rendered html will see it as it is. And it will limit the systems that can read it.
Another idea is to detect if it's a normal user and not provide the structured data to them. They don't need it. But that's cloaking and may annoy Google.
Don’t mark up content that is not visible to readers of the page
One of Google's structured data quality guidelines is to give users the content you describe in your JSON-LD (so the idea of hiding this data, or making it harder to read for "normal users", does not make sense).
Don’t mark up content that is not visible to readers of the page. For
example, if the JSON-LD markup describes a performer, the HTML body
should describe that same performer. Google Quality guidelines
https://developers.google.com/search/docs/guides/sd-policies
By the way, "normal/average users" won't inspect your HTML source code (and developers have no use for this specific JSON-LD information either).
Protect-javascript
If you insist, read topics related to "protect-javascript" (this issue is not specific to schema JSON-LD):
How can I obfuscate (protect) JavaScript?
How do I protect javascript files?
Protect your JavaScripts from "view source"

How to parse an XSD file with RapidXML

Does RapidXML have the capability to validate/parse an XML file with its associated schema, i.e. an XSD file? I was under the assumption that an XML parser would be able to do both together. If not, why is it deemed unnecessary to validate/parse the associated schema? I checked RapidXML's documentation and found no mention of schema or XSD.
I am currently parsing XML files like so:
rapidxml::file<> xmlFile("BeerLog.xml");
rapidxml::xml_document<> doc;
doc.parse<0>(xmlFile.data());
The following pseudo-code might give you a better idea of what I am looking for:
rapidxml::file<> xmlFile("BeerLog.xml", "BeerLog.xsd");
or even:
rapidxml::file<> xmlFile("BeerLog.xml");
rapidxml::file<> xsdFile("BeerLog.xsd");
rapidxml::xml_document<> doc;
doc.parse_with_schema<0>(xsdFile.data(), xmlFile.data());
Your impression is wrong: accessing the content of an XML document and validating it against a schema are quite distinct topics, even if the former is useful for the latter. Lightweight and fast parsers in particular don't support validation, and a quick glance into the documentation shows this:
W3C Compliance. RapidXml is not a W3C compliant parser, primarily because it ignores DOCTYPE declarations
Also, given that there are quite different schema languages (XSD, RNG, DTD, ...), even support for one of them would not mean it's the one you need.
You also have to take into account that many XML files are merely well-formed and do not conform to any schema; somebody may want to process them nevertheless.
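The distinction can be sketched in Python with the standard xml.etree.ElementTree, which, like RapidXml, only parses and does not validate. The element names (beerlog, beer, name, abv) are guesses at what a BeerLog.xml might contain, and missing_required_children is a hand-rolled structural check, not XSD validation; real schema validation needs a validating library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Parsing succeeds for any well-formed document, whatever its structure."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def missing_required_children(xml_text: str, required: list[str]) -> list[str]:
    """Separate step: report entries under the root that lack required children.

    Hypothetical stand-in for validation; it does not read an XSD file.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for index, entry in enumerate(root):
        for name in required:
            if entry.find(name) is None:
                problems.append(f"entry {index}: missing <{name}>")
    return problems


# Sample documents (structure assumed for the example).
log = (
    "<beerlog>"
    "<beer><name>Stout</name><abv>5.0</abv></beer>"
    "<beer><name>Lager</name></beer>"
    "</beerlog>"
)
broken = "<beerlog><beer></beerlog>"

print(is_well_formed(log))      # True: the parser is satisfied
print(is_well_formed(broken))   # False: mismatched tags
print(missing_required_children(log, ["name", "abv"]))
```

The first document parses without complaint even though its second entry is incomplete, which is exactly the gap a schema validator fills.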

Gate- add annotation to entire document

I am trying to do document classification with GATE. For that I need to annotate the entire document with one type of annotation. Can anyone please tell me how to do that?
Usually I use XML for that purpose. Something like:
<document class="class-1">
The text of your document 1 is here..
</document>
<document class="class-2">
The text of your document 2 is here..
</document>
Then save these XML documents as separate files (or as one document).
In the GATE application you can use the Annotation Set Transfer PR to move the annotation from "Original markups" to the default annotation set. This is one of the options; other options depend on the data format you have.
If your source documents are HTML or XML then there will already be an annotation in the Original markups set that spans all the content, otherwise the simplest option would be to load the Groovy plugin and use the scripting PR with a one-line script like
outputAS.add(doc.start(), doc.end(), "Document", Utils.featureMap())

block some part of web page to be indexed

I crawled a web site. There is a lot of common content on the pages, like drop-down menus and navigation. How can I prevent this content from being indexed?
Not sure if you still need to do this, but in case you do, you could try the blacklist_whitelist plug-in, which can be found at https://issues.apache.org/jira/browse/NUTCH-585.
The plug-in lets you list the elements you want to either block or allow, but not both.
For example:
<property>
  <name>parser.html.blacklist</name>
  <value>noscript,div,#footer</value>
  <description>
    A comma-delimited list of css like tags to identify the elements which should
    NOT be parsed. Use this to tell the HTML parser to ignore the given elements, e.g. site navigation.
    It is allowed to only specify the element type (required), and optional its class name ('.')
    or ID ('#'). More complex expressions will not be parsed.
    Valid examples: div.header,span,p#test,div#main,ul,div.footercol
    Invalid expressions: div#head#part1,#footer,.inner#post
    Note that the elements and their children will be silently ignored by the parser,
    so verify the indexed content with Luke to confirm results.
    Use either 'parser.html.blacklist' or 'parser.html.whitelist', but not both of them at once. If so,
    only the whitelist is used.
  </description>
</property>
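To illustrate what such a blacklist does conceptually, here is a small Python sketch using the standard html.parser module. It is not the plug-in's implementation and has nothing to do with Nutch's code; the class and function names are invented, and it matches element types only (no class or ID selectors).

```python
from html.parser import HTMLParser


class BlacklistTextExtractor(HTMLParser):
    """Collect page text while skipping blacklisted elements and their children."""

    def __init__(self, blacklist):
        super().__init__()
        self.blacklist = set(blacklist)
        self.skip_depth = 0  # > 0 while inside at least one blacklisted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.blacklist:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.blacklist and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def extract_text(html: str, blacklist) -> str:
    """Return the text that would remain for indexing (hypothetical helper)."""
    parser = BlacklistTextExtractor(blacklist)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Sample page (assumed for the example).
page = (
    "<html><body><nav><ul><li>Home</li><li>About</li></ul></nav>"
    "<p>Article text.</p><footer>Imprint</footer></body></html>"
)
print(extract_text(page, ["nav", "footer"]))
```

With the navigation and footer element types blacklisted, only the article paragraph is left in the extracted text.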
I have been working with the Nutch codebase for the past 2 years, and as far as I have seen, this isn't possible. Once the content enters the Nutch segments, you can't strip off parts like drop-down menus, navigation, etc. and keep only the required content.
If you or anyone else knows how to do it (without modifying the code, of course), please share.
