We are considering allowing user-uploaded SVGs in our web app. We've been hesitant to do this before because of the large number of complex vulnerabilities known to exist in untrusted SVGs. A coworker found the --vacuum-defs option to Inkscape and believes that it renders all untrusted SVGs safe for processing.
According to the manpage, that option "Removes all unused items from the <defs> section of the SVG file. If this option is invoked in conjunction with --export-plain-svg, only the exported file will be affected. If it is used alone, the specified file will be modified in place." However, according to my coworker, "Scripting is removed, XML transformations are removed, malformations are not tolerated, encoding is removed and external imports are removed."
Is this true? If so, is it enough that we should feel safe accepting untrusted SVGs? Is there any other preprocessing we should do?
As I understand it, the main concern with serving untrusted SVGs is that SVG files can contain JavaScript. This is obvious for SVG because embedded JavaScript is part of the format, but it can happen with any type of uploaded file if the browser is not careful.
Therefore, even though modern browsers do not execute scripts in images loaded through <img> tags, I think it's good to serve the images from a different domain with no cookies/auth attached to it, just in case, so that any executed script cannot compromise users' data. That would be my first concern.
Of course, if the user downloads the SVG and then happens to open it from the desktop with the browser, it might execute the potentially malicious payload. So, back to the original question: --export-plain-svg does remove scripting, but as I don't know of other SVG-specific vulnerabilities, I haven't checked for them.
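As a rough way to check what a preprocessing step actually leaves behind, the output could be scanned for the most obvious script carriers. This is only a sketch under assumptions: the function names are made up for illustration, the list of checks is not exhaustive, and for genuinely untrusted input a hardened XML parser (for example the defusedxml package) would be a safer choice than the standard library one used here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/2000/svg}script'."""
    return tag.rsplit("}", 1)[-1].lower()


def find_active_content(svg_text):
    """Return a list of findings describing script-capable constructs.

    Hypothetical helper: it only looks for <script>, <foreignObject>,
    on* event-handler attributes and javascript: hrefs.
    """
    findings = []
    root = ET.fromstring(svg_text)
    for element in root.iter():
        name = local_name(element.tag)
        if name in {"script", "foreignobject"}:
            findings.append(f"<{name}> element")
        for attr, value in element.attrib.items():
            attr_name = local_name(attr)
            if attr_name.startswith("on"):
                findings.append(f"{attr_name} attribute on <{name}>")
            elif attr_name == "href" and value.strip().lower().startswith("javascript:"):
                findings.append(f"javascript: URL on <{name}>")
    return findings
```

An empty result does not prove the file is safe; it only means none of these particular constructs were found.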
I'm having an issue in a .NET application where pages served by local IIS display random characters (mostly black diamonds with white question marks in them). This happens in Chrome, Firefox, and Edge. IE displays the pages correctly for some reason.
The same pages in production and in lower pre-prod environments work in all my browsers. This is strictly a local issue.
Here's what I've tried:
Deleted code and re-cloned (also tried switching branches)
Disabled all browser extensions
Ran in incognito mode
Rebooted (you never know)
Deleted temporary ASP.NET files
Looked for corrupt fonts on machine but didn't find any
Other Information:
Running IIS 10.0.17134.1
.NET MVC application with Knockout
I realize there are several other posts regarding black diamonds with question marks, but none of them seem to address my issue.
Please let me know if you need more information.
Thanks for your help!
You are in luck. The explicit purpose of � is to indicate that character encodings are being misused. When users see that, they'll know that we've messed up and lost some of their text data, and we'll know that, at one or more points, our processing and/or configuration is wrong.
(Fonts are not at issue [unless there is no font available to render �]. When there is no font available for a character, it's usually rendered as a white-filled rectangle.)
Character encoding fundamentals are simple: use a sufficient character set (say Unicode), pick an appropriate encoding (say UTF-8), encode text with it to obtain bytes, tell every program and person that gets the bytes that they represent text and which encoding is used. The encoding might be understood from a standard, convention, or specification.
Your editor does the actual encoding.
If the file is part of a project or similar system, a project file might store the intended encoding for all or each text file in the project. If your editor is an IDE, it should understand how the project does that.
Your compiler needs to know the encoding of each text file you give it. A project system would communicate what it knows.
HTML provides an optional way to communicate the encoding. Example: <meta charset="utf-8">. An HTML-aware editor should not allow this indicator to be different than the encoding it uses when saving the file. An HTML-aware editor might discover this indicator when opening the file and use the specified encoding to read the file.
HTTP uses another optional way: Content-Type response header. The web server emits this either statically or in conjunction with code that it runs, such as ASP.NET.
Web browsers use the HTTP way if given.
XHR (AJAX, etc) uses HTTP along with JavaScript processing. If needed the JavaScript processing should apply the HTTP and HTML rules, as appropriate. Note: If the content is JSON, the current RFC requires the encoding to be UTF-8.
No one or thing should have to guess.
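To make the failure mode concrete, here is a minimal sketch of what happens when the bytes and the declared encoding disagree. The sample word and the variable names are just illustrative.

```python
text = "café"

utf8_bytes = text.encode("utf-8")      # é becomes two bytes: C3 A9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: E9

# Bytes and declared encoding agree: the text round-trips.
right = utf8_bytes.decode("utf-8")

# Latin-1 bytes read as UTF-8: E9 is not a valid sequence on its own,
# so a lenient decoder substitutes U+FFFD (the black diamond).
wrong = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as Latin-1: no error, but each byte becomes its own character.
mojibake = utf8_bytes.decode("latin-1")

print(right, wrong, mojibake)
```

Browsers behave like the lenient decoder here, which is why a mismatch shows up as � rather than as an error.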
Diagnostics
Which character encoding did you intend to use? This century, UTF-8 is so much the norm that if you choose to use a different one, you should have a good reason and document it (for others and your future self).
Compare the bytes in the file with the text you expect it to represent. Does it use the intended encoding? Use an editor or tool that shows bytes in hex.
As suggested by @snakecharmerb, what does the server send? Use a web browser's F12 network tab.
What does the HTTP response header say, if anything?
What does the HTML meta tag say, if anything?
What is the HTML doctype, if any?
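Some of these checks can be scripted once you have a response's Content-Type header value and body bytes in hand (for instance, copied from the network tab or saved to a file). The following is a sketch only; the helper names are hypothetical and the meta-tag regex is a simplification of what browsers actually do.

```python
import re
from email.message import Message

# Simplified pattern for <meta charset="..."> and the older http-equiv form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(body):
    """Return the charset declared in a meta tag near the start of the body, or None."""
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def is_valid_utf8(body):
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        body.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

If the header says one thing, the meta tag another, and the bytes are not valid in either, that narrows down which step is misconfigured.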
I've found, in a couple of places where normally only jpgs would be accepted, that users have uploaded gifs instead. What I think is going on is that they're changing some of the code in the gifs to pass as a jpg and thus bypass systems which usually only take a jpg file. Is there a specific way to change a gif's header information to somehow imitate a jpg's?
I'm experimenting to see the logic behind this. I've noticed that, among all the code found in gif files, they usually begin with a 'GIF89aoo' string. Anybody familiar with this?
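For context, the leading bytes are the file's signature: GIF files start with the ASCII text GIF87a or GIF89a, while JPEG files start with the bytes FF D8 FF. A system that only looks at the file name or the declared MIME type can be bypassed simply by renaming, so one way to see what was really uploaded is to inspect those first bytes. A minimal sketch (the function name is made up for illustration):

```python
def sniff_image_type(data):
    """Guess 'gif' or 'jpeg' from the leading signature bytes, else None."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return None
```

A matching signature only says what the file claims to be at byte level; it does not validate the rest of the file.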
I have extracted LaTeX content from a .tex file and put it on a website, choosing SVG as the output format because it gives the smallest size; I also think it is the best choice for its speed and versatility. I know that the .js files containing the configuration are cached on disk by the browser for a few days (depending on the configuration of the site or CDN), although the availability of that page could be a problem. But what about the SVG content?
Is it also cached on disk?
Thank you
Yes and no, depending on what you mean.
For the SVG output, MathJax encodes its "fonts" as path data in JS files, see the code. These paths are cached like any other resource.
But the actual output is generated on the fly from these paths, so the individual equations will not be cached (because making MathJax aware of them would be difficult).
They are stable enough to be reused though, e.g. via local storage, and you can also generate SVG server-side using MathJax-node.
I decided to start a project which is essentially a website. This website will be published through Github Pages.
My website will include an SVG file. The SVG file is generated by Graphviz from a DOT-file. The idea is that to modify the information displayed in the SVG, users can change the definition of the DOT-file, then Graphviz will re-generate the SVG, and the new SVG image will automatically be displayed once the web page is served.
However, I am left in the uncomfortable situation of requiring contributors who edit the DOT-file to run a script that calls Graphviz, and then commit changes to both the SVG and the DOT file.
If a contributor changes the DOT-file, but forgets to run the Graphviz script, then commits, the repository will contain a DOT-file and an SVG which are inconsistent with each other.
I can't not track versions of the DOT-file, because the SVG is gibberish - the DOT-file is the human-editable definition. I can't not track the SVG, because, how else will it stay up to date and available to Github Pages for consumption? And yet, with both of them tracked, I am essentially keeping track of changes in a redundant manner, and introducing opportunity for conflicts. It's a bit like versioning both your C code and the compiled .exe. Which is silly.
What's the best way of making sure that whenever the DOT-file is edited, the SVG will stay concurrent with it? Do I need to seriously rethink my strategy?
You might consider setting up a Jenkins instance to do this. Create a job that is triggered by a change in the dot file (using the git plugin). The job would execute the dot command and then commit the new svg file.
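The step such a job would run could look roughly like the sketch below. It assumes Graphviz's dot is on the PATH and that the file paths are passed in by the caller; the function names are hypothetical. The modification-time check is mainly useful for a local script, since a fresh CI checkout does not preserve file times and a job there would more likely regenerate unconditionally.

```python
import os
import subprocess


def is_stale(dot_path, svg_path):
    """True if the SVG is missing or older than the DOT file."""
    if not os.path.exists(svg_path):
        return True
    return os.path.getmtime(dot_path) > os.path.getmtime(svg_path)


def build_dot_command(dot_path, svg_path):
    """Command line that renders the DOT file to SVG with Graphviz."""
    return ["dot", "-Tsvg", dot_path, "-o", svg_path]


def regenerate(dot_path, svg_path):
    """Re-render the SVG when it is out of date; return True if it was rebuilt."""
    if is_stale(dot_path, svg_path):
        subprocess.run(build_dot_command(dot_path, svg_path), check=True)
        return True
    return False
```

The same function could be called from a pre-commit hook so that contributors do not have to remember to run Graphviz by hand.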
Generated files should not be committed to your repository.
By default, GitHub Pages uses Jekyll to create sites.
If you are using this workflow, then I would suggest taking a look at writing/using a Jekyll Generator plugin to dynamically create this SVG from your DOT-file.
I've added logic in Emacs to automatically call browse-url on a DMD generated html documentation file upon completion of a special build finish hook I've written.
In order for this to be usable, I now want this call to open a new browser tab only the first time it is called, and on later calls to just reload the tab already showing the doc file.
Is this possible, preferably in Google Chrome?
I've scanned command line arguments for both GC and FF but have found nothing.
I suspect some JavaScript/HTML5 may do the trick, but I know nothing about that.
For Firefox look into browse-url-firefox-new-window-is-tab and / or browse-url-maybe-new-window. You could follow the execution path from the definition of browse-url-default-browser, all in the browse-url.el.
But the basic idea is that you could just look at how, for example, browse-url-firefox is implemented, write the one that does exactly what you want (launches Firefox in the way that you need), and set it to be the browse-url-browser-function. The value of this variable must be a function which is called from browse-url.
Also interesting (perhaps something similar is available for Google Chrome): there's MozRepl, which runs in Mozilla browsers, and there's an Emacs binding for talking to this REPL (an interactive JavaScript interpreter). Using that you can have very fine-grained control over the behaviour of the browser, including, but not limited to, creating new GUI components (using XUL), manipulating browser windows and so on. Whether it is worth it would probably depend on how much time you are willing to spend on it.