Keep SVGs from Being Accessed by User

I'm putting together a mobile version of a webpage which consists entirely of client art. For the old-fashioned desktop version, I just used PNGs, but I really wanted to use SVG for mobile. SVGZ would be smaller and resolution independent, so it seemed like a perfect use case.
But the client is worried that, once his art is online in SVG, anyone could download the files and use his art illegally (he's had stuff he worked on pirated before, so he takes this pretty seriously.) This had never occurred to me until he brought it up, but the SVG would basically be his original source art.
I was wondering if there's any way to prevent the SVG files from being accessed by the user. As far as I know this is impossible -- making the files available to the user-agent means making them available to the user -- but I wanted to ask around to be sure.
Thanks for any help.

No, this is impossible. If a web browser can request the files for display, then any computer anywhere can request the files and save the direct results.
Serving up intentionally degraded artwork (e.g. rasterization) is the only way to prevent people from having the originals. Of course, a determined thief could still re-trace the PNG and get a vectorized, resolution-independent close approximation of the original.
Your client could alternatively:
Include copyright comments in the source, proving ownership. (Yes, a thief could delete these.)
Include 'hidden' elements (0% opacity or placed under another item), proving ownership. (Yes, a thief could delete these.)
Use data steganography in the source SVG to watermark it (e.g. vary the decimal values in a path in a manner minor enough to not affect the result, but still embed custom data). (Yes, any thief suspecting this could lower decimal precision or transform all values in a manner that might remove this.)
Trust in the law to protect his works, or provide a recourse if they are stolen.
Trust in the goodness of most of mankind to not do this.
Decide that theft is the sincerest form of flattery, and not worry about it. :)
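
A minimal sketch of the steganography idea above, in Python. It is an illustration under loud assumptions, not a robust scheme: the path data uses only plain decimal coordinates (no arc flags, no exponents), two decimals of precision are enough for the artwork, and the function names `embed` and `extract` are made up for this example. Each coordinate gets a third decimal digit that carries one payload bit.

```python
import re

# Matches plain decimal numbers such as 10, -3.5 or 40.25 (no exponents).
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def embed(path_d: str, payload: bytes) -> str:
    """Hide payload bits in an extra third decimal digit of each coordinate."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(NUMBER.findall(path_d)):
        raise ValueError("path has too few coordinates for this payload")
    remaining = iter(bits)

    def mark(match: re.Match) -> str:
        bit = next(remaining, 0)  # pad with zeros once the payload is used up
        # Round to two decimals, then append the bit as the third decimal.
        return f"{float(match.group()):.2f}{bit}"

    return NUMBER.sub(mark, path_d)


def extract(path_d: str, n_bytes: int) -> bytes:
    """Read the payload back from the last digit of each coordinate."""
    bits = [int(m.group()[-1]) for m in NUMBER.finditer(path_d)][: n_bytes * 8]
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


original = "M 10 20 L 30.5 40.25 L 50 60 L 70 80 L 90 100"
marked = embed(original, b"A")
recovered = extract(marked, 1)
```

As the answer already concedes, anyone who rounds the coordinates back to two decimals wipes the mark, so this only helps against copying that leaves the file untouched.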

Related

Prevent SVG from being easily reverse-engineered

I just started at a new company and have been asked to do what I believe is impossible, but I need confirmation on this. My company's website allows users to view images in extremely high resolution using SVGs. We've built a custom viewer for these images that allows you to zoom and scroll around the image, and it works well. Because of this, the image format "needs" to remain SVG. However, we need to add a watermark to this image. The way we're doing this right now is passing the SVG and a PNG watermark to the client, inserting the watermark into the SVG, and displaying it to the user. This is very obviously easily hacked, as anyone with client-side experience knows (the dev team here is mostly older developers without much web experience). Even if the raw SVG isn't intercepted, they can still just delete the watermark from the source. I've convinced my boss to have the watermark injected on the server side, so this leaves just the problem that SVGs are editable by the end-client.
What I want to know is whether one of these two things is possible:
1) Is there another image format akin to SVG that could be used to keep this highly scalable image without losing resolution, and without it being directly editable by an end-client? The only options that people seem to discuss for the web are JPG, GIF, PNG, and SVG. I've looked at Adobe Illustrator ".ai" files and EPS (Encapsulated PostScript) as other vector options, but I can't find out anywhere whether I can modify these images in PHP, which is key if I want to overlay a watermark in server-side code.
2) Is there a way to obfuscate the raw SVG's content so that the user can't go and manipulate it? I've seen SVGs that have <image ...> tags inside them with PNGs represented as long and complex strings before. Something like xlink:href="data:image/png;base64...". I was wondering if there's a way to display an SVG as this string, so that the data can't be directly manipulated. I'm sure there's an algorithm to reverse these, but so long as we're sticking with SVGs, I need to make this as secure as possible with as many hoops to jump through as possible if someone wants to steal the data.
Either way would be acceptable, as long as removing this watermark is more complex than just hitting F12 and removing the element inside the dev tools.

Is there another image format akin to SVG that could be used to keep this highly scalable image without losing resolution, and without it being directly editable by an end-client?
SVG is a vector format, and to maintain extreme scalability you need to stick with vectors, whatever the format. However, in that case it will always be possible to remove the vectors that belong to your watermark. There are of course vector formats that are stored as binary, which would make it somewhat harder for an end user to parse and edit, but those are not editable in PHP either, and are much less compatible. So you probably don't want to do this.
Is there a way to obfuscate the raw SVGs content so that the user can't go and manipulate it?
First, it will never be "secure" in the sense that as said above, it will always be possible to remove the watermark from a vector image. (Btw the only difference to bitmap formats like JPG is that content below the watermark in bitmap is actually missing, while in SVG it's still there.)
However, depending on how "good" you want this to be, you can do a few things. I think the "goodness" here means the effort needed to remove the watermark, and you can raise the bar relatively easily. You don't have to (and you probably can't reasonably) obfuscate the whole SVG.
One thing that comes to mind is that SVG is basically just XML: it consists of tags like <rect>, <line>, <circle> etc. The order of these tags doesn't matter much (it mostly decides which element is painted on top where shapes overlap). So you could entwine the tags that draw your watermark pretty much randomly among existing tags. I mean really randomly, so different downloads would produce different results. If you do this well (e.g. you find line tags to "hide" your watermark line tags, and so on), it will be hard to automatically remove the watermark, because it's all over relevant data that is your actual image. Of course, the watermark could still be visually in a corner, and this is already a weakness: anything drawn in a corner could possibly be removed automatically. And it will be easy to remove by hand with any decent editor, I suppose. So it depends on what the purpose is.
But I still think this sort of thing could make it hard enough in many scenarios (and would be totally inadequate in others).
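
A rough Python sketch of the "entwine the watermark tags randomly" idea, for illustration only. The question is about PHP, so treat this as a language-neutral outline. The function name `interleave_watermark` and the `seed` parameter are invented for the example, and it assumes the watermark is itself a small SVG whose direct children can be dropped in anywhere.

```python
import random
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep plain <rect>, <line> tags on output


def interleave_watermark(svg_text: str, watermark_text: str, seed=None) -> str:
    """Insert each top-level watermark element at a random child position."""
    rng = random.Random(seed)  # seed=None gives a different layout per download
    root = ET.fromstring(svg_text)
    watermark = ET.fromstring(watermark_text)
    for element in list(watermark):
        # randint is inclusive, so len(root) means "append at the end".
        root.insert(rng.randint(0, len(root)), element)
    return ET.tostring(root, encoding="unicode")


image = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect id="a"/><rect id="b"/><rect id="c"/></svg>'
)
mark = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<line id="w1"/><line id="w2"/></svg>'
)
mixed = interleave_watermark(image, mark, seed=1)
```

One caveat: a watermark element inserted early in the document is painted first, so any image shape that overlaps it will cover it. A real implementation would have to pick positions with that in mind.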

Unchangeable EXIF data

Do you know if there is any EXIF data that cannot be changed?
In my case I want to know the real creation date of a JPEG image. I thought EXIF data was the best way, but I realized that with software like XnView you can change it. So is there any way I can know the real creation date of an image?
On the other hand, is it possible to know whether EXIF data has been modified?
Thanks for everything,
and sorry for my bad English.
Have a good day!
:)

In principle, it is not possible to be sure the data hasn't been edited, although it may take a great deal of skill to do so undetectably. Some of the major camera makers (Canon and Nikon, possibly others) offer an "image authentication" feature in their pro model cameras which is designed to make it impossible to modify the image after it has been taken. They do this for the benefit of people doing legal work - evidence shots and the like. To use this, you have to switch it on (via the camera settings) before you take the picture. Even with these though, it is still possible to alter the data: both the Canon and Nikon authentication systems have been cracked (presumably with considerable difficulty).
As for normal pictures, yes, these are very easy to alter. However many (most?) programs which can edit EXIF data leave their own signs. For example, Adobe Photoshop always adds its own name somewhere in the EXIF, apparently whether you want it to or not. You can see this with many different EXIF viewers, especially with the more advanced ones like PhotoME. (Which, sadly, is no longer maintained.)
Short answer: yes, it is always possible to edit EXIF, and almost always possible to do it undetectably, but it may require the right tools and quite a lot of skill. You can't ever be certain it has not been done.

serve up mp3 to local player w/o showing location of mp3

I am in the process of upgrading an existing application that was written in Flash to play mp3 files of phone calls. The purpose of the application is to train employees on how to work with customers. Some of the calls are "negative" calls and those are used to train employees on what NOT to do.
The reason I need to not provide a location of where the mp3s are, is that if someone were to become disgruntled and leave the company and decide to take some of the negative calls with them, that would be bad. I don't ever like to underestimate the intelligence of our users so I'm sure some could figure out a way to get them regardless.
The current implementation as I said was written in flash and it loads up all of the mp3s as the swf file loads on the client thereby mitigating the necessity to ever make a call up to the server to grab a new mp3 file. None of these mp3s are huge in file size because they're all only about 30 second phone call clips.
Are there any ways to prevent a direct download of an mp3 from an IIS server? Could I serve them up with C# as an aspx file that requires a specific hash or salt in order to play?
I really don't want to have them all brought into a swf like the current implementation if I can avoid it.
any suggestions welcome.
TIA

Honestly, if a user is that determined to get the data, they will. I believe the balance here is at what point will said hypothetical employee feel the gain to be had by obtaining the data is not worth the effort to get it. And how much effort you have to go through vs. what it is worth to the company.
If the audio will always be played back on your application, one simple layer of security would be to encrypt the files. Keeping it simple, you can use a symmetric key, store it in the application, and decrypt the file in memory before it is played (this way it's not stored in a temporary file the user could just grab). Sure a user with 3/4 of a brain could probably fish the key out of the executable, but frankly the sound is playing on their speakers and I'm sure they have a smartphone. They could just as easily record the output with Sound Recorder as it plays too.
Simply speaking, I believe a very minimum layer of technological security mixed with a binding confidentiality agreement should give you enough recourse. The security will keep the would-be-honest honest and deter the lazy, as well as giving you a leg up in proving the employee obtained the audio through nefarious means (i.e. it wasn't just "available for the taking").
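
The question's idea of a player URL that "requires a specific hash" could look roughly like this. It is a hedged Python sketch rather than ASP.NET code; `SECRET_KEY`, the `/play` path and the parameter names are all hypothetical. The server signs the clip id, the user and an expiry time with HMAC, and refuses any request whose token does not match or has expired.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; it must never be sent to the client.
SECRET_KEY = b"replace-with-a-random-server-side-secret"


def sign(clip_id: str, user_id: str, expires_at: int) -> str:
    """Return an HMAC-SHA256 token binding clip, user and expiry together."""
    message = f"{clip_id}:{user_id}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_link(clip_id: str, user_id: str, ttl_seconds: int = 60, now=None) -> str:
    """Build a short-lived playback URL for one user and one clip."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    token = sign(clip_id, user_id, expires_at)
    return f"/play?clip={clip_id}&user={user_id}&expires={expires_at}&token={token}"


def is_valid(clip_id: str, user_id: str, expires_at: int, token: str, now=None) -> bool:
    """Check expiry first, then compare tokens in constant time."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    return hmac.compare_digest(sign(clip_id, user_id, expires_at), token)
```

This only stops links from being shared or replayed later. As the answer says, a logged-in employee can still record whatever plays through their speakers.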

Images with unknown content: Dangerous for a browser?

Let's say I allow users to link to any images they like. The link would be checked for syntactical correctness, escaping etc., and then inserted in an <img src="..."/> tag.
Are there any known security vulnerabilities, e.g. by someone linking to "evil.example.com/evil.jpg", and evil.jpg contains some code that will be executed due to a browser bug or something like that?
(Let's ignore CSRF attacks - it must suffice that I will only allow URLs with typical image file suffixes.)

Security risks in image files crop up from time to time. Here's an example: https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-22_11-5388621.html?tag=nl.e019. It's an old article, so obviously these things have been rolling around for a while.
While it's impossible to say for sure that something is always safe/never safe, so far it sounds like the risks have been relatively low, and are patched by the image viewer manufacturers pretty quickly. IMO the best test is how often you hear about actual problems occurring. This threat vector has been a known possibility for years, but hasn't really become widespread. Given the extent to which people link images in public forums, I'd expect it to become a big problem pretty quickly, if it was a realistic sort of attack.

There was a JPEG buffer overrun some time ago. Also, you have to account for images that actually contain code, so that you don't execute the code.

Yes, this could be a problem. There are quite a few known exploits that work by using vulnerabilities in the image rendering code of the browser or the OS, including remote execution vulnerabilities. It might not be the easiest flaw to exploit, but it is definitely a concern.
An example of such a vulnerability: http://www.securityfocus.com/bid/14282/discuss (but you could find tons of other vulnerabilities of the same type).
I think I remember such a problem with a high-visibility site having exactly this kind of vulnerability exploited. An advertisement image was displayed from some third-party ad provider, and the image had not been checked. Thousands of users were compromised ... Can't find the story anymore ... sorry.

So there's potential buffer overflows with handling untrusted data, however you get to it. Also if you insert untrusted data in the form of URLs into your page then there is a risk of XSS flaws. However, I guess you want to know why browsers flag it as a problem:
Off the top of my head:
I believe referrer information is still sent in this case. Even if you don't ever and won't ever use URL rewriting for sessions, you are still exposing information that was previously confidential. chris_l: True, I just did a quick test - the browser (FF 3.5) sends the Referer header.
You cannot be sure that the image data returned will not be misleading. Wrong text on buttons, for instance. Or in bigger images, spoofing instructions, say.
Image size may change layout. Image loading could even be delayed to move page at a critical time. chris_l: Good point! I should always set width and height (could be determined by the server when the user posts the image - would work as long as the image doesn't change... better ideas?)
Images can be used for AJAX-like functionality. chris_l: Please expand on this point - how does that work?
Browsers will flag an issue with your site, hiding other problems and conditioning users to accept slack security practices. chris_l: That's definitely an important problem, when the site uses HTTPS.
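
chris_l's idea of having the server determine width and height when the image is posted could be sketched like this in Python. It is a minimal illustration that handles PNG only, and the helper names `png_dimensions` and `img_tag` are invented here. It reads the two 32-bit big-endian integers stored in the IHDR chunk directly after the 8-byte PNG signature, then writes them into the tag so a later change to the remote image cannot shift the layout.

```python
import struct
from html import escape

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def img_tag(url: str, width: int, height: int) -> str:
    """Emit an <img> tag with escaped URL and fixed dimensions."""
    return (
        f'<img src="{escape(url, quote=True)}" '
        f'width="{width}" height="{height}" alt="">'
    )


# A hand-built header for a 640x480 image, standing in for downloaded bytes.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
tag = img_tag("http://example.com/a.png", *png_dimensions(header))
```

Fixing the dimensions does nothing about the other points in the list (Referer leakage, misleading content); it only addresses the layout one.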

UPDATE: Proven wrong!
You also have to consider that cookies (that means session IDs) are also sent to the server where the image is located. So another server gets your session ID. If the image actually contains PHP code, it could steal the session ID:
For example you include:
<img src="http://example.com/somepic.jpg" alt="" />
On the server of http://example.com there's a .htaccess file saying the following:
RewriteRule ^somepic\.jpg$ evilscript.php
Then the pic is actually a PHP file, generating the image but also doing some evil stuff, like session-stealing or whatever...

how can I protect scraping of certain data on my web pages? [closed]

I want to protect only certain numbers that are displayed after each request. There are about 30 such numbers. I was planning to have images generated in place of those numbers, but if the image is not warped as with a captcha, won't scripts be able to decipher the number anyway? Also, how much of a performance hit would loading images be vs. text?

The only way to make sure bad-guys don't get your data is not to share it with anyone. Any other solution is essentially entering an arms race with the screen-scrapers. At one point or another, one of you will find the arms-race too costly to continue. If the data you are sharing has any perceptible value, then probably the screen-scrapers will be very determined.

It's not possible.
You use javascript and encrypt the page, using document.write() calls after decrypting. I either scrape from the browser's display or feed the page through a JS engine to get the output.
You use Flash. I can poke into the flash file and get the values. You encrypt them in the flash and I can just run it then grab the output from the interpreter's display as a sequence of images.
You use images and I can just feed them through an OCR.
You're in an arms race. What you need to do is make your information so useful and your pages so easy to use that you become the authority source. It's also handy to change your output formats regularly to keep up, but screen scrapers can handle that unless you make fairly radical changes. Radical changes drive users away because the page is continually unfamiliar to them.
Your image solution won't help much, and images are far less efficient. A number is usually only a few bytes long in HTML encoding. Images start at a few hundred bytes and expand to 1 KB or more depending on how large you want them. Images also will not render in the font the user has selected for their browser window, and are useless to people who use assistive computing devices (visually impaired people).

Apart from the images, you could display the numbers using JavaScript or flash.
You could also use CSS to position individual digits using various combinations of absolute or relative positions.
You could also use JavaScript to help you create these DIVs.
The point is just to obfuscate enough that it becomes really hard.
One more solution is to use images of segments or single dots and re-construct the images of the digits using CSS, a bit like a dot-matrix display.
You could litter the source of the page with these absolutely positioned DIVs and again make it more difficult to reconstruct by creating them dynamically.
At any rate, you can't stop a determined scraper from getting to the data: it doesn't take a lot to automate a web browser and take screenshots that can be fed to an OCR.
There is nothing stopping anyone from paying someone pennies to get the data manually anyway.
The point is: how determined are your opponents (user?).
It's a bit like the software protection business: making things hard enough that you would deter casual 'pirates' is not too hard, and it's a fairly good approach in general.
However, if there is much value in the data you present, there is nothing you can really do to protect it.
All you can do is make it hard enough that casual 'thieves' will prefer to continue paying for your services rather than circumvent them.
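
To make the CSS-positioning idea concrete, here is a small Python sketch that emits the digits in shuffled source order and puts each one back in its visual place with an absolute `left` offset. The function name `obfuscate_number` and the 0.6em digit width (roughly right for a monospace font) are assumptions for illustration.

```python
import random


def obfuscate_number(number: str, seed=None) -> str:
    """Return HTML whose source order is shuffled but which renders in order."""
    rng = random.Random(seed)
    placed = list(enumerate(number))  # (visual position, character)
    rng.shuffle(placed)  # source order no longer matches what the user sees
    spans = "".join(
        f'<span style="position:absolute;left:{position * 0.6:.1f}em">{char}</span>'
        for position, char in placed
    )
    width = len(number) * 0.6
    return (
        f'<span style="position:relative;display:inline-block;'
        f'width:{width:.1f}em;height:1em">{spans}</span>'
    )


html = obfuscate_number("31415", seed=7)
```

A scraper that reads the `left` values and sorts by them gets the number straight back, which is exactly the point made above: this deters casual copying and nothing more.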

Javascript would probably be the easiest to implement, but you could get really creative and have large blocks of numbers with certain ones being viewable by placing layers on top of the invalid numbers, blending the wrong numbers into the background, or making them invisible via css and semi-randomly generated class names.

I can't believe I'm promoting a common malware scripting tactic, but...
You could encode the numbers as encoded Javascript that gets rendered at runtime.

Generate an image containing those numbers and display the image. :-)

I think you guys are being too reactive with these solutions. JavaScript, CAPTCHA, even litigation and the DMCA process don't address the complex adaptive nature of web scraping and data theft. Don't you think the "ideal" solution to prevent malicious bots and website scraping would be something working as a real-time proactive mitigation strategy? Very similar to a Content Protection Network. Just sayin'.
Examples:
IBM - IBM ISS Data Security Services
DISTIL - www.distil.it

Can you provide a little more detail on what it is you're doing? Certainly there's a performance hit to create an image instead of dumping out the text of a number, but how often would you be doing this per day?
Using JavaScript is the same as using text. It's trivial to reverse engineer.

Use animated numbers using flash. It may not be fool proof but it would make it harder to crack.

What about posting a lot of dummy numbers and showing the right ones with external CSS? Just as long the scraper doesn't start to parse the external CSS.

Don't output the numbers, i.e. prefix
echo $secretNumber;
with //.

For all those that recommend using Javascript, or CSS to obfuscate the numbers, well there's probably a way around it. Firefox has a plugin called abduction. Basically what it does is saves the page to a file as an image. You could probably modify this plugin to save the image, and then analyze the image to find out the secret number that is trying to be hidden.
Basically, if there's enough incentive behind scraping these numbers from the page, then it will be done. Otherwise, just post a regular number, and make it easier on your users so they won't have to worry so much about not being able to copy and paste the number, or other such problems that result from this trickery.

just do something unexpected and weird (different every time) w/ CSS box model. Force them to actually use a browser backed screenscraper.

I don't think this is possible, you can make their job harder (use images as some suggested here) but this is all you can do, you can't stop a determined person from getting the data, if you don't want them to scrape your data, don't publish it, as simple as that ...

Assuming these numbers are updated often (if they aren't then protecting them is completely moot as a human can just transcribe them by hand) you can limit automated scraping via throttling. An automated script would have to hit your site often to check for updates, if you can limit these checks you win, without resorting to obfuscation.
For pointers on throttling see this question.
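
A minimal sliding-window throttle in Python, as one possible shape for the rate limiting suggested above. The class name `Throttle` and the idea of keying on a client id (an IP address or an account) are assumptions; a real deployment would more likely do this in the web server or a shared store than in process memory.

```python
import time
from collections import defaultdict, deque


class Throttle:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of recent hits

    def allow(self, client_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without recording the hit
        hits.append(now)
        return True


throttle = Throttle(max_requests=2, window_seconds=60)
```

A scraper polling for updates hits the limit quickly, while a person reloading the page now and then never notices it.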
