Is there anyway to sanitize SVG file in c#, any libraries anything? - svg

Is there anyway to sanitize SVG file in c#, any libraries anything?
From client side we are sanitizing the SVG files while uploading , but the security team is asking for a sanitization in serverside too.

I'm primarily a Python developer, but I thought I'd throw some research into the issue for ya. I used to develop for C, so I thought I should at least have a basic understanding of what's going on.
*.SVG files are structured like XML documents, and use the HTML DOM to access JavaScript and CSS functionalities. Trying to enumerate and script out every single catch for potential JavaScript-based security issues doesn't seem realistic, so personally, I'd just entirely remove all JavaScript sectors that do anything more than define simple variables, do math operations, or reference already-defined visual elements from any uploaded *.SVG files.
Since *.SVG files are based on XML and are human-readable, this could be accomplished by iterating through the file either line-by-line like a text file or element-by-element like an XML or HTML file. You'd want to go through and remove all the JavaScript scripts that don't meet the above criteria, save it & then convert it to XML and use a standard XML-sanitation library on it, and then convert that back to *.SVG. I reckon this Github library and this StackOverflow thread could be helpful in that.
I hope my response was helpful!

It is true what your security team say: client-side security is not security. It is just user convenience. Never rely on client-side checks. Anyone wanting to do bad things to your application will bypass client-side checks first thing.
Now, a SVG file can be used in a XSS attack only by leveraging the <script> tag.
Unfortunately, defusing/securing a script is a very complicated topic and prone to errors and both false positives and negatives.
So, I believe your only recourse is to remove scripts altogether. This might not be what you need.
But, if it is, then it's very simple to do. The script tag cannot be masqueraded inside the SVG, or the browser will not recognize it in the first place, making the attack moot. So a simple regex should suffice. Something like,
cleanSVGcode = Regex.Replace(
userSVGcode,
#"<script.*?script>",
#"",
RegexOptions.IgnoreCase|RegexOptions.SingleLine
);
It is possible to sanitize out further sequences. Since, if they're written incorrectly or in an obfuscated way, javascript calls won't work, the number of these sequences is limited.
#"javascript:" => #"syntax:error:"

Related

Any way in Expression Engine to simulate Wordpress' shortcode functionality?

I'm relatively new to Expression Engine, and as I'm learning it I am seeing some stuff missing that WordPress has had for a while. A big one for me is shortcodes, since I will use these to allow CMS users to place more complex content in place with their other content.
I'm not seeing any real equivalent to this in EE, apart from a forthcoming plugin that's in private beta.
As an initial test I'm attempting to fake shortcodes by using delimited strings (e.g. #foo#) in the content field, then using a regex to pull those out and pass them to a function that can retrieve the content out of EE's database.
This brings me to a second question, which is that in looking at EE's API docs, there doesn't appear to be a simple means of retrieving the channel entries programmatically (thinking of something akin to WP's built-in get_posts function).
So my questions are:
a) Can this be done?
b) If so, is my method of approaching it reasonable? Or is there something stupidly obvious I'm missing in my approach?
To reiterate, my main objective here is to have some means of allowing people managing content to drop a code in place in their content that will be replaced with channel content.
Thanks for any advice or help you can give me.
Here's a simple example of the functionality you're looking for.
1) Start by installing Low Replace.
2) Create two Global Variables called gv_hello and gv_goodbye with the values "Hello" and "Goodbye" respectively.
3) Put this text into the body of an entry:
[say_hello]
Nice to see you.
[say_goodbye]
4) Put this into your template, wrapping the Low Replace tag around your body field.
{exp:low_replace
find="[say_hello]|[say_goodbye]"
replace="{gv_hello}|{gv_goodbye}"
multiple="yes"
}
{body}
{/exp:low_replace}
5) It should output this into your browser:
Hello
Nice to see you.
Goodbye
Obviously, this is a really simple example. You can put full blown HTML into your global variable. For example, we've used that to render a complex, interactive graphic that isn't editable but can be easily dropped into a page by any editor.
Unfortunately, due to parse order issues, EE tags won't work inside Global Variables. If you need EE tags in your short code output, you'll need to use Low Variables addon instead of Global Variables.
Continued from the comment:
Do you have examples of the kind of shortcodes you want to support/include? Because i have doubts if controlling the page-layout from a text-field or wysiwyg-field is the way to go.
If you want editors to be able to adjust layout or show/hide extra parts on the page, giving them access to some extra fields in the channel, is (imo) much more manageable and future-proof. For instance some selectfields, a relationship (or playa) field, or a matrix, to let them choose which parts to include/exclude on a page, or which entry from another channel to pull content from.
As said in the comment: i totally understand if you want to replace some #foo# tags with images or data from another field (see other answers: nsm-transplant, low_replace). But, giving an editor access to shortcodes and picking them out, is like writing a template-engine to generate ee-template code for the ee-template-engine.
Using some custom fields to let editors pick and choose parts to embed is, i think, much more manageable.
That being said, you could make a plugin to parse the shortcodes from a textareas content, and then program a lot, to fetch data from other modules you want to support. For channel entries you could build out of the channel data library by objectiveHTML. https://github.com/objectivehtml/Channel-Data
I hear you, I too miss shortcodes from WP -- though the reason they work so easily there is the ubiquity of the_content(). With the great flexibility of EE comes fewer blanket solutions.
I'd suggest looking at NSM Transplant. It should fit the bill for you.
There is also a plugin called Shortcode, which you can find here at
Devot-ee
A quote from the page:
Shortcode aims to allow for more dynamic use of content by authors and
editors, allowing for injection of reusable bits of content or even
whole pieces of functionality into any field in EE

Website Input/Output using a premade .exe (Need direction)

I am trying to build a part of a website which takes in a text passage as input, and outputs the same text passage, except with the definition of each word appearing when the user rolls over (or clicks) any given word. I have a pre-made .exe file which maps input words to their definitions (takes in words from standard input and outputs the definition to standard output).
My problem, then, is to run user input through the .exe file on the website's server, then put the output back onto the page. It seems like a fairly trivial problem, but I have no idea where to start.
So my questions are: is this project even possible? If so, what languages/tools do I need to be able to use in order to implement it? Are there keywords that describe what I'm talking about that I could use to look up tutorials/solutions on the Web?
I have rudimentary knowledge of PHP, HTML, and Javascript, but so little experience that I can't judge whether (and how) they can be used to approach this problem.
Note: I do not have access to the .exe source, so I must use the .exe itself as my input-output mechanism.
With AJAX and PHP, you can do accomplish this with minimal effort.
JavaScript's AJAX features would send the word you input to the PHP page, and from the PHP page, you can run the external exe file with the sent word as an argument (sanitize it, please. People can inject code which will explode your servers!):
<?php
$word = $_POST['input_word']; // MAKE SURE YOU SANITIZE THIS. If you don't, system security goes down the toilet.
exec('myprogram.exe ' + $word, $output_array);
print_r($output_array);
?>

Markdown to HTML conversion

I'm still in the middle of coding my final year project at university, and I have come across an issue where I need to either convert from HTML to Markdown or visa versa. Now I have no experience whatsoever of Perl, Python, etc. so I'm in need of an easy-to-implement solution, I only have about 6 weeks left to complete this now. I'm writing the data from a WMD text box to SQL Server, and I can either upload it as Markdown or HTML but if that data needs editing it cannot be in HTML as this would be too confusing for the end user who is perceived to have zero/very little computing "know how".
What should I do?
Karmastan's answer is probably the best here. Keeping the raw Markdown in the database is a really good solution as it allows users to upkeep the content in a form with which they're familiar.
However, if you have a bunch of HTML which is already converted, you might want to look at something like Markdownify: The HTML to Markdown converter for PHP.
Edit: based on what you've said below, there are a few things you should keep in mind:
Make sure that the following is set in wmd.js:
wmd_options = {"output": "Markdown"};
This ensures that you're storing Markdown in the database.
Source: How do you store the markdown using WMD in ASP.NET?
When outputting the Markdown to the web, you need to transform it to HTML. To do this, you'll need a library which does Markdown -> HTML conversion. Here are two examples:
Announcing Markdown.NET
Revisied Markdown.NET Library
I'm not a .NET developer, so I can't really help with how these libraries should be used, but hopefully the documentation will make that clear.
If you look at the web site for Markdown, you'll find a Perl script that converts Markdown-syntax documents to HTML. Keep Markdown text in your database and invoke the script whenever you need to display the text. No Perl knowledge required!

How to ensure website security checks

How to safe gaurd a form against script injection attacks. This is one of the most used form of attacks in which attacker attempts to inject a JS script through form field. The validation for this case must check for special characters in the form fields. Look for
suggestions, recommedations at internet/jquery etc for permissible characters &
character masking validation JS codes.
You can use the HTML Purifier (in case you are under PHP or you might have other options for the language you are under) to avoid XSS (cross-site-scripting) attacks to great level but remember no solution is perfect or 100% reliable. This should help you and always remember server-side validation is always best rather than relying on javascript which bad guys can bypass easily disabling javascript.
For SQL Injection, you need to escape invalid characters from queries that can be used to manipulate or inject your queries and use type-casting for all your values that you want to insert into the database.
See the Security Guide for more security risks and how to avoid them. Note that even if you are not using PHP, the basic ideas for the security are same and this should get you in a better position about security considerations.
If you output user controlled input in html context then you could follow what others and sanitize when processing input (html purify, custom input validation) and/or html encode the values before output.
Cases when htmlencodng/strip tags (no tags needed) is not sufficient:
user input appears in attributes then it depends on whether you always (double) quote attributes or not (bad)
used in on* handlers (such as onload="..), then html encoding is not sufficient since the javascript parser is called after html decode.
appears in javascript section - depends on whether this is in quoted (htmlentity encode not sufficient) or unquoted region (very bad).
is returned as json which may be eval'ed. javascript escape required.
appears in CSS - css escape is different and css allows javascript (expression)
Also, these do not account for browser flaws such as incomplete UTF-8 sequence exploit, content-type sniffing exploits (UTF-7 flaw), etc.
Of course you also have to treat data to protect against other attacks (SQL or command injection).
Probably the best reference for this is at the OWASP XSS Prevention Cheat Sheet
ASP.NET has a feature called Request Validation that will prevent unencoded HTML from being processed by the server. For extra protection, one can use the AntiXSS library.
you can prevent script injection by encoding html content like
Server.HtmlEncode(input)
There is the OWASP EASPI too.

Will HTML Encoding prevent all kinds of XSS attacks?

I am not concerned about other kinds of attacks. Just want to know whether HTML Encode can prevent all kinds of XSS attacks.
Is there some way to do an XSS attack even if HTML Encode is used?
No.
Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.
For instance, consider server-generated client-side javascript - the server dynamically outputs htmlencoded values directly into the client-side javascript, htmlencode will not stop injected script from executing.
Next, consider the following pseudocode:
<input value=<%= HtmlEncode(somevar) %> id=textbox>
Now, in case its not immediately obvious, if somevar (sent by the user, of course) is set for example to
a onclick=alert(document.cookie)
the resulting output is
<input value=a onclick=alert(document.cookie) id=textbox>
which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.
There are a few additional vectors to be considered... including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on # values).
Also don't forget about UTF-7 type attacks - where the attack looks like
+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-
Nothing much to encode there...
The solution, of course (in addition to proper and restrictive white-list input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF you're output context IS HTML, or maybe you need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.
If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.
Note that all encoding should not be restricted to user input, but also stored values from the database, text files, etc.
Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...
Some more information, and a pretty definitive list (constantly updated), check out RSnake's Cheat Sheet: http://ha.ckers.org/xss.html
If you systematically encode all user input before displaying then yes, you are safe you are still not 100 % safe.
(See #Avid's post for more details)
In addition problems arise when you need to let some tags go unencoded so that you allow users to post images or bold text or any feature that requires user's input be processed as (or converted to) un-encoded markup.
You will have to set up a decision making system to decide which tags are allowed and which are not, and it is always possible that someone will figure out a way to let a non allowed tag to pass through.
It helps if you follow Joel's advice of Making Wrong Code Look Wrong or if your language helps you by warning/not compiling when you are outputting unprocessed user data (static-typing).
If you encode everything it will. (depending on your platform and the implementation of htmlencode) But any usefull web application is so complex that it's easy to forget to check every part of it. Or maybe a 3rd party component isn't safe. Or maybe some code path that you though did encoding didn't do it so you forgot it somewhere else.
So you might want to check things on the input side too. And you might want to check stuff you read from the database.
As mentioned by everyone else, you're safe as long as you encode all user input before displaying it. This includes all request parameters and data retrieved from the database that can be changed by user input.
As mentioned by Pat you'll sometimes want to display some tags, just not all tags. One common way to do this is to use a markup language like Textile, Markdown, or BBCode. However, even markup languages can be vulnerable to XSS, just be aware.
# Markup example
[foo](javascript:alert\('bar'\);)
If you do decide to let "safe" tags through I would recommend finding some existing library to parse & sanitize your code before output. There are a lot of XSS vectors out there that you would have to detect before your sanitizer is fairly safe.
I second metavida's advice to find a third-party library to handle output filtering. Neutralizing HTML characters is a good approach to stopping XSS attacks. However, the code you use to transform metacharacters can be vulnerable to evasion attacks; for instance, if it doesn't properly handle Unicode and internationalization.
A classic simple mistake homebrew output filters make is to catch only < and >, but miss things like ", which can break user-controlled output out into the attribute space of an HTML tag, where Javascript can be attached to the DOM.
No, just encoding common HTML tokens DOES NOT completely protect your site from XSS attacks. See, for example, this XSS vulnerability found in google.com:
http://www.securiteam.com/securitynews/6Z00L0AEUE.html
The important thing about this type of vulnerability is that the attacker is able to encode his XSS payload using UTF-7, and if you haven't specified a different character encoding on your page, a user's browser could interpret the UTF-7 payload and execute the attack script.
One other thing you need to check is where your input comes from. You can use the referrer string (most of the time) to check that it's from your own page, but putting in a hidden random number or something in your form and then checking it (with a session set variable maybe) also helps knowing that the input is coming from your own site and not some phishing site.
I'd like to suggest HTML Purifier (http://htmlpurifier.org/) It doesn't just filter the html, it basically tokenizes and re-compiles it. It is truly industrial-strength.
It has the additional benefit of allowing you to ensure valid html/xhtml output.
Also n'thing textile, its a great tool and I use it all the time, but I'd run it though html purifier too.
I don't think you understood what I meant re tokens. HTML Purifier doesn't just 'filter', it actually reconstructs the html. http://htmlpurifier.org/comparison.html
I don't believe so. Html Encode converts all functional characters (characters which could be interpreted by the browser as code) in to entity references which cannot be parsed by the browser and thus, cannot be executed.
<script/>
There is no way that the above can be executed by the browser.
**Unless their is a bug in the browser ofcourse.*
myString.replace(/<[^>]*>?/gm, '');
I use it, then successfully.
Strip HTML from Text JavaScript

Resources