I am not concerned about other kinds of attacks. Just want to know whether HTML Encode can prevent all kinds of XSS attacks.
Is there some way to do an XSS attack even if HTML Encode is used?
No.
Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.
For instance, consider server-generated client-side javascript - the server dynamically outputs htmlencoded values directly into the client-side javascript, htmlencode will not stop injected script from executing.
Next, consider the following pseudocode:
<input value=<%= HtmlEncode(somevar) %> id=textbox>
Now, in case its not immediately obvious, if somevar (sent by the user, of course) is set for example to
a onclick=alert(document.cookie)
the resulting output is
<input value=a onclick=alert(document.cookie) id=textbox>
which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.
There are a few additional vectors to be considered... including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on # values).
Also don't forget about UTF-7 type attacks - where the attack looks like
+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-
Nothing much to encode there...
The solution, of course (in addition to proper and restrictive white-list input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF you're output context IS HTML, or maybe you need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.
If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.
Note that all encoding should not be restricted to user input, but also stored values from the database, text files, etc.
Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...
Some more information, and a pretty definitive list (constantly updated), check out RSnake's Cheat Sheet: http://ha.ckers.org/xss.html
If you systematically encode all user input before displaying then yes, you are safe you are still not 100 % safe.
(See #Avid's post for more details)
In addition problems arise when you need to let some tags go unencoded so that you allow users to post images or bold text or any feature that requires user's input be processed as (or converted to) un-encoded markup.
You will have to set up a decision making system to decide which tags are allowed and which are not, and it is always possible that someone will figure out a way to let a non allowed tag to pass through.
It helps if you follow Joel's advice of Making Wrong Code Look Wrong or if your language helps you by warning/not compiling when you are outputting unprocessed user data (static-typing).
If you encode everything it will. (depending on your platform and the implementation of htmlencode) But any usefull web application is so complex that it's easy to forget to check every part of it. Or maybe a 3rd party component isn't safe. Or maybe some code path that you though did encoding didn't do it so you forgot it somewhere else.
So you might want to check things on the input side too. And you might want to check stuff you read from the database.
As mentioned by everyone else, you're safe as long as you encode all user input before displaying it. This includes all request parameters and data retrieved from the database that can be changed by user input.
As mentioned by Pat you'll sometimes want to display some tags, just not all tags. One common way to do this is to use a markup language like Textile, Markdown, or BBCode. However, even markup languages can be vulnerable to XSS, just be aware.
# Markup example
[foo](javascript:alert\('bar'\);)
If you do decide to let "safe" tags through I would recommend finding some existing library to parse & sanitize your code before output. There are a lot of XSS vectors out there that you would have to detect before your sanitizer is fairly safe.
I second metavida's advice to find a third-party library to handle output filtering. Neutralizing HTML characters is a good approach to stopping XSS attacks. However, the code you use to transform metacharacters can be vulnerable to evasion attacks; for instance, if it doesn't properly handle Unicode and internationalization.
A classic simple mistake homebrew output filters make is to catch only < and >, but miss things like ", which can break user-controlled output out into the attribute space of an HTML tag, where Javascript can be attached to the DOM.
No, just encoding common HTML tokens DOES NOT completely protect your site from XSS attacks. See, for example, this XSS vulnerability found in google.com:
http://www.securiteam.com/securitynews/6Z00L0AEUE.html
The important thing about this type of vulnerability is that the attacker is able to encode his XSS payload using UTF-7, and if you haven't specified a different character encoding on your page, a user's browser could interpret the UTF-7 payload and execute the attack script.
One other thing you need to check is where your input comes from. You can use the referrer string (most of the time) to check that it's from your own page, but putting in a hidden random number or something in your form and then checking it (with a session set variable maybe) also helps knowing that the input is coming from your own site and not some phishing site.
I'd like to suggest HTML Purifier (http://htmlpurifier.org/) It doesn't just filter the html, it basically tokenizes and re-compiles it. It is truly industrial-strength.
It has the additional benefit of allowing you to ensure valid html/xhtml output.
Also n'thing textile, its a great tool and I use it all the time, but I'd run it though html purifier too.
I don't think you understood what I meant re tokens. HTML Purifier doesn't just 'filter', it actually reconstructs the html. http://htmlpurifier.org/comparison.html
I don't believe so. Html Encode converts all functional characters (characters which could be interpreted by the browser as code) in to entity references which cannot be parsed by the browser and thus, cannot be executed.
<script/>
There is no way that the above can be executed by the browser.
**Unless their is a bug in the browser ofcourse.*
myString.replace(/<[^>]*>?/gm, '');
I use it, then successfully.
Strip HTML from Text JavaScript
Related
If a web application allows users to contribute translation messages in order to localize the application to a given language or locale, then what are the potential security risks involved in this. [Apart from social engineering which is an obvious one]
These translation messages are usually a collection of key-value pairs in some kind of format depending on the language/library etc. For example, PHP array files as in many OSS PHP applications, getetxt .po files for apps using gettext, Yaml files in Rails, and many others.
Such translation data is then used to provide a new locale in the list of locales available for a site.
As soon as you relinquish control of the content, you are effectively allowing any "authorized" content provider to add whatever they want to your UI. Even if you prevent execution of potential code included in the content, you cannot prevent display of inappropriate text (or images) to users unless you screen that text at its entry point into your system.
One way to address this is via service contracts with the content providers that specify their obligations for content verification. Depending on who the providers are, this may be enough to make you confortable with relinquishing control. Otherwise, there's pretty much no substitute for a human with the application's owner organization approving all submitted content before it is approved for publication.
To be honest this is kind of a strange question. I will assume that you have read and understand the OWASP top 10. I assume you know how to protect your own server from attack.
That being said in my mind the most obvious attack against this translation system is persistent XSS which would allow an attacker to deface every website using this dataset. Just saying "oah we htmlencode the values" isn't enough. If you are supplying these data sets to a 3rd party you can't expect all of them to sanitize the data properly. To make matters worse, XSS is an output problem, you can't HTML encode the entire data set and expect it to be 100% safe because you have no idea how the data is going to be used within the HTML document. The problem is the data may end up within a script tag or event, and then the protection from html-encoding could be nullified entirely. I always chuckle when I see someone using strip_tags() to try and stop xss, this is just the wrong approach.
In summation there really isn't a 100% solution to the problem, but this will prevent most xss:
$var=htmlspecialchars($var,ENT_QUOTES,"UTF-8");
$var=rtrim($var,"\\");
Obviously the rtrim() is used to help prevent xss within a script tag. If the string ends with a backslash you can break out of a quoted string, backslashes are equally as dangerous as quote marks.
I think it's safe to say that HTML elements in the "new" string can only be those that were in the old string, minus a few specific attributes such as title and alt.
Example:
English string: <strong title="Just a test">Hover this message</strong>
Dutch translation: <strong title="Gewoon een test">Hang hier met de muis boven</strong> - will be marked as safe
Dutch translation: <strong onmouseover="window.location='something';">Hang hier met de muis boven</strong> will be invalidated by the filter
You would have to write a rather strong filter though, and always verify that no attributes were added, removed, and no HTML elements were added or removed. Also, always be careful with " and '.
I am working on a site that have an international aim; I.o.w., logged in users can add text in their own language. I am hoping for international page names and content.
An URL example, like the Japanese Wikipedia: http://ja.wikipedia.org/wiki/メインページ (Both pagename and content text).
I know by using UTF-8, I can do this, but how should I control it?
UTF-8 contains way to many languages/letters to control in a script, I guess, so how safe/unsafe is it to allow people to add UTF-8 text?
I can see that someone could add harmful code this way, but how to prevent it?
All information regarding safety/control when using UTF-8 is appreciated!
EDIT: PS! I use PHP and MySQL.
Warning: perhaps a slightly rusty response:
Note: not discussing host name (IDNS) issues.
The only completely safe thing here is to use %-escaped UTF-8. Some browsers will display this as what you want, and some will display the %-escapes. (e.g. http://foo.bar/%ee%cc%cf.html)
If you put 'real UTF-8' in the URLs, many things will work, but there may be unpleasant surprises lurking for some people in some browsers. I'm reading your question as dealing with 100% static content. If you are trying to do this with code behind the site, you have additional issues to work on.
The 'unpleasant surprises' would be (a) people finding the %xx's in the URL unreadable, (b) a browser that melts, (c) some data scraping or aggregating application that melts.
I wish I were more up to date on this, but I'm not, so my recommendation is to deploy a test site and then try to access it with everything you can put your hands on, including mobile phones. Persuade Google to index it, and see what happens there.
For domain names, this is called IDN. For page names, you may want to think of the possibility of IDN spoofs.
It's safe as long as you don't interpret it literally as SQL (SQL injection) or HTML (XSS) or any other language. Just escape any user-controlled input (request URL, request headers, request parameters, request body, etc..etc..) at the point it's going to be used in SQL or HTML.
It's unclear what server side programming language you're using, so I can't go further in detail.
How to safe gaurd a form against script injection attacks. This is one of the most used form of attacks in which attacker attempts to inject a JS script through form field. The validation for this case must check for special characters in the form fields. Look for
suggestions, recommedations at internet/jquery etc for permissible characters &
character masking validation JS codes.
You can use the HTML Purifier (in case you are under PHP or you might have other options for the language you are under) to avoid XSS (cross-site-scripting) attacks to great level but remember no solution is perfect or 100% reliable. This should help you and always remember server-side validation is always best rather than relying on javascript which bad guys can bypass easily disabling javascript.
For SQL Injection, you need to escape invalid characters from queries that can be used to manipulate or inject your queries and use type-casting for all your values that you want to insert into the database.
See the Security Guide for more security risks and how to avoid them. Note that even if you are not using PHP, the basic ideas for the security are same and this should get you in a better position about security considerations.
If you output user controlled input in html context then you could follow what others and sanitize when processing input (html purify, custom input validation) and/or html encode the values before output.
Cases when htmlencodng/strip tags (no tags needed) is not sufficient:
user input appears in attributes then it depends on whether you always (double) quote attributes or not (bad)
used in on* handlers (such as onload="..), then html encoding is not sufficient since the javascript parser is called after html decode.
appears in javascript section - depends on whether this is in quoted (htmlentity encode not sufficient) or unquoted region (very bad).
is returned as json which may be eval'ed. javascript escape required.
appears in CSS - css escape is different and css allows javascript (expression)
Also, these do not account for browser flaws such as incomplete UTF-8 sequence exploit, content-type sniffing exploits (UTF-7 flaw), etc.
Of course you also have to treat data to protect against other attacks (SQL or command injection).
Probably the best reference for this is at the OWASP XSS Prevention Cheat Sheet
ASP.NET has a feature called Request Validation that will prevent unencoded HTML from being processed by the server. For extra protection, one can use the AntiXSS library.
you can prevent script injection by encoding html content like
Server.HtmlEncode(input)
There is the OWASP EASPI too.
Is it possible to eliminate these characters from a wordpress password? I have heard that it can open up scripts this way, that hackers can use to get in. Thank you.
Simple answer:
Your friend has misinformed you. Restricting these characters in a wordpress password is not something you need to worry about. But as they say "There is no smoke without fire".
More background information:
In your own web-application code, you should always be especially careful whenever you take any data from a user (Whether from a form, a cookie,or a URL) or another external computer system or application. The reason for this is that you want to avoid the values being interpreted as code and not just used as data.
The issue that has led your friend to worry about the <> characters is called Cross-Site Scripting and is a kind of attack that malicious users can perform to "inject" html or javascript content into your pages. If you accept information from the user that contains these html mark-up characters and re-display it on the same, or another page, then you can cause their html or javascript content to become part of your page. Any javascript content will run with access to the same data as the user that views the page.
Whenever outside data is read, it sould always be
validated : i.e. checked that it looks like the kind of thing you are expecting, and rejected if it doe not.
and encoded: i.e. When this data is displayed to back to the user or sent to another part of the system, it is converted to be safe. The type of conversion always depends on how and where the data is being used.
Please note that the angle-bracket characters are not the only thing to worry about. Please also note that it is well proven that disallowing certain characters (also called "blacklisting") is never the best way to secure code. It is always safer to state what is allowed (also called "whitelisting").
The Zend Framework Manual says the following:
60.3.1. Escaping Output
One of the most important tasks to
perform in a view script is to make
sure that output is escaped properly;
among other things, this helps to
avoid cross-site scripting attacks.
Unless you are using a function,
method, or helper that does escaping
on its own, you should always escape
variables when you output them.
Why 'always'? Why do I have to escape variables that have not been created or altered by user input?
Users aren't the only source of dodgy strings in output. Consider, for example, the apparently safe string "Romeo & Juliet" coming out of a database. No cross-site scripting there, you say? True enough. Stick it in a web page, however, and the raw ampersand could cause some interesting problems with validation, parsing, etc.
Output escaping isn't just to guard against malicious or accidentally borked input, it ensures that the output is thoroughly sanitised and treated as having no special meaning in the surrounding output format, whether that's HTML, XML, JSON or whatever.
As a rule, I would escape anything coming from user input, a data source or even calculations. You want the output to be predictable, escaping ensures that it is. If the value when converted to a string contains characters that break your desired markup, things would get messy.
If you're using a view, $this->escape($variableToEscape) should suffice.
Another thing is many times things that are hard coded one day become user or at least database generated another day. Its just better practice to manage output of variables in your code.
You could look at it this way: you should always HTML-encode variables, unless you know that they've already been encoded.
Say you have a variable that contains:
foo <b>bar</b>
If you know that it contains HTML tags, and you're okay with that, then you can say that this variable has already been properly HTML-encoded. You could even assign it to a different variable type to make the compiler aware of the distinction (Joel's idea), and have your output functions handle these types without escaping them.
Of course, this means that
foo & <b>bar</b>
is an incorrect value; you would need to ensure that it's:
foo & <b>bar</b>
I think the best practice here is always escape output unless you intend to output a raw HTML fragment. Even "safe" data can contain characters which need to be escaped. For example, consider the e-mail address '"Bob" <bob#bob.com>'. If you don't escape it, the browser will think <bob#bob.com> is a tag.
Obviously you want to escape things that are the result of user data to prevent XSS attacks. Since you're often changing what you're republishing and what you're not, you probably can't remember all the places that need to be changed... So even if you get all the nuances correct now, and your site is secure from XSS scripting today, you may at some point may add user input to some variable you're not escaping (or more likely, some variable to some variable to some variable which you're not escaping), which would open you up to XSS attacks.
Escaping by default would prevent that attack.
The other reason is more conceptual: with MVC, all of your markup--which is, by definition, the "view"--should be in your view templates. So if your controller is determining the view, and the view contains all the markup, why not escape your variables?
Well, if you have hard-coded values (let's say language translations, which you read from a database or XML file), you don't have to escape them.
But if there is a value that has been created/modified by user, even let's say in admin panel, you have to escape it, because you don't know what kind of data user or if I'm more radical, even administrator, will send.