Security for HTTP headers

We'd like to double-check our HTTP headers for security before we send them out. Obviously we can't allow '\r' or '\n' to appear, as that would allow content injection.
I see just two options here:
Truncate the value at the newline character.
Strip the invalid character from the header value.
Also, from reading RFC 2616, it seems that only ASCII-printable characters are valid for HTTP header values. Should I also follow the same policy for the other 154 possible invalid bytes?
Or, is there any authoritative prior art on this subject?

This attack is called "header splitting" or "response splitting".
The OWASP page on this attack points out that removing CRLF is not sufficient; \n can be just as dangerous.
To mount a successful exploit, the application must allow input that contains CR (carriage return, also given by 0x0D or \r) and LF (line feed, also given by 0x0A or \n) characters into the header.
(I do not know why OWASP (and other pages) list \n as a vulnerability or whether that only applies to query fragments pre-decode.)
Serving a 500 on any attempt to set a header that contains a character not allowed by the spec in a header key or value is perfectly reasonable, and will let you identify offending requests in your logs. Failing fast when you know your filters are failing is a fine policy.
If the language you're working in allows it, you could wrap your HTTP response object in one that raises an exception when a bad header is seen, or you could change the response object to enter an invalid state, set the response code to 500, and close the response body stream.
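For example, a minimal sketch of such a wrapper in Node.js (the SafeResponse name and the strict printable-ASCII policy are illustrative assumptions, not a standard API):

// Minimal sketch: wrap an http.ServerResponse so that setting a header
// containing anything outside printable ASCII (0x20-0x7E) fails fast.
class SafeResponse {
  constructor(res) {
    this.res = res; // underlying http.ServerResponse
  }
  setHeader(name, value) {
    // Printable ASCII only; this also rejects \r (0x0D) and \n (0x0A).
    const ok = /^[\x20-\x7e]*$/;
    if (!ok.test(String(name)) || !ok.test(String(value))) {
      this.res.statusCode = 500; // fail fast; visible in the logs
      this.res.end();
      throw new Error('Invalid character in header: ' + name);
    }
    this.res.setHeader(name, value);
  }
}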
EDIT:
Should I strip non-ASCII inputs?
I prefer to do that kind of normalization in the layer that receives untrusted input unless, as with entity-escaping plain text to produce HTML, there is a clear type conversion. If it's a type conversion, I do it when the output type is required; if it is not, I do it as early as possible so that all consumers of data of that type see a consistent value. I find this approach makes debugging and documentation easier, since layers below input handling never have to worry about unnormalized inputs.
When implementing the HTTP response wrapper, I would make it fail on all non-ASCII characters (including non-ASCII newlines like U+0085, U+2028, U+2029) and then make sure my application tests include a test for each third-party URL input, to make sure that any Location header is properly %-encoded before it reaches setHeader, and similarly for other inputs that might reach the response headers.
If your cookies include things like a user-id or email address, I would make sure the dummy accounts for tests include a dummy account with a user-id or email address containing a non-ASCII letter.

The simple removal of newlines (\n) will prevent HTTP response splitting. Even though a CRLF is used as the delimiter in the RFC, the newline alone is recognized by all browsers.
You still have to worry about user content within a Set-Cookie or Content-Type header. Attributes within these headers are delimited by a semicolon (;); it may be possible for an attacker to change the content type to UTF-7 and bypass your XSS protection for IE users (and only IE users). It may also be possible for an attacker to create a new cookie, which introduces the possibility of session fixation.
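If you do place user content in such a header, one hedged approach is to strip the delimiters along with the newlines; a minimal sketch (the function name and policy are mine, not a standard):

// Remove CR, LF, and the ';' attribute delimiter so a user-supplied
// value cannot escape its slot inside a Set-Cookie header.
function sanitizeCookieValue(value) {
  return String(value).replace(/[\r\n;]/g, '');
}
// e.g. res.setHeader('Set-Cookie', 'displayName=' + sanitizeCookieValue(name));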

Non-ASCII characters are allowed in header fields, although the spec doesn't clearly say what they mean; so it's up to sender and recipient to agree on their semantics.
What made you think otherwise?

Related

Azure Active Directory OAuth 2.0 Authorization gives Bad Request

When requesting an authorization code, if the state URL parameter has the following value, https://login.microsoftonline.com/oauth2/authorize gives me a Bad Request.
state=%3C%3CMULE_EVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
If I remove the encoded values: << and >>, it works well. Currently I have some limitations and I cannot remove those values.
The documentation says that "state" is a value included in the request that will also be returned in the token response, and that it can be a string of any content you wish.
The double angle brackets (<< >>) appear to be semantically incorrect, although those characters are allowed by https://www.rfc-editor.org/rfc/rfc6749#appendix-A.5 (the ABNF syntax for that field is essentially all printable characters including space, VSCHAR; see https://www.rfc-editor.org/rfc/rfc5234).
However, when we look at the intended use of the state field, it is meant to carry a token back from the service so that your application can validate local state and avoid CSRF attacks.
In most cases, a short string should suffice, and you will probably do yourself a favor if you keep the string short, saving bytes on the wire and additional parsing overhead.
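For example, a minimal sketch of generating such a compact state value in Node.js (storing it for later comparison is an assumed design detail, not something the documentation mandates):

const crypto = require('crypto');

// A short, URL-safe, unguessable state value for CSRF protection:
// 16 random bytes -> 22 base64url characters, nothing to percent-encode.
function makeState() {
  return crypto.randomBytes(16).toString('base64url');
}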
There is a good overview of using the OAuth2 endpoint here (admittedly with Bing Ads, but the principles and advice are applicable to this case):
https://msdn.microsoft.com/en-us/library/bing-ads-user-authentication-oauth-guide.aspx
If I can find the exact restrictions on the state field, I shall update my answer.
Well, the documentation seems a bit wrong then. I tested various state strings, and what makes it fail consistently is starting the state string with %3C. So a less-than sign is fine in some places in the string.
EDIT: There is something really odd going on.
This fails:
state=MUL%3CE_EVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
But this works:
state=MULE%3C_EVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
But this also fails:
state=MULE_%3CEVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
My theory is that it doesn't allow anything that looks like a valid HTML tag. That's why %3C_ (a less-than followed by an underscore) is allowed, but %3Ca%3E (a less-than followed by a letter) is not; you can replace a with any character a-z. So HTML elements are a no-no :)

The remote server returned an error: (414) Request-URI Too Long

I am using the SaveBinaryDirect method to upload a file to a SharePoint library, and I am getting the error below:
The remote server returned an error: (414) Request-URI Too Long
Can anybody help me, please?
I wouldn't call this a SharePoint problem necessarily, more like a problem that happens a lot in SharePoint... Essentially, you have around a 2,000 character limit for the URL. In most scenarios this is fine, however in SharePoint it occasionally becomes an issue.
Users tend to create a lot of nested libraries, and the name of each library becomes part of the URL, separated by '/'. Then the file name is added at the end of the URL. And to make matters worse, if there are any spaces or URL-unfriendly characters, they are encoded and become three characters each (a space becomes %20). This all adds up.
In my experience the solution is a combination of user education and proper architecture. Instead of creating nested libraries, store the documents in a single library and differentiate the items by assigning meta-data attributes, then create views to display items of a particular type.
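As a quick diagnostic, you can estimate the encoded length of the target URL before uploading; a minimal sketch (the 2,000-character threshold is the rough limit mentioned above, not an exact server setting):

// Estimate the percent-encoded length of the final file URL.
function urlTooLong(siteUrl, folderPath, fileName, limit) {
  const full = siteUrl + '/' + folderPath + '/' + fileName;
  // encodeURI expands spaces and other URL-unfriendly characters to %XX
  return encodeURI(full).length > (limit || 2000);
}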
This error can also be caused by having "invalid" characters in the file name or path. See this answer for what makes a character invalid in a URI:
Which characters make a URL invalid?

Custom HTTP header value - trying to pass umlaut characters

I am using Node.js and Express.js 3.x.
As one of our authorization headers we are passing in the username. Some of our usernames contain umlaut characters: ü, ö, ä, and the like.
For usernames with just 'normal' characters, all works fine. But when a jörg tries to make a request, the server doesn't recognize the umlaut character in the header.
Trying to simulate the problem I:
Created some tests that set the username header with the umlaut character. These tests pass; they are able to pass in the umlaut correctly.
Used the 'Postman' and 'Advanced REST Client' Chrome extensions and made the request manually against the server; in this case it failed. I saw the server was unable to recognize the umlaut character; it just interpreted it as some sort of '?'.
Is there any limitation on custom HTTP header values characters that forbids using these kind of characters? Any idea why it would work in tests but not from my browser extension? Am I forgetting to set some character set somewhere?
Summary of what was written in the other related question and in comments:
You may put any 'printable' ASCII character in your custom header value field.
If you want to use special characters, encode them following whatever rules/charset/encoding you choose. As long as that encoding uses only plain ASCII characters, it's OK. An example is encoding the string as UTF-8 and then percent-encoding the bytes.
On the server side, you must manually make sense of the encoded string, by decoding it using the rules of the character set/encoding you chose.
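A minimal sketch of that round trip in Node.js/Express (the X-Username header name is made up for illustration):

// Client side: percent-encode the UTF-8 username so the header value
// contains only ASCII characters. 'jörg' -> 'j%C3%B6rg'
const headers = { 'X-Username': encodeURIComponent('jörg') };

// Server side (Express): decode it back manually.
const express = require('express');
const app = express();
app.get('/whoami', (req, res) => {
  const username = decodeURIComponent(req.get('X-Username') || '');
  res.send('Hello, ' + username);
});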

How to ensure website security checks

How do I safeguard a form against script injection attacks? This is one of the most common forms of attack, in which an attacker attempts to inject a JS script through a form field. Validation for this case must check for special characters in the form fields. I'm looking for suggestions and recommendations (internet/jQuery etc.) for permissible characters and character-masking validation JS code.
You can use HTML Purifier (if you are using PHP; other languages have similar options) to avoid XSS (cross-site scripting) attacks to a great extent, but remember that no solution is perfect or 100% reliable. And always remember that server-side validation is best; don't rely on JavaScript alone, which the bad guys can bypass easily by disabling JavaScript.
For SQL injection, you need to escape invalid characters that could be used to manipulate or inject into your queries, and type-cast all values that you want to insert into the database.
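A common alternative to hand-escaping is to use parameterized queries, which keep user input out of the SQL text entirely; a minimal sketch with the mysql2 Node driver (an assumed choice; every major driver has an equivalent):

const mysql = require('mysql2/promise');

// pool: created elsewhere via mysql.createPool({ ... })
async function findUser(pool, email) {
  // '?' is a bound parameter; the driver sends the value separately,
  // so it can never be interpreted as SQL.
  const [rows] = await pool.execute(
    'SELECT id, name FROM users WHERE email = ?',
    [email]
  );
  return rows;
}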
See the Security Guide for more security risks and how to avoid them. Note that even if you are not using PHP, the basic security ideas are the same, and this should put you in a better position regarding security considerations.
If you output user-controlled input in an HTML context, then you could follow what others have suggested and sanitize when processing input (HTML Purifier, custom input validation) and/or HTML-encode the values before output.
Cases where HTML encoding/stripping tags (when no tags are needed) is not sufficient:
user input appears in attributes; then it depends on whether you always (double-)quote attributes or not (not quoting is bad)
user input is used in on* handlers (such as onload=".."); then HTML encoding is not sufficient, since the JavaScript parser is called after the HTML decode
user input appears in a JavaScript section; it depends on whether this is a quoted (HTML-entity encoding not sufficient) or unquoted region (very bad)
user input is returned as JSON which may be eval'ed; JavaScript escaping is required
user input appears in CSS; CSS escaping is different, and CSS allows JavaScript (expression())
Also, these do not account for browser flaws such as incomplete UTF-8 sequence exploit, content-type sniffing exploits (UTF-7 flaw), etc.
Of course you also have to treat data to protect against other attacks (SQL or command injection).
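A minimal sketch of the baseline encoding for the two most common contexts, element content and double-quoted attribute values (the escapeHtml helper name is made up here); per the list above, this is NOT sufficient for on* handlers, script bodies, unquoted attributes, or CSS:

// Escape the five HTML metacharacters.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
// Safe in element content and in double-quoted attributes:
// '<input value="' + escapeHtml(userInput) + '">'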
Probably the best reference for this is at the OWASP XSS Prevention Cheat Sheet
ASP.NET has a feature called Request Validation that will prevent unencoded HTML from being processed by the server. For extra protection, one can use the AntiXSS library.
You can prevent script injection by encoding HTML content, like:
Server.HtmlEncode(input)
There is the OWASP ESAPI too.

Will HTML Encoding prevent all kinds of XSS attacks?

I am not concerned about other kinds of attacks. Just want to know whether HTML Encode can prevent all kinds of XSS attacks.
Is there some way to do an XSS attack even if HTML Encode is used?
No.
Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.
For instance, consider server-generated client-side JavaScript: if the server dynamically outputs HtmlEncoded values directly into the client-side JavaScript, HtmlEncode will not stop injected script from executing.
Next, consider the following pseudocode:
<input value=<%= HtmlEncode(somevar) %> id=textbox>
Now, in case it's not immediately obvious: if somevar (sent by the user, of course) is set, for example, to
a onclick=alert(document.cookie)
the resulting output is
<input value=a onclick=alert(document.cookie) id=textbox>
which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.
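The usual fix, in addition to encoding, is to always quote attribute values so the payload cannot break out of the value; with the same pseudocode:
<input value="<%= HtmlEncode(somevar) %>" id="textbox">
With the attribute quoted (and an encoder that escapes quotes, as HtmlEncode does), the payload above renders inertly as part of the value instead of becoming an onclick handler.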
There are a few additional vectors to be considered... including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on # values).
Also don't forget about UTF-7 type attacks - where the attack looks like
+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-
Nothing much to encode there...
The solution, of course (in addition to proper, restrictive whitelist input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF your output context IS HTML; otherwise you may need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.
If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.
Note that encoding should not be restricted to user input; it should also cover stored values from the database, text files, etc.
Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...
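A brief sketch of pinning the charset in both places (Express is an assumed server here):

// In the HTTP response header:
res.setHeader('Content-Type', 'text/html; charset=utf-8');

and in the markup itself:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">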
For more information, and a pretty definitive list (constantly updated), check out RSnake's XSS Cheat Sheet: http://ha.ckers.org/xss.html
If you systematically encode all user input before displaying it then yes, you are mostly safe; but you are still not 100% safe.
(See @Avid's post for more details.)
In addition, problems arise when you need to let some tags go unencoded, so that users can post images or bold text or use any feature that requires their input to be processed as (or converted to) unencoded markup.
You will have to set up a decision-making system to decide which tags are allowed and which are not, and it is always possible that someone will figure out a way to let a disallowed tag pass through.
It helps if you follow Joel's advice of Making Wrong Code Look Wrong or if your language helps you by warning/not compiling when you are outputting unprocessed user data (static-typing).
If you encode everything it will (depending on your platform and the implementation of HtmlEncode). But any useful web application is so complex that it's easy to forget to check every part of it. Or maybe a third-party component isn't safe. Or maybe some code path that you thought did the encoding didn't, so you missed it somewhere else.
So you might want to check things on the input side too. And you might want to check stuff you read from the database.
As mentioned by everyone else, you're safe as long as you encode all user input before displaying it. This includes all request parameters and data retrieved from the database that can be changed by user input.
As mentioned by Pat you'll sometimes want to display some tags, just not all tags. One common way to do this is to use a markup language like Textile, Markdown, or BBCode. However, even markup languages can be vulnerable to XSS, just be aware.
# Markup example
[foo](javascript:alert('bar');)
If you do decide to let "safe" tags through I would recommend finding some existing library to parse & sanitize your code before output. There are a lot of XSS vectors out there that you would have to detect before your sanitizer is fairly safe.
I second metavida's advice to find a third-party library to handle output filtering. Neutralizing HTML characters is a good approach to stopping XSS attacks. However, the code you use to transform metacharacters can be vulnerable to evasion attacks; for instance, if it doesn't properly handle Unicode and internationalization.
A classic simple mistake homebrew output filters make is to catch only < and >, but miss things like ", which can break user-controlled output out into the attribute space of an HTML tag, where Javascript can be attached to the DOM.
No, just encoding common HTML tokens DOES NOT completely protect your site from XSS attacks. See, for example, this XSS vulnerability found in google.com:
http://www.securiteam.com/securitynews/6Z00L0AEUE.html
The important thing about this type of vulnerability is that the attacker is able to encode his XSS payload using UTF-7, and if you haven't specified a different character encoding on your page, a user's browser could interpret the UTF-7 payload and execute the attack script.
One other thing you need to check is where your input comes from. You can use the referrer string (most of the time) to check that it's from your own page, but putting a hidden random number or something in your form and then checking it (against a session variable, maybe) also helps you know that the input is coming from your own site and not some phishing site.
I'd like to suggest HTML Purifier (http://htmlpurifier.org/). It doesn't just filter the HTML, it basically tokenizes and recompiles it. It is truly industrial-strength.
It has the additional benefit of allowing you to ensure valid html/xhtml output.
Also n'thing Textile; it's a great tool and I use it all the time, but I'd run it through HTML Purifier too.
I don't think you understood what I meant re tokens. HTML Purifier doesn't just 'filter', it actually reconstructs the html. http://htmlpurifier.org/comparison.html
I don't believe so. HTML Encode converts all functional characters (characters which could be interpreted by the browser as code) into entity references, which cannot be parsed by the browser and thus cannot be executed.
<script/>
There is no way that the above can be executed by the browser.
(Unless there is a bug in the browser, of course.)
myString.replace(/<[^>]*>?/gm, '');
I use it successfully:
Strip HTML from Text JavaScript
