Why do we have two delimiters for query parameters? - web

We all know there are two delimiters in query strings: ? and &. Why don't we just use ? in both cases? Why do we need &?
RFC 3986 describes the standard, but doesn't give the motivation for this choice:
The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.

If you read the various URL specifications, you will see that they don't set out any syntax for the <query> component. Indeed, the client and server could agree on any syntax for the string, subject to the restrictions on allowed and reserved characters.
The ?<name>=<value>&<name>=<value> syntax that is most commonly used comes from the HTML specification. Look for the section of the HTML spec (pick any version) that specifies the "application/x-www-form-urlencoded" encoding scheme for form parameters.
Why does the HTML spec not use ? as the parameter separator? I think that is because the URL spec says that ? is reserved in the <query> part. (So if HTML used ? as a separator, a literal ? inside a parameter would need to be percent-encoded.)
Why is ? reserved in the <query> part? Well now we are getting into the history of http: hyperlinks before a unified URL specification existed. Basically, I don't know, but it could have been related to the way that early web servers or browsers parsed hyperlinks.
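To make the syntax discussed in this answer concrete, here is a minimal Python sketch using the standard library's urllib.parse. The example.com URL and the parameter names are made up for illustration. It shows that the query runs from the first ? to the #, that & separates the name=value pairs, and that a form encoder percent-encodes & and ? when they occur inside a value.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical URL, used only for illustration.
url = "http://example.com/search?q=rfc+3986&lang=en#results"

parts = urlsplit(url)

# The query is everything between the first "?" and the "#".
query = parts.query          # "q=rfc+3986&lang=en"
fragment = parts.fragment    # "results"

# "&" separates name=value pairs in application/x-www-form-urlencoded data;
# "+" decodes to a space.
pairs = parse_qsl(query)     # [("q", "rfc 3986"), ("lang", "en")]

# When a value itself contains "&" or "?", the encoder percent-encodes them
# so they cannot be mistaken for delimiters.
encoded = urlencode({"q": "a&b?c"})   # "q=a%26b%3Fc"

print(query, fragment, pairs, encoded)
```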

Related

Azure Active Directory OAuth 2.0 Authorization gives Bad Request

When requesting an authorization code, if the state URL parameter has the following value, https://login.microsoftonline.com/oauth2/authorize gives me a Bad Request.
state=%3C%3CMULE_EVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
If I remove the encoded values: << and >>, it works well. Currently I have some limitations and I cannot remove those values.
The documentation says that "state" is a value included in the request that will also be returned in the token response. It can be a string of any content that you wish.
The double << >> appears to be semantically incorrect, although those characters are allowed in https://www.rfc-editor.org/rfc/rfc6749#appendix-A.5 (referencing ABNF syntax for that field, which is essentially all printable characters including space, VSCHAR, https://www.rfc-editor.org/rfc/rfc5234).
However, when we look at the intended use of the state field, it is to be used to send a token back from the service, for your application to be able to validate the local state to avoid CSRF attacks.
In most cases, a short string should suffice, and you will probably do yourself a favor if you keep the string short, saving bytes on the wire and additional parsing overhead.
There is a good overview of using the OAuth 2.0 endpoint here (admittedly with Bing Ads, but the principles and advice are applicable to this case):
https://msdn.microsoft.com/en-us/library/bing-ads-user-authentication-oauth-guide.aspx
If I can find the exact restrictions on the state field, I shall update my answer.
Well, the documentation seems a bit wrong then. I tested various state strings, and what makes it fail consistently is starting the state string with %3C. So a less-than sign is fine in some places in the string.
EDIT: There is something really odd going on.
This fails:
state=MUL%3CE_EVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
But this works:
state=MULE%3C_EVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
But this also fails:
state=MULE_%3CEVENT_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E
My theory is that it doesn't allow anything that looks like a valid HTML tag. That's why it allows %3C_...%3D, but not something like %3Ca%3E. You can replace a with any letter a-z. So HTML elements are a no-no :)
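The theory above can be sketched as a small check in Python. This only models the hypothesis from this answer (a < immediately followed by an ASCII letter gets rejected); it is not the actual validation logic of the Azure endpoint, which is not known here. The function name looks_like_html_tag is hypothetical.

```python
import re
from urllib.parse import unquote

# Assumption from the answer's theory: "<" followed by a letter looks like a tag.
TAG_LIKE = re.compile(r"<[A-Za-z]")


def looks_like_html_tag(encoded_state: str) -> bool:
    """Return True if the percent-decoded state contains a tag-like sequence."""
    return TAG_LIKE.search(unquote(encoded_state)) is not None


suffix = "_ID%3D0-6cadfe22-e9ea-11e6-99ff-205120524153%3E%3E"

# The values reported in this thread, with the observed outcome as a comment.
print(looks_like_html_tag("%3C%3CMULE_EVENT" + suffix))  # request failed
print(looks_like_html_tag("MUL%3CE_EVENT" + suffix))     # request failed
print(looks_like_html_tag("MULE%3C_EVENT" + suffix))     # request worked
print(looks_like_html_tag("MULE_%3CEVENT" + suffix))     # request failed
```

Under this model, the three failing values are flagged and the working one is not, which is consistent with the observations but does not prove the theory.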

What kind of encoder encodes string like this?

I have a question about encoding/decoding strings.
There is a web page to which I send some data with a simple PHP POST form.
When I open Chrome Developer Toolbar -> Network, in "Form Data" all parameters are displayed normally, except this, "uid", which is encoded ( %25%DC%BE%60%A0W%94M ) somehow.
When I clicked on "view URL encoded", it showed me "%2525%25DC%25BE%2560%25A0W%2594M". I tried online tools such as http://meyerweb.com/eric/tools/dencoder/ to get a human-readable string from this encoded parameter, but had no luck.
Can anyone explain to me, how can I get the original value of this parameter? Not encoded, in human readable format?
Thanks a lot : )
This decoder works better:
http://www.opinionatedgeek.com/dotnet/tools/urlencode/Decode.aspx/
The %25 that you see is the actual percent character % being encoded
http://en.wikipedia.org/wiki/Percent-encoding
Percent-encoding, also known as URL encoding, is a mechanism for
encoding information in a Uniform Resource Identifier (URI) under
certain circumstances.
...
it is also used in the preparation
of data of the "application/x-www-form-urlencoded" media type, as is
often used in the submission of HTML form data in HTTP requests.
If you're having problems with online decoders, and seeing as it's a relatively short string, why not give it a go by hand?
http://www.degraeve.com/reference/urlencoding.php
This table maps characters to their URL-encoded equivalents. Just do a Ctrl+F for each %-encoded character and decode it yourself.
A few of the characters look weird because they aren't English characters. %DC is Ü, for example. It's possible the decoders you are trying don't recognise non-English characters.
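The two layers of encoding described in these answers can also be peeled off programmatically. The following is a minimal sketch, assuming Python's standard urllib.parse; the variable names are illustrative. The first pass undoes the extra %25 layer shown by "view URL encoded", and the second pass yields the raw bytes of the uid value.

```python
from urllib.parse import unquote, unquote_to_bytes

# Value shown by Chrome under "view URL encoded" (from the question).
url_encoded = "%2525%25DC%25BE%2560%25A0W%2594M"

# First pass: every "%25" becomes a literal "%", giving the "Form Data" view.
once = unquote(url_encoded)     # "%25%DC%BE%60%A0W%94M"

# Second pass: decode to raw bytes instead of text, because the bytes are not
# guaranteed to be valid text in any particular character encoding.
raw = unquote_to_bytes(once)

print(once)
print(raw.hex())                # 25dcbe60a057944d
```

The result is 8 bytes. Whether they represent text in some single-byte encoding or a binary identifier cannot be determined from the question alone.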

security for http headers

We'd like to double-check our http headers for security before we send them out. Obviously we can't allow '\r' or '\n' to appear, as that would allow content injection.
I see just two options here:
Truncate the value at the newline character.
Strip the invalid character from the header value.
Also, from reading RFC 2616, it seems that only ASCII-printable characters are valid for HTTP header values. Should I also follow the same policy for the other 154 possible invalid bytes?
Or, is there any authoritative prior art on this subject?
This attack is called "header splitting" or "response splitting".
That OWASP link points out that removing CRLF is not sufficient. \n can be just as dangerous.
To mount a successful exploit, the application must allow input that contains CR (carriage return, also given by 0x0D or \r) and LF (line feed, also given by 0x0A or \n) characters into the header.
(I do not know why OWASP (and other pages) list \n as a vulnerability or whether that only applies to query fragments pre-decode.)
Serving a 500 on any attempt to set a header that contains a character not allowed by the spec in a header key or value is perfectly reasonable, and will allow you to identify offensive requests in your logs. Failing fast when you know your filters are failing is a fine policy.
If the language you're working in allows it, you could wrap your HTTP response object in one that raises an exception when a bad header is seen, or you could change the response object to enter an invalid state, set the response code to 500, and close the response body stream.
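A fail-fast wrapper of the kind described above could look roughly like the following Python sketch. The class and exception names (StrictHeaders, UnsafeHeaderError) are hypothetical, and the policy is an assumption for illustration: header names must be HTTP token characters, and values may contain only tab, space, and printable ASCII. A real application would catch the exception and turn it into a 500 response.

```python
import re

# Assumed policy (not taken from any framework): token characters for names,
# tab/space/printable ASCII for values. CR, LF and all non-ASCII are rejected.
_VALID_NAME = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
_VALID_VALUE = re.compile(r"[\t\x20-\x7e]*")


class UnsafeHeaderError(ValueError):
    """Raised when a header name or value contains a disallowed character."""


class StrictHeaders:
    """Collects response headers, refusing anything outside the policy."""

    def __init__(self):
        self._headers = []

    def set_header(self, name, value):
        if not _VALID_NAME.fullmatch(name):
            raise UnsafeHeaderError("invalid header name: %r" % (name,))
        if not _VALID_VALUE.fullmatch(value):
            raise UnsafeHeaderError("invalid value for %s: %r" % (name, value))
        self._headers.append((name, value))

    def items(self):
        return list(self._headers)


headers = StrictHeaders()
headers.set_header("Location", "/next?x=1")

try:
    # A classic response-splitting attempt: rejected instead of being sent.
    headers.set_header("Location", "/next\r\nSet-Cookie: sid=attacker")
except UnsafeHeaderError as exc:
    print("rejected:", exc)
```

Raising instead of silently stripping or truncating means a failing upstream filter shows up in the logs rather than being masked.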
EDIT:
Should I strip non-ASCII inputs?
I prefer to do that kind of normalization in the layer that receives trusted input unless, as in the case of entity-escaping to convert plain-text to HTML escaping, there is a clear type conversion. If it's a type conversion, I do it when the output type is required, but if it is not a type-conversion, I do it as early as possible so that all consumers of data of that type see a consistent value. I find this approach makes debugging and documentation easier since layers below input handling never have to worry about unnormalized inputs.
When implementing the HTTP response wrapper, I would make it fail on all non-ASCII characters (including non-ASCII newlines like U+85, U+2028, U+2029) and then make sure my application tests include a test for each third-party URL input to make sure that any Location headers are properly %-encoded before the Location reaches setHeader, and similarly for other inputs that might reach the request headers.
If your cookies include things like a user-id or email address, I would make sure the dummy accounts for tests include a dummy account with a user-id or email address containing a non-ASCII letter.
The simple removal of new lines \n will prevent HTTP Response Splitting. Even though a CRLF is used as a delimiter in the RFC, the new line alone is recognized by all browsers.
You still have to worry about user content within a Set-Cookie or Content-Type header. Attributes within these headers are delimited with ;, so it may be possible for an attacker to change the content type to UTF-7 and bypass your XSS protection for IE users (and only IE users). It may also be possible for an attacker to create a new cookie, which introduces the possibility of session fixation.
Non-ASCII characters are allowed in header fields, although the spec doesn't really clearly say what they mean; so it's up to sender and recipient to agree on their semantics.
What made you think otherwise?

Why restrict URI characters in the config file?

I am using clean URLs for search. If the user types a single quote, it says "disallowed URI character". I know how to allow a character to appear in the URL, but I want to know what security vulnerabilities come with allowing certain characters such as braces, quotes and others.
An explanation or external references would both be welcome.
I am assuming you are talking about the query string part of the URL. If so, your framework is probably disallowing those characters to prevent SQL injection attacks: in your code you may end up using those query string values to construct an SQL query, and boom, your application is SQL-injected.
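As an illustration of why a quote in a search term does not have to be dangerous, here is a minimal Python sketch using parameterized queries with the standard sqlite3 module. This is a different technique from the framework's character blacklist, and the table, data and search function are made up for the example. The value is passed to the database separately from the SQL text, so it is never parsed as SQL.

```python
import sqlite3

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("O'Reilly books",), ("Other",)],
)


def search(conn, term):
    """Search titles using a bound parameter instead of string concatenation."""
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ?", (pattern,)
    )
    return [row[0] for row in rows]


# A single quote is just data here.
print(search(conn, "O'Re"))         # ["O'Reilly books"]

# A typical injection attempt is treated as a literal search string.
print(search(conn, "' OR '1'='1"))  # []
```

Note that this sketch does not escape the LIKE wildcards % and _ in the search term; that is a matching concern rather than an injection one.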

How to ensure website security checks

How do I safeguard a form against script injection attacks? This is one of the most common forms of attack, in which an attacker attempts to inject a JS script through a form field. The validation for this case must check for special characters in the form fields. I am looking for suggestions and recommendations (from the internet, jQuery, etc.) on permissible characters and on JS code for character-masking validation.
You can use HTML Purifier (if you are using PHP; other languages may have their own options) to prevent XSS (cross-site scripting) attacks to a large degree, but remember that no solution is perfect or 100% reliable. Always remember that server-side validation is better than relying on JavaScript, which attackers can easily bypass by disabling JavaScript.
For SQL Injection, you need to escape invalid characters from queries that can be used to manipulate or inject your queries and use type-casting for all your values that you want to insert into the database.
See the Security Guide for more security risks and how to avoid them. Note that even if you are not using PHP, the basic security ideas are the same, and this should put you in a better position regarding security considerations.
If you output user-controlled input in an HTML context, you could follow what others have suggested and sanitize when processing input (HTML Purifier, custom input validation) and/or HTML-encode the values before output.
Cases where HTML-encoding or stripping tags (when no tags are needed) is not sufficient:
user input appears in attributes; then it depends on whether you always (double-)quote attributes or not (bad)
used in on* handlers (such as onload="..), then html encoding is not sufficient since the javascript parser is called after html decode.
appears in javascript section - depends on whether this is in quoted (htmlentity encode not sufficient) or unquoted region (very bad).
is returned as json which may be eval'ed. javascript escape required.
appears in CSS - css escape is different and css allows javascript (expression)
Also, these do not account for browser flaws such as incomplete UTF-8 sequence exploit, content-type sniffing exploits (UTF-7 flaw), etc.
Of course you also have to treat data to protect against other attacks (SQL or command injection).
Probably the best reference for this is at the OWASP XSS Prevention Cheat Sheet
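To illustrate why the output context matters, here is a minimal Python sketch covering three of the cases listed above: HTML body text, a double-quoted attribute, and a quoted JavaScript string. The function names are hypothetical, and the sketch covers only these three contexts; event handlers, unquoted regions, CSS and the browser flaws mentioned above are out of scope.

```python
import html
import json

user_input = '"><script>alert(1)</script>'


def escape_for_html(text):
    """Escape for HTML body text or a double-quoted attribute value."""
    # html.escape replaces & < > and, by default, both quote characters.
    return html.escape(text, quote=True)


def escape_for_script_block(value):
    """Encode a value as a JavaScript string literal inside a <script> block."""
    # json.dumps handles quotes and backslashes; "<" is additionally escaped
    # so the value cannot close the surrounding <script> element.
    return json.dumps(value).replace("<", "\\u003c")


body = "<p>" + escape_for_html(user_input) + "</p>"
attr = '<input value="' + escape_for_html(user_input) + '">'
script = "<script>var q = " + escape_for_script_block(user_input) + ";</script>"

print(body)
print(attr)
print(script)
```

HTML entity encoding alone would not be enough for the script case, since the JavaScript parser sees the value after a different set of rules has been applied; each context needs its own encoder.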
ASP.NET has a feature called Request Validation that will prevent unencoded HTML from being processed by the server. For extra protection, one can use the AntiXSS library.
You can prevent script injection by HTML-encoding content, for example:
Server.HtmlEncode(input)
There is the OWASP ESAPI too.
