What kind of encoder encodes a string like this?

I have a question about encoding/decoding strings.
There is a web page where I send some data with a simple PHP POST form.
When I open Chrome Developer Tools -> Network, all parameters under "Form Data" are displayed normally except one, "uid", which is encoded somehow ( %25%DC%BE%60%A0W%94M ).
When I clicked on "view URL encoded", it showed me this: "%2525%25DC%25BE%2560%25A0W%2594M". I tried online tools such as http://meyerweb.com/eric/tools/dencoder/ to get a human-readable string from this encoded parameter, but had no luck.
Can anyone explain how I can get the original value of this parameter, not encoded, in a human-readable format?
Thanks a lot : )

This decoder works better:
http://www.opinionatedgeek.com/dotnet/tools/urlencode/Decode.aspx/
The %25 that you see is the actual percent character % being encoded
http://en.wikipedia.org/wiki/Percent-encoding
Percent-encoding, also known as URL encoding, is a mechanism for
encoding information in a Uniform Resource Identifier (URI) under
certain circumstances.
...
it is also used in the preparation
of data of the "application/x-www-form-urlencoded" media type, as is
often used in the submission of HTML form data in HTTP requests.

If you're having problems with online decoders, why not give it a go by hand, seeing as it's a relatively short string?
http://www.degraeve.com/reference/urlencoding.php
This table maps characters to their URL-encoded equivalents; just do a Ctrl+F for each %-encoded character and decode it yourself.
A few of the characters look weird because they aren't English characters. %DC is Ü in ISO-8859-1 (Latin-1), for example. It's possible the decoders you are trying don't recognise non-English characters.
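If the online decoders keep choking, the same decoding can be done locally. Here is a minimal Python sketch using only the standard library (the variable names are mine, not from the question). It first undoes the extra layer shown by "view URL encoded", then decodes the remaining escapes to raw bytes rather than text. The resulting bytes are not valid UTF-8, so the value may simply be binary data (an ID or a hash, perhaps) with no human-readable form at all; that part is a guess.

```python
from urllib.parse import unquote, unquote_to_bytes

# Value shown by "view URL encoded": every % was itself encoded as %25.
double_encoded = "%2525%25DC%25BE%2560%25A0W%2594M"

# One pass of decoding gives back the value shown under "Form Data".
single_encoded = unquote(double_encoded)

# A second pass, to bytes, gives the raw value the form actually sent.
raw = unquote_to_bytes(single_encoded)

# Check whether those bytes are valid UTF-8 text.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

print(single_encoded)           # the singly encoded form
print(raw.hex())                # the raw bytes as hex
print(repr(raw.decode("latin-1")))  # Latin-1 never fails, but may be meaningless
print(is_utf8)
```

Decoding as Latin-1 always "works" because every byte maps to some character, which is why a lookup table can produce characters like Ü even when the data was never text.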

Related

UTF-8 string not displaying character properly

I am returning a text field from a DB column:
"algún nombre de juego"
When I try to use this text in an HTML file, I get the following string:
algún nombre de juego
This happens consistently with any string that is not pure English. I have even tried grabbing simple text in different languages from online sources, and as soon as I plug it into an HTML text box, it prints out with improper encoding like the example above.
From reading the Unicode documentation for Python, UTF-8 should be able to handle pretty much any character in any language.
I have tried many ways of encoding/decoding, and either there is an encoding error, or I get back a weird character where the letter u with acute should be.
Per comments, I am not using Django or Flask. Just taking strings from a DB that might be in several different languages and generating an HTML file for internal use.
A few ideas:
1. Remember to declare the encoding. Either:
Insert <meta charset="utf-8"> just after your <head> tag.
OR
declare it in the HTTP header (making it look something like Content-Type: text/html; charset=UTF-8). This can be done by changing the web server's settings or by using server-side code, should you choose to host the page.
2. Ensure that your IDE saves files as UTF-8. This setting will be somewhere in your preferences and should already be the default, but it is worth checking if option 1 doesn't work.
Hope one of these works.
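Since the question mentions generating the HTML file from Python, here is a rough sketch of both ideas together: the page declares its charset, and the file is written explicitly as UTF-8. The function names build_page and write_page are made up for illustration. The last line shows one common way this kind of garbling arises (UTF-8 bytes read back as Latin-1); whether that is what happened in the question is an assumption.

```python
import html


def build_page(rows):
    """Return a small HTML document that declares UTF-8 in its <head>."""
    items = "\n".join(f"    <li>{html.escape(row)}</li>" for row in rows)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '  <meta charset="utf-8">\n'
        "</head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


def write_page(path, rows):
    """Write the page as UTF-8 regardless of the platform default encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_page(rows))


page = build_page(["algún nombre de juego"])

# What the text looks like if its UTF-8 bytes are misread as Latin-1.
misread = "algún".encode("utf-8").decode("latin-1")
print(misread)
```

Passing encoding="utf-8" to open() matters because the default depends on the platform and locale, so the declared charset and the bytes on disk can silently disagree.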

How do you extract data associated with an image that is an attachment to a Notes RichTextItem?

I've seen things that skirt around this question, but nothing that answers it directly.
I have a RichTextItem in a document that contains an attached image. If I look at the document properties for the field, it says:
Data Type: MIME Part
Data Length: 7615 bytes
...
Content-Transfer-Encoding: binary
Content-Type: image/jpeg
then a bit of binary data. How can I extract that data in server-side JavaScript so that I can use it in the value of an image control? In other words, I want the data corresponding to that image to appear in the following so that it renders in the web browser:
<xp:image><xp:this.value><![CDATA[#{javascript:"data:image/jpeg;base64,<DATA HERE>
Can this be done? I've tried all sorts of things but to no avail.
Thanks,
Reid
There are several approaches you can play with.
The "cheat" way: use ....nsf/0/unid/RTItemName?OpenField as the source of a Dojo panel. It would open the whole RichText.
Eventually you need OpenElement instead, which can directly address an attachment.
Last but not least, since your field isn't actually RichText but MIME, you can use the Notes MIME classes to get to the content and render it as base64. The MIME classes let you get the data as a stream and also provide methods to encode it, so you don't need an extra encoder class.
Hope that helps
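Whichever way the bytes are obtained, the target format is just a data URI. As a language-neutral illustration (in Python rather than server-side JavaScript, and not using any Notes API), the sketch below builds one from raw image bytes. The helper name to_data_uri and the four sample bytes are assumptions for the example.

```python
import base64


def to_data_uri(raw: bytes, mime_type: str = "image/jpeg") -> str:
    """Return a data: URI embedding the given bytes as base64."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first four bytes of a typical JPEG file, standing in for the attachment.
sample = bytes([0xFF, 0xD8, 0xFF, 0xE0])
uri = to_data_uri(sample)
print(uri)
```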

Securing application against XSS

We are currently using the OWASP AntiSamy project to protect our application against XSS attacks. Every input field is sanitized when any given form is submitted to the server. It works fine, but we have issues with fields like company name, organization name, etc.
For example, the ampersand in AT&T is escaped and the company name is displayed wrong (with escaped characters).
We manually update the fields in the database to fix this issue. However, this is a pain in the neck, as you can imagine.
Is there a way to address this using OWASP AntiSamy, or should we use a different library?
You should only encode on output, not on input. If a user enters AT&T in your application, this should be stored as AT&T in the database. There is no need to encode it, but of course make sure that you are using parameterised queries, which will prevent characters such as ' from breaking out of the SQL command context and causing SQL injection.
Output is the only time you need to encode characters. E.g. AT&T should be encoded as AT&amp;T when output to HTML, so it is displayed in the browser as AT&T.
It seems like your application is encoding the input and also encoding the output, so strings like the above are double encoded and then output as AT&amp;amp;T in your HTML, causing the problem. Remove your input encoding to solve this.
The reason you should only encode on output is that if you decide you want to output data to a different format such as JSON or JavaScript, then the encoding is different. O'leary would become O\x27leary if encoded properly for JavaScript, which would not display properly in HTML, where the encoding is O&#39;leary.
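To make the double-encoding problem concrete, here is a small Python sketch using the standard library's html.escape as a stand-in for whatever encoder the application uses (AntiSamy itself is a Java library and is not shown here). The variable names are illustrative.

```python
import html

user_input = "AT&T"

# Encoding on input: the database now holds markup, not the company name.
stored_encoded = html.escape(user_input)

# Encoding again on output: the browser renders the literal text "AT&amp;T".
double_encoded = html.escape(stored_encoded)

# Storing the raw value and encoding once, at output time, avoids this.
stored_raw = user_input
encoded_once = html.escape(stored_raw)

print(stored_encoded)
print(double_encoded)
print(encoded_once)
```

The round trip also shows why the database fixes are needed today: html.unescape(double_encoded) gives back the singly encoded string, not the original name.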

Do you HtmlEncode during input or output?

When do you call Microsoft.Security.Application.AntiXss.HtmlEncode? Do you do it when the user submits the information, or do you do it when you're displaying the information?
How about for basic stuff like First Name, Last Name, City, State, Zip?
You do it when you are displaying the information. Preserve the original as it was entered, convert it for display on a web page. Let's say you were displaying it in some other way, like exporting it into Excel. In that case, you'd want to export the preserved original.
Encode every single string.
You should only encode or escape your data at the last possible moment, whether that's directly before you put it in the database, or display it on the screen. If you encode too soon, you run the risk of accidentally double encoding (you'll often see &amp; on newbies' websites - myself included).
If you do want to encode sooner than that, then take measures to avoid the double encoding. Joel wrote an article about good uses for Hungarian notation, where he advocated the use of prefixes to indicate what is stored in the variable, e.g. "us" for unsafe string and "ss" for safe string.
usFirstName = getUserInput('firstName');
ssFirstName = cleanString(usFirstName);
Also note that it doesn't matter what the type of information is (city, zip code, etc) - leaving any of these unchecked is asking for trouble.
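The prefix convention can also be backed by types, so that a checker complains when an unsafe string reaches the page. A rough Python sketch, assuming hypothetical helpers get_user_input and clean_string that mirror the pseudocode above (html.escape stands in for the real encoder, and NewType only helps if you run a static checker such as mypy):

```python
import html
from typing import NewType

UnsafeStr = NewType("UnsafeStr", str)  # raw user input, the "us" prefix
SafeHtml = NewType("SafeHtml", str)    # HTML-encoded text, the "ss" prefix


def get_user_input(form: dict, field: str) -> UnsafeStr:
    """Read a raw value from a submitted form (a plain dict in this sketch)."""
    return UnsafeStr(form.get(field, ""))


def clean_string(value: UnsafeStr) -> SafeHtml:
    """Encode a raw value for an HTML text context."""
    return SafeHtml(html.escape(value))


def render_greeting(name: SafeHtml) -> str:
    """Accept only already-encoded text, so double encoding is a type error."""
    return f"<p>Hello, {name}</p>"


us_first_name = get_user_input({"firstName": "<b>Bob</b>"}, "firstName")
ss_first_name = clean_string(us_first_name)
greeting = render_greeting(ss_first_name)
print(greeting)
```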
It depends on your situation. Where I work, for years the company did no HTML encoding, so when we started doing it, it would have been almost impossible to find every location within the system that user input could be displayed on the page.
Instead we chose to sanitize input on its way into the system, since there were fewer input points than output points. We sanitize immediately before inserting data into the DB. We don't use Microsoft's AntiXss library; we use a set of homebrew methods that whitelist ranges of HTML tags and characters depending on the type of input.
If you're designing the system from scratch, or you have a system that is small (or managed well) enough to encode output, follow Corey's suggestion. It's definitely the better way to do it.
Encoding is not a property of the data, it is a property of the transport mechanism. Therefore you should unencode data when you receive it, and encode it appropriately before transmission. The transport mechanism determines what sort of encoding is necessary.
This principle holds true whether your transport mechanism is HTML, HTTP, smoke signals, etc. The trick is knowing how to do the types of encoding manually, and when various frameworks do the steps for you automagically. For instance, ASP.NET will encode data assigned to a System.Web.UI.WebControls.Button's Text, but not text assigned to a System.Web.UI.WebControls.Literal's Text. jQuery will encode content you set with .text(), but not content you set with .html().
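As an illustration of "the transport determines the encoding", the sketch below takes one raw value and encodes it three different ways with the Python standard library. The sample value is made up, and these three functions are only examples of context-specific encoders, not a complete set.

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry" <3'

as_html = html.escape(value)    # for an HTML text or attribute context
as_url = quote(value, safe="")  # for a URL path segment or query value
as_json = json.dumps(value)     # for a JSON document

print(as_html)
print(as_url)
print(as_json)
```

None of the three outputs is correct in either of the other two contexts, which is the practical reason to keep the stored value raw and encode at the last moment.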

Will HTML Encoding prevent all kinds of XSS attacks?

I am not concerned about other kinds of attacks. Just want to know whether HTML Encode can prevent all kinds of XSS attacks.
Is there some way to do an XSS attack even if HTML Encode is used?
No.
Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.
For instance, consider server-generated client-side JavaScript: if the server dynamically outputs HTML-encoded values directly into the client-side JavaScript, HtmlEncode will not stop injected script from executing.
Next, consider the following pseudocode:
<input value=<%= HtmlEncode(somevar) %> id=textbox>
Now, in case it's not immediately obvious, if somevar (sent by the user, of course) is set for example to
a onclick=alert(document.cookie)
the resulting output is
<input value=a onclick=alert(document.cookie) id=textbox>
which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.
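The same effect can be reproduced outside ASP.NET. In this Python sketch, html.escape plays the role of HtmlEncode (an assumption; the two are not identical implementations). The payload contains none of the characters an HTML encoder touches, so it passes through unchanged, and only quoting the attribute keeps it inside the value.

```python
import html

payload = "a onclick=alert(document.cookie)"
encoded = html.escape(payload)

# Unquoted attribute: the space ends the value and onclick becomes its own attribute.
unquoted = f"<input value={encoded} id=textbox>"

# Quoted attribute: the whole payload stays inside the value.
quoted = f'<input value="{encoded}" id="textbox">'

print(encoded == payload)
print(unquoted)
print(quoted)
```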
There are a few additional vectors to be considered... including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on # values).
Also don't forget about UTF-7 type attacks - where the attack looks like
+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-
Nothing much to encode there...
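To see why, here is a short Python sketch: HTML encoding (html.escape, standing in for HtmlEncode) leaves the payload untouched, yet decoding the same bytes as UTF-7 yields a script tag. A browser would only do that decoding if it were allowed to guess or were told the page is UTF-7.

```python
import html

payload = "+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-"

# Nothing in the payload is an HTML metacharacter, so encoding changes nothing.
encoded = html.escape(payload)

# Interpreted as UTF-7, the same bytes spell out a script element.
as_utf7 = encoded.encode("ascii").decode("utf-7")

print(encoded == payload)
print(as_utf7)
```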
The solution, of course (in addition to proper and restrictive white-list input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF your output context IS HTML, or maybe you need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.
If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.
Note that all encoding should not be restricted to user input, but also stored values from the database, text files, etc.
Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...
For more information, and a pretty definitive list (constantly updated), check out RSnake's Cheat Sheet: http://ha.ckers.org/xss.html
Even if you systematically encode all user input before displaying it, you are still not 100% safe.
(See @Avid's post for more details.)
In addition, problems arise when you need to let some tags go unencoded so that users can post images or bold text, or use any feature that requires the user's input to be processed as (or converted to) unencoded markup.
You will have to set up a decision-making system to decide which tags are allowed and which are not, and it is always possible that someone will figure out a way to get a disallowed tag through.
It helps if you follow Joel's advice of Making Wrong Code Look Wrong or if your language helps you by warning/not compiling when you are outputting unprocessed user data (static-typing).
If you encode everything, it will (depending on your platform and the implementation of HtmlEncode). But any useful web application is so complex that it's easy to forget to check every part of it. Or maybe a third-party component isn't safe. Or maybe some code path that you thought did the encoding didn't do it, so you forgot it somewhere else.
So you might want to check things on the input side too. And you might want to check stuff you read from the database.
As mentioned by everyone else, you're safe as long as you encode all user input before displaying it. This includes all request parameters and data retrieved from the database that can be changed by user input.
As mentioned by Pat, you'll sometimes want to display some tags, just not all tags. One common way to do this is to use a markup language like Textile, Markdown, or BBCode. However, be aware that even markup languages can be vulnerable to XSS.
# Markup example
[foo](javascript:alert\('bar'\);)
If you do decide to let "safe" tags through I would recommend finding some existing library to parse & sanitize your code before output. There are a lot of XSS vectors out there that you would have to detect before your sanitizer is fairly safe.
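One of the checks such a library performs is on link targets, which is what the markup example above abuses. A minimal Python sketch of a scheme allow-list follows; the name is_safe_link and the choice of allowed schemes are assumptions, and a real sanitizer handles many more cases than this.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_link(url: str) -> bool:
    """Accept a link only if its scheme is explicitly allowed."""
    try:
        scheme = urlsplit(url.strip()).scheme.lower()
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    return scheme in ALLOWED_SCHEMES


print(is_safe_link("javascript:alert('bar');"))
print(is_safe_link("https://example.com/page"))
```

Note that this deliberately rejects relative links too, since they have no scheme; an allow-list that fails closed is easier to reason about than a block-list of known-bad schemes.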
I second metavida's advice to find a third-party library to handle output filtering. Neutralizing HTML characters is a good approach to stopping XSS attacks. However, the code you use to transform metacharacters can be vulnerable to evasion attacks; for instance, if it doesn't properly handle Unicode and internationalization.
A classic simple mistake homebrew output filters make is to catch only < and >, but miss things like ", which can break user-controlled output out into the attribute space of an HTML tag, where Javascript can be attached to the DOM.
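A quick Python sketch of that mistake: naive_escape is a deliberately incomplete homebrew filter written for this example, compared against the standard library's html.escape, which also encodes quotes.

```python
import html


def naive_escape(text: str) -> str:
    """Homebrew filter that only handles angle brackets (deliberately flawed)."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


payload = '" onmouseover="alert(1)'

# The double quote survives, closes the attribute, and adds an event handler.
broken = f'<input value="{naive_escape(payload)}">'

# With quotes encoded, the payload stays inside the attribute value.
fixed = f'<input value="{html.escape(payload, quote=True)}">'

print(broken)
print(fixed)
```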
No, just encoding common HTML tokens DOES NOT completely protect your site from XSS attacks. See, for example, this XSS vulnerability found in google.com:
http://www.securiteam.com/securitynews/6Z00L0AEUE.html
The important thing about this type of vulnerability is that the attacker is able to encode his XSS payload using UTF-7, and if you haven't specified a different character encoding on your page, a user's browser could interpret the UTF-7 payload and execute the attack script.
One other thing you need to check is where your input comes from. You can use the referrer string (most of the time) to check that it's from your own page, but putting in a hidden random number or something in your form and then checking it (with a session set variable maybe) also helps knowing that the input is coming from your own site and not some phishing site.
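The hidden random value described here can be sketched in a few lines of Python. The session is modelled as a plain dict and the function names are invented for the example; a real framework would supply its own session object and usually its own token helper.

```python
import hmac
import secrets


def issue_token(session: dict) -> str:
    """Create a random token, remember it in the session, and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["form_token"] = token
    return token


def is_valid_submission(session: dict, submitted: str) -> bool:
    """Check the hidden field against the session copy in constant time."""
    expected = session.get("form_token")
    if not expected or not submitted:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


session = {}
token = issue_token(session)
print(is_valid_submission(session, token))
print(is_valid_submission(session, "guessed-value"))
```

The token goes into a hidden input when the form is rendered and is compared on submission; a page on another site cannot read it, so it cannot forge a matching request.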
I'd like to suggest HTML Purifier (http://htmlpurifier.org/) It doesn't just filter the html, it basically tokenizes and re-compiles it. It is truly industrial-strength.
It has the additional benefit of allowing you to ensure valid html/xhtml output.
Also n'thing Textile; it's a great tool and I use it all the time, but I'd run it through HTML Purifier too.
I don't think you understood what I meant re tokens. HTML Purifier doesn't just 'filter', it actually reconstructs the html. http://htmlpurifier.org/comparison.html
I don't believe so. HtmlEncode converts all functional characters (characters which could be interpreted by the browser as code) into entity references, which cannot be parsed by the browser and thus cannot be executed.
&lt;script/&gt;
There is no way that the above can be executed by the browser.
Unless there is a bug in the browser, of course.
myString.replace(/<[^>]*>?/gm, '');
I have used it successfully. See "Strip HTML from Text JavaScript".

Resources