Using Microsoft's AntiXssLibrary, how do you handle input that needs to be edited later?
For example:
User enters:
<i>title</i>
Saved to the database as:
<i>title</i>
On an edit page, in a text box it displays something like:
<i>title</i> because I've encoded it before displaying in the text box.
User doesn't like that.
Is it ok not to encode when writing to an input control?
Update:
I'm still trying to figure this out. The answers below seem to say to decode the string before displaying, but wouldn't that allow for XSS attacks?
The one user who said that decoding the string in an input field value is ok was downvoted.
Looks like you're encoding it more than once. In ASP.NET, using Microsoft's AntiXss Library you can use the HtmlAttributeEncode method to encode untrusted input:
<input type="text" value="<%= AntiXss.HtmlAttributeEncode("<i>title</i>") %>" />
This results in
<input type="text" value="<i>title</i>" /> in the rendered page's markup and is correctly displayed as <i>title</i> in the input box.
Your problem appears to be double-encoding; the HTML needs to be escaped once (so it can be inserted into the HTML on the page without issue), but twice leads to the encoded version appearing literally.
You can call HTTPUtility.HTMLDecode(MyString) to get the text back to the unencoded form.
If you are allowing users to enter HTML that will then be rendered on the site, you need to do more than just Encode and Decode it.
Using AntiXss prevents attacks by converting script and markup to text. It does not do anything to "clean" markup that will be rendered directly. You're going to have to manually remove script tags, etc. from the user's input to be fully protected in that scenario.
You'll need to strip out script tags as well as JavaScript attributes on legal elements. For example, an attacker could inject malicious code into the onclick or onmouseover attributes.
Yes, the code inside input boxes is safe from scripting attacks and does not need to be encoded.
Related
Is there any HTML sanitizer or cleanup methods available in any JSF utilities kit or libraries like PrimeFaces/OmniFaces?
I need to sanitize HTML input by user via p:editor and display safe HTML output using escape="true", following the stackexchange style. Before displaying the HTML I'm thinking to store sanitized input data to the database, so that it is ready to safe use with escape="true" and XSS is not a danger.
In order to achieve that, you basically need a standalone HTML parser. HTML parsing is rather complex and the task and responsibility of that is beyond the scope of JSF, PrimeFaces and OmniFaces. You're supposed to just grab one of the many existing HTML parsing libraries.
An example is Jsoup, it has even a separate method for the particular purpose of sanitizing HTML against a Safelist: Jsoup#clean(). For example, if you want to allow some basic HTML without images, use Safelist.basic():
String sanitizedHtml = Jsoup.clean(rawHtml, Safelist.basic());
A completely different alternative is to use a specific text formatting syntax, such as Markdown (which is also used here). Basically all of those parsers also sanitize HTML under the covers. An example is CommonMark. Perhaps this is what you actually meant when you said "stackexchange style".
As to saving in DB, you'd better save both the raw and parsed forms in 2 separate text columns. The raw form should be redisplayed during editing. The parsed form should be updated in background when the raw form has been edited. During display, obviously only show the parsed form with escape="false".
See also:
Markdown or HTML
I need a output text which works like h:outputText with escape="false" attribute, but doesn't let scripts to run. After a little search I found tr:outputFormatted makes that, but in our project we doesn't use trinidad. Is there something like outputFormatted in tomahawk, or in another taglib?
for example,
<h:outputText id="id" value="<b>test text</b><script type="text/javascipt">alert('I dont want these alert to show');</script>" escape="false"/>
that shows 'test text' bold but it popups the alert dialog too, I don't want the script to run. it can write script code or delete it but shouldn't run.
Use a HTML parser to get rid of those malicious things.
Among others, Jsoup is capable of this. Here's an extract of relevance from its site.
Sanitize untrusted HTML
Problem
You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.
Solution
Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.
String unsafe =
"<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p>Link</p>
So, all you basically need to do is the the following during preparing the text:
String sanitizedText = Jsoup.clean(rawText, Whitelist.basic());
(you can do it before or after saving the text in DB, but keep in mind that when doing it before without saving the original text, you can't detect malicious users and do social actions anymore)
and then display it as follows:
<h:outputText value="#{bean.sanitizedText}" escape="false" />
I have questions on preventing XSS attacks.
1) Question:
I have an HTML template as Javascript string (trusted) and insert content coming from a server request (untrusted). I replace placeholders within that HTML template strings with that untrusted content and output it to the DOM using innerHTML/Text.
In particular I insert texts that I output in <div> and <p> tags that are already present in the template HTML string and form element values, i.e. texts in input tag's value attribute, select option and textarea tags.
Do I understand correctly that I can treat every inserted text mentioned above as HTML subcontext thus I only encode like so: encodeForJavascript( encodeForHTML( inserted_text ) ). Or do I have to encode the texts that I insert into value attributes of the input fields for the HTML Attribute subcontext?
After reading up on this issue on OWASP I am inclined to think that latter is only necessary in case I set the attribute with unstrusted content via Javascript like so: document.forms[ 0 ].elements[ 0 ].value = encodeForHTMLAttribute, is that correct?
2) Question:
What is the added value of server side encoding server responses that enter the client side via Ajax and get handled anyway (like in question 1). In addition, don't we risk problems when double encoding the content?
Thanks
You need to encode for the context in question, so to data inserted into html context needs to be encoded for html, and data inserted into html attributes, should be html attribute encoded. This is addition to the javascript encoding you mentioned.
I would javascript encode for transfer and then encode for the correct context client side, where I know which context is the right one.
How to display value with HTML tag inside h:inputTextarea?
In DB I have column contain data, it contain plain text and HTML tag, I want display it on h:inputTextarea. How can I do it?
i want display HTML inside h:inputTextArea it mean in DB contain <br/> or <b> and </b> , when it display on h:inputTextArea it must display bold or break line.
That's not possible due to the nature of HTML <textarea> element. Even, if it was possible this puts doors wide open to XSS injection attacks. Also, how would you ever let the enduser edit the markup in the textarea like changing bold to for example italics or to add another markup? That's plain impossible with a <textarea>.
If your sole intent is to have a rich text editor, then you need to homegrow one with help of a <div> and a good shot of JavaScript or, better, use an existing JSF component which achieves this. For example, PrimeFaces' <p:editor>.
Or, if your sole intent is to display it only, then use <h:outputText> with the escape attribute set to false. Once again, keep XSS risks in mind.
Use the f:verbatim tag -> JSF Verbatim Tag
At what stage does my input in the textarea change from being this raw text, and become HTML?
For example, say I indent 4 spaces
like this
Then the WMD Showdown.js will render it properly below this textarea I type in. But the text area still literally contains
like this
So is PHP server side responsible for translating all the same things the showdown.js does to permanently be HTML in the SoF Database?
There are some other posts here about this, but basically it works like this. Or at least this is how I do it on my website using WMD; see my profile if you're interested in checking out my WMD implementation.
User enters the Markdown on the client, and showdown.js runs in real time in the browser (pure client-side JavaScript; no AJAX or anything like that) to give the user the preview.
Then when the user posts to the server, WMD sends the Markdown (you have to configure WMD to do this though; by default WMD sends HTML).
Run showdown.js server-side to convert the Markdown to HTML. In theory you could use some other method but it makes sense to try to get the same transformation on the server that the user sees on the client, other than any HTML tag filtering you want to do server-side.
As just noted, you'll need to do appropriate HTML tag filtering to avoid cross-site scripting (XSS) issues. This is both important and nontrivial, so be careful.
Save both the Markdown and the HTML in the database—the Markdown because if users want to edit their posts, you want to give them the Markdown, and the HTML so you don't have to transform Markdown to HTML every time you display answers.
Here are some related posts.
Convert HTML back to Markdown for editing in wmd: Tells how to configure WMD to send Markdown to the server instead of HTML.
What HTML tags are allowed on Stack Overflow?: Useful for thinking about HTML tag filtering.
Well first of all StackOverflow is built on ASP.NET, but yes essentially the characters in the rich text box gets translated back and forth.