In my application I have some TinyMCE editors, and the user input is shown with
<h:outputText escape="false"/>
But how can I prevent malicious input, like JavaScript or iframes? Is there any library which can filter the input strings?
UPDATE:
I found "htmlpurifier", but it is for PHP. Is there anything like this for Java?
You'd need to use an HTML parser which supports cleaning/whitelisting of tags and attributes. Among them there's Jsoup, which has a clean() method for exactly this purpose. Here's an extract of relevance from its site.
Sanitize untrusted HTML
Problem
You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.
Solution
Use the jsoup HTML Cleaner with a configuration specified by a Safelist.
String unsafe =
"<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
// Safelist was named Whitelist before jsoup 1.14.1
String safe = Jsoup.clean(unsafe, Safelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
Is there any HTML sanitizer or cleanup methods available in any JSF utilities kit or libraries like PrimeFaces/OmniFaces?
I need to sanitize HTML entered by the user via p:editor and display safe HTML output using escape="true", following the Stack Exchange style. Before displaying the HTML, I'm thinking of storing the sanitized input in the database, so that it is ready for safe use with escape="true" and XSS is not a danger.
In order to achieve that, you basically need a standalone HTML parser. HTML parsing is rather complex and the task and responsibility of that is beyond the scope of JSF, PrimeFaces and OmniFaces. You're supposed to just grab one of the many existing HTML parsing libraries.
An example is Jsoup, it has even a separate method for the particular purpose of sanitizing HTML against a Safelist: Jsoup#clean(). For example, if you want to allow some basic HTML without images, use Safelist.basic():
String sanitizedHtml = Jsoup.clean(rawHtml, Safelist.basic());
A completely different alternative is to use a specific text formatting syntax, such as Markdown (which is also used here). Basically all of those parsers also sanitize HTML under the covers. An example is CommonMark. Perhaps this is what you actually meant when you said "stackexchange style".
As to saving in DB, you'd better save both the raw and parsed forms in 2 separate text columns. The raw form should be redisplayed during editing. The parsed form should be updated in background when the raw form has been edited. During display, obviously only show the parsed form with escape="false".
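A minimal sketch of the "two columns" idea, using only the JDK. The class name StoredComment and the injected sanitizer function are assumptions made for illustration; in a real application the sanitizer would typically wrap something like Jsoup.clean(html, Safelist.basic()), and the two fields would map to the two text columns.

```java
import java.util.function.UnaryOperator;

/** Holds the user's original input next to the sanitized form that is safe to render. */
final class StoredComment {
    // Hypothetical hook: any function that turns raw HTML into safe HTML.
    private final UnaryOperator<String> sanitizer;
    private String rawHtml;        // redisplayed in the editor
    private String sanitizedHtml;  // the only form rendered with escape="false"

    StoredComment(String rawHtml, UnaryOperator<String> sanitizer) {
        this.sanitizer = sanitizer;
        setRawHtml(rawHtml);
    }

    /** Changing the raw form always refreshes the sanitized form as well. */
    void setRawHtml(String rawHtml) {
        this.rawHtml = rawHtml;
        this.sanitizedHtml = sanitizer.apply(rawHtml);
    }

    String getRawHtml() { return rawHtml; }

    String getSanitizedHtml() { return sanitizedHtml; }
}
```

Because the sanitized form is only ever derived from the raw form, the two cannot drift apart, and the view never needs to touch the raw column.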
See also:
Markdown or HTML
From the tutorial
But there's a problem! Our rendered comments look like this in the
browser: "<p>This is <em>another</em> comment</p>". We want those tags
to actually render as HTML.
That's React protecting you from an XSS attack. There's a way to get
around it but the framework warns you not to use it:
...
<span dangerouslySetInnerHTML={{__html: rawMarkup}} />
This is a special API that intentionally makes it difficult to insert raw HTML, but for Showdown we'll take advantage of this backdoor.
Remember: by using this feature you're relying on Showdown to be secure.
So there exists an API for inserting raw HTML, but the method name and the docs all warn against it. Is it safe to use this? For example, I have a chat app that takes Markdown comments and converts them to HTML strings. The HTML snippets are generated on the server by a Markdown converter. I trust the converter, but I'm not sure if there's any way for a user to carefully craft Markdown to exploit XSS. Is there anything else I should be doing to make sure this is safe?
Most Markdown processors (and I believe Showdown as well) allow the writer to use inline HTML. For example a user might enter:
This is _markdown_ with a <marquee>ghost from the past</marquee>. Or even **worse**:
<script>
alert("spam");
</script>
As such, you should have a whitelist of tags and strip all the other tags after converting from markdown to html. Only then use the aptly named dangerouslySetInnerHTML.
Note that this is also what Stack Overflow does. The above Markdown renders as follows (without you getting an alert thrown in your face):
This is markdown with a ghost from the past. Or
even worse:
alert("spam");
There are three reasons it's best to avoid html:
security risks (xss, etc)
performance
event listeners
The security risks are largely mitigated by Markdown, but you still have to decide what you consider valid and ensure that everything else is disallowed (maybe you don't allow images, for example).
The performance issue is only relevant when something will change in the markup. For example if you generated html with this: "Time: <b>" + new Date() + "</b>". React would normally decide to only update the textContent of the <b/> element, but instead replaces everything, and the browser must reparse the html. In larger chunks of html, this is more of a problem.
If you did want to know when someone clicks a link in the results, you've lost the ability to do so simply. You'd need to add an onClick listener to the closest react node, and figure out which element was clicked, delegating actions from there.
If you would like to use Markdown in React, I recommend a pure react renderer, e.g. vjeux/markdown-react.
I need an output text which works like h:outputText with the escape="false" attribute, but doesn't let scripts run. After a little searching I found that tr:outputFormatted does that, but we don't use Trinidad in our project. Is there something like outputFormatted in Tomahawk, or in another taglib?
For example,
<h:outputText id="id" value="<b>test text</b><script type="text/javascript">alert('I dont want these alert to show');</script>" escape="false"/>
This shows 'test text' in bold, but it also pops up the alert dialog. I don't want the script to run. It may print the script code as text or remove it, but it shouldn't run it.
Use an HTML parser to get rid of those malicious things.
Among others, Jsoup is capable of this. Here's an extract of relevance from its site.
Sanitize untrusted HTML
Problem
You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.
Solution
Use the jsoup HTML Cleaner with a configuration specified by a Safelist.
String unsafe =
"<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
// Safelist was named Whitelist before jsoup 1.14.1
String safe = Jsoup.clean(unsafe, Safelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
So, all you basically need to do is the following while preparing the text:
String sanitizedText = Jsoup.clean(rawText, Safelist.basic());
(you can do it before or after saving the text in the DB, but keep in mind that if you do it before and don't save the original text, you can no longer detect malicious users and act on them)
and then display it as follows:
<h:outputText value="#{bean.sanitizedText}" escape="false" />
Using Microsoft's AntiXssLibrary, how do you handle input that needs to be edited later?
For example:
User enters:
<i>title</i>
Saved to the database as:
&lt;i&gt;title&lt;/i&gt;
On an edit page, in a text box it displays something like:
&lt;i&gt;title&lt;/i&gt; because I've encoded it before displaying in the text box.
User doesn't like that.
Is it ok not to encode when writing to an input control?
Update:
I'm still trying to figure this out. The answers below seem to say to decode the string before displaying, but wouldn't that allow for XSS attacks?
The one user who said that decoding the string in an input field value is ok was downvoted.
Looks like you're encoding it more than once. In ASP.NET, using Microsoft's AntiXss Library you can use the HtmlAttributeEncode method to encode untrusted input:
<input type="text" value="<%= AntiXss.HtmlAttributeEncode("<i>title</i>") %>" />
This results in something like
<input type="text" value="&lt;i&gt;title&lt;/i&gt;" /> in the rendered page's markup and is correctly displayed as <i>title</i> in the input box.
Your problem appears to be double-encoding; the HTML needs to be escaped once (so it can be inserted into the HTML on the page without issue), but twice leads to the encoded version appearing literally.
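To make the double-encoding effect concrete, here is a small JDK-only Java sketch. The escapeHtml helper is a hypothetical stand-in for a real encoder such as the AntiXss method mentioned above, which may emit different but equivalent entities; it is not that library's implementation.

```java
final class HtmlEscapeDemo {
    /** Escapes the characters that are significant in HTML text and attribute values. */
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling escapeHtml("<i>title</i>") once gives &lt;i&gt;title&lt;/i&gt;, which the browser shows as <i>title</i> inside the input box. Calling it on that result again gives &amp;lt;i&amp;gt;title&amp;lt;/i&amp;gt;, which the browser shows as the literal text &lt;i&gt;title&lt;/i&gt;. That is the symptom described in the question.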
You can call HttpUtility.HtmlDecode(myString) to get the text back to the unencoded form.
If you are allowing users to enter HTML that will then be rendered on the site, you need to do more than just Encode and Decode it.
Using AntiXss prevents attacks by converting script and markup to text. It does not do anything to "clean" markup that will be rendered directly. You're going to have to manually remove script tags, etc. from the user's input to be fully protected in that scenario.
You'll need to strip out script tags as well as JavaScript attributes on legal elements. For example, an attacker could inject malicious code into the onclick or onmouseover attributes.
Yes, the code inside input boxes is safe from scripting attacks and does not need to be encoded.