Why does React.js' API warn against inserting raw HTML? - security

From the tutorial
But there's a problem! Our rendered comments look like this in the
browser: "<p>This is <em>another</em> comment</p>". We want those tags
to actually render as HTML.
That's React protecting you from an XSS attack. There's a way to get
around it but the framework warns you not to use it:
...
<span dangerouslySetInnerHTML={{__html: rawMarkup}} />
This is a special API that intentionally makes it difficult to insert raw HTML, but for Showdown we'll take advantage of this backdoor.
Remember: by using this feature you're relying on Showdown to be secure.
So there exists an API for inserting raw HTML, but the method name and the docs all warn against it. Is it safe to use this? For example, I have a chat app that takes Markdown comments and converts them to HTML strings. The HTML snippets are generated on the server by a Markdown converter. I trust the converter, but I'm not sure if there's any way for a user to carefully craft Markdown to exploit XSS. Is there anything else I should be doing to make sure this is safe?

Most Markdown processors (and I believe Showdown as well) allow the writer to use inline HTML. For example a user might enter:
This is _markdown_ with a <marquee>ghost from the past</marquee>. Or even **worse**:
<script>
alert("spam");
</script>
As such, you should have a whitelist of tags and strip all the other tags after converting from markdown to html. Only then use the aptly named dangerouslySetInnerHTML.
Note that this also what Stackoverflow does. The above Markdown renders as follows (without you getting an alert thrown in your face):
This is markdown with a ghost from the past. Or
even worse:
alert("spam");

There are three reasons it's best to avoid html:
security risks (xss, etc)
performance
event listeners
The security risks are largely mitigated by markdown, but you still have to decide what you consider valid, and ensure it's disallowed (maybe you don't allow images, for example).
The performance issue is only relevant when something will change in the markup. For example if you generated html with this: "Time: <b>" + new Date() + "</b>". React would normally decide to only update the textContent of the <b/> element, but instead replaces everything, and the browser must reparse the html. In larger chunks of html, this is more of a problem.
If you did want to know when someone clicks a link in the results, you've lost the ability to do so simply. You'd need to add an onClick listener to the closest react node, and figure out which element was clicked, delegating actions from there.
If you would like to use Markdown in React, I recommend a pure react renderer, e.g. vjeux/markdown-react.

Related

Why do some webpages use a little HTML, but so much JavaScript

Why do some webpages use a little HTML, but so much JavaScript? Someone even used JavaScript to append the HTML like: <div> <span> and so on; the HTML source code was hidden in JavaScript.
Like this:
The modern web is growing greatly in terms of complexity and user expectations.
Users want more complex, rich content, delivered faster. JavaScript provides the capability to enable meeting this demand.
HTML by definition, (HyperText Markup Language) provides zero functionality to a website. It simply displays stuff. And CSS (Cascading Style Sheets), in particular CSS3, provides a minimal level functionality (like import -ing a font for example, or making a div spin or fade in and out)
For anything and everything else, you gotta use JavaScript. From controlling HTML, CSS, retrieving data from APIs (Application Progamming Interface) and handling user input, JavaScript does it all.

In Chrome extensions, why use a background page with HTML?

I understand that the background page of a Chrome extension is never displayed. It makes sense to me that a background page should contain only scripts. In what situations would HTML markup ever be needed?
At https://developer.chrome.com/extensions/background_pages there is an example with an HTML background page, but I haven't been able to get it to work (perhaps because I am not sure what it should be doing).
Are there any examples of simple Chrome extensions which demonstrate how HTML markup can be useful in a background page?
Historical reasons
The background page is, technically, a whole separate document - except it's not rendered in an actual tab.
For simplicity's sake, perhaps, extensions started with requiring a full HTML page for the background page through the background_page manifest property. That was the only form.
But, as evidenced by your question, most of the time it's not clear what the page can actually be used for except for holding scripts. That made the entire thing being just a piece of boilerplate.
That's why when Chrome introduced "manifest_version": 2 in 2012 as a big facelift to extensions, they added an alternative format, background.scripts array. This will offload the boilerplate to Chrome, which will then create a background page document for you, succinctly called _generated_background_page.html.
Today, this is a preferred method, though background.page is still available.
Practical reasons
With all the above said, you still sometimes want to have actual elements in your background page's document.
<script> for dynamically adding scripts to the background page (as long as they conform to extension CSP).
Among other things, since you can't include external scripts through background.scripts array, you need to create a <script> element for those you whitelist for the purpose.
<canvas> for preparing image data for use elsewhere, for example in Browser Action icons.
<audio> for producing sounds.
<textarea> for (old-school) working with clipboard (don't actually do this).
<iframe> for embedding an external page into the background page, which can sometimes help extracting dynamic data.
..possibly more.
It's debatable which boilerplate is "better": creating the elements in advance as a document, or using document.createElement and its friends as needed.
In any case, a background page is always a page, whether provided by you or autogenerated by Chrome. You can use all the DOM functions you want.
My two cents:
Take Google Mail Checker as an example, it declares a canvas in background.html
<canvas id="canvas" width="19" height="19">
Then it could manipulate the canvas in background.js and call chrome.browserAction.setIcon({imageData: canvasContext.getImageData(...)}) to change the browser action icon.
I know we could dynamically create canvas via background.js, however when doing something involving DOM element, using html directly seems easier.

Allow only specific tag in <h:outputText> [duplicate]

Is there any HTML sanitizer or cleanup methods available in any JSF utilities kit or libraries like PrimeFaces/OmniFaces?
I need to sanitize HTML input by user via p:editor and display safe HTML output using escape="true", following the stackexchange style. Before displaying the HTML I'm thinking to store sanitized input data to the database, so that it is ready to safe use with escape="true" and XSS is not a danger.
In order to achieve that, you basically need a standalone HTML parser. HTML parsing is rather complex and the task and responsibility of that is beyond the scope of JSF, PrimeFaces and OmniFaces. You're supposed to just grab one of the many existing HTML parsing libraries.
An example is Jsoup, it has even a separate method for the particular purpose of sanitizing HTML against a Safelist: Jsoup#clean(). For example, if you want to allow some basic HTML without images, use Safelist.basic():
String sanitizedHtml = Jsoup.clean(rawHtml, Safelist.basic());
A completely different alternative is to use a specific text formatting syntax, such as Markdown (which is also used here). Basically all of those parsers also sanitize HTML under the covers. An example is CommonMark. Perhaps this is what you actually meant when you said "stackexchange style".
As to saving in DB, you'd better save both the raw and parsed forms in 2 separate text columns. The raw form should be redisplayed during editing. The parsed form should be updated in background when the raw form has been edited. During display, obviously only show the parsed form with escape="false".
See also:
Markdown or HTML

Embed HTML in JSF [duplicate]

Is there any HTML sanitizer or cleanup methods available in any JSF utilities kit or libraries like PrimeFaces/OmniFaces?
I need to sanitize HTML input by user via p:editor and display safe HTML output using escape="true", following the stackexchange style. Before displaying the HTML I'm thinking to store sanitized input data to the database, so that it is ready to safe use with escape="true" and XSS is not a danger.
In order to achieve that, you basically need a standalone HTML parser. HTML parsing is rather complex and the task and responsibility of that is beyond the scope of JSF, PrimeFaces and OmniFaces. You're supposed to just grab one of the many existing HTML parsing libraries.
An example is Jsoup, it has even a separate method for the particular purpose of sanitizing HTML against a Safelist: Jsoup#clean(). For example, if you want to allow some basic HTML without images, use Safelist.basic():
String sanitizedHtml = Jsoup.clean(rawHtml, Safelist.basic());
A completely different alternative is to use a specific text formatting syntax, such as Markdown (which is also used here). Basically all of those parsers also sanitize HTML under the covers. An example is CommonMark. Perhaps this is what you actually meant when you said "stackexchange style".
As to saving in DB, you'd better save both the raw and parsed forms in 2 separate text columns. The raw form should be redisplayed during editing. The parsed form should be updated in background when the raw form has been edited. During display, obviously only show the parsed form with escape="false".
See also:
Markdown or HTML

Stackoverflows WMD System - Where does my input become HTML?

At what stage does my input in the textarea change from being this raw text, and become HTML?
For example, say I indent 4 spaces
like this
Then the WMD Showdown.js will render it properly below this textarea I type in. But the text area still literally contains
like this
So is PHP server side responsible for translating all the same things the showdown.js does to permanently be HTML in the SoF Database?
There are some other posts here about this, but basically it works like this. Or at least this is how I do it on my website using WMD; see my profile if you're interested in checking out my WMD implementation.
User enters the Markdown on the client, and showdown.js runs in real time in the browser (pure client-side JavaScript; no AJAX or anything like that) to give the user the preview.
Then when the user posts to the server, WMD sends the Markdown (you have to configure WMD to do this though; by default WMD sends HTML).
Run showdown.js server-side to convert the Markdown to HTML. In theory you could use some other method but it makes sense to try to get the same transformation on the server that the user sees on the client, other than any HTML tag filtering you want to do server-side.
As just noted, you'll need to do appropriate HTML tag filtering to avoid cross-site scripting (XSS) issues. This is both important and nontrivial, so be careful.
Save both the Markdown and the HTML in the database—the Markdown because if users want to edit their posts, you want to give them the Markdown, and the HTML so you don't have to transform Markdown to HTML every time you display answers.
Here are some related posts.
Convert HTML back to Markdown for editing in wmd: Tells how to configure WMD to send Markdown to the server instead of HTML.
What HTML tags are allowed on Stack Overflow?: Useful for thinking about HTML tag filtering.
Well first of all StackOverflow is built on ASP.NET, but yes essentially the characters in the rich text box gets translated back and forth.

Resources