Stackoverflows WMD System - Where does my input become HTML? - wmd

At what stage does my input in the textarea change from being this raw text, and become HTML?
For example, say I indent 4 spaces
like this
Then the WMD Showdown.js will render it properly below this textarea I type in. But the text area still literally contains
like this
So is PHP server side responsible for translating all the same things the showdown.js does to permanently be HTML in the SoF Database?

There are some other posts here about this, but basically it works like this. Or at least this is how I do it on my website using WMD; see my profile if you're interested in checking out my WMD implementation.
User enters the Markdown on the client, and showdown.js runs in real time in the browser (pure client-side JavaScript; no AJAX or anything like that) to give the user the preview.
Then when the user posts to the server, WMD sends the Markdown (you have to configure WMD to do this though; by default WMD sends HTML).
Run showdown.js server-side to convert the Markdown to HTML. In theory you could use some other method but it makes sense to try to get the same transformation on the server that the user sees on the client, other than any HTML tag filtering you want to do server-side.
As just noted, you'll need to do appropriate HTML tag filtering to avoid cross-site scripting (XSS) issues. This is both important and nontrivial, so be careful.
Save both the Markdown and the HTML in the database—the Markdown because if users want to edit their posts, you want to give them the Markdown, and the HTML so you don't have to transform Markdown to HTML every time you display answers.
Here are some related posts.
Convert HTML back to Markdown for editing in wmd: Tells how to configure WMD to send Markdown to the server instead of HTML.
What HTML tags are allowed on Stack Overflow?: Useful for thinking about HTML tag filtering.

Well first of all StackOverflow is built on ASP.NET, but yes essentially the characters in the rich text box gets translated back and forth.

Related

CKEditor inline with Node.js

I'm trying to create minimalistic content management system with ckeditor using node and express as a server. I would definitely want to implement the inline editing capabilities of ckeditor, but I'm having no success in sending the data to server and finally to nosql (mongodb) database.
I would like to have multiple inline editors within a page and to save to my database them simultaneously upon a POST event. I have my editor instances in invividual divs with an attribute contenteditable="true". Editor instances launch just fine, but when I'm trying to grab the data in my controller, all I have is an empty object. I can get the data from input fields, but then I lose the inline editing features. I've tried tinkering with bodyparser, but no success. All my divs containing the editable content lay under a HTML form element.
I would be more than happy is someone could at least point me to a general direction of how to accomplish this. Sorry if I was unable to make my self clear posting this question :)
tldr; How can I parse data from HTML elements, other than input-fields and text areas, in node/express with bodyparser?
Content of non-input fields won't be posted in a form, so you can't do that. A couple options come to mind:
Use JavaScript to update hidden inputs on the page as those divs change. Updated content will be posted.
Use JavaScript to make the POST, on save grab the contents, post them to the server, and then after that make the redirect from client side.

Allow only specific tag in <h:outputText> [duplicate]

Is there any HTML sanitizer or cleanup methods available in any JSF utilities kit or libraries like PrimeFaces/OmniFaces?
I need to sanitize HTML input by user via p:editor and display safe HTML output using escape="true", following the stackexchange style. Before displaying the HTML I'm thinking to store sanitized input data to the database, so that it is ready to safe use with escape="true" and XSS is not a danger.
In order to achieve that, you basically need a standalone HTML parser. HTML parsing is rather complex and the task and responsibility of that is beyond the scope of JSF, PrimeFaces and OmniFaces. You're supposed to just grab one of the many existing HTML parsing libraries.
An example is Jsoup, it has even a separate method for the particular purpose of sanitizing HTML against a Safelist: Jsoup#clean(). For example, if you want to allow some basic HTML without images, use Safelist.basic():
String sanitizedHtml = Jsoup.clean(rawHtml, Safelist.basic());
A completely different alternative is to use a specific text formatting syntax, such as Markdown (which is also used here). Basically all of those parsers also sanitize HTML under the covers. An example is CommonMark. Perhaps this is what you actually meant when you said "stackexchange style".
As to saving in DB, you'd better save both the raw and parsed forms in 2 separate text columns. The raw form should be redisplayed during editing. The parsed form should be updated in background when the raw form has been edited. During display, obviously only show the parsed form with escape="false".
See also:
Markdown or HTML

Embed HTML in JSF [duplicate]

Is there any HTML sanitizer or cleanup methods available in any JSF utilities kit or libraries like PrimeFaces/OmniFaces?
I need to sanitize HTML input by user via p:editor and display safe HTML output using escape="true", following the stackexchange style. Before displaying the HTML I'm thinking to store sanitized input data to the database, so that it is ready to safe use with escape="true" and XSS is not a danger.
In order to achieve that, you basically need a standalone HTML parser. HTML parsing is rather complex and the task and responsibility of that is beyond the scope of JSF, PrimeFaces and OmniFaces. You're supposed to just grab one of the many existing HTML parsing libraries.
An example is Jsoup, it has even a separate method for the particular purpose of sanitizing HTML against a Safelist: Jsoup#clean(). For example, if you want to allow some basic HTML without images, use Safelist.basic():
String sanitizedHtml = Jsoup.clean(rawHtml, Safelist.basic());
A completely different alternative is to use a specific text formatting syntax, such as Markdown (which is also used here). Basically all of those parsers also sanitize HTML under the covers. An example is CommonMark. Perhaps this is what you actually meant when you said "stackexchange style".
As to saving in DB, you'd better save both the raw and parsed forms in 2 separate text columns. The raw form should be redisplayed during editing. The parsed form should be updated in background when the raw form has been edited. During display, obviously only show the parsed form with escape="false".
See also:
Markdown or HTML

Text is writing code

I want to add on my site a text who is writing, as in http://www.whoismrrobot.com/
I use the code <Marquee>Text Here</Marquee> but this code only rolls the text.
The effect on your example site is not achieved with the marquee tag, and it can't be done this way. I guess the site is using some combination of JavaScript, CSS and maybe server side scripting.
Generally using the marquee tag is a bad idea, as it is some remnant of the "old times", when format and content both were provided by the html page itself. Even back then the marquee tag wasn't part of the html-specification and not supported by every browser.
For further details see this description of the tag.

Why does React.js' API warn against inserting raw HTML?

From the tutorial
But there's a problem! Our rendered comments look like this in the
browser: "<p>This is <em>another</em> comment</p>". We want those tags
to actually render as HTML.
That's React protecting you from an XSS attack. There's a way to get
around it but the framework warns you not to use it:
...
<span dangerouslySetInnerHTML={{__html: rawMarkup}} />
This is a special API that intentionally makes it difficult to insert raw HTML, but for Showdown we'll take advantage of this backdoor.
Remember: by using this feature you're relying on Showdown to be secure.
So there exists an API for inserting raw HTML, but the method name and the docs all warn against it. Is it safe to use this? For example, I have a chat app that takes Markdown comments and converts them to HTML strings. The HTML snippets are generated on the server by a Markdown converter. I trust the converter, but I'm not sure if there's any way for a user to carefully craft Markdown to exploit XSS. Is there anything else I should be doing to make sure this is safe?
Most Markdown processors (and I believe Showdown as well) allow the writer to use inline HTML. For example a user might enter:
This is _markdown_ with a <marquee>ghost from the past</marquee>. Or even **worse**:
<script>
alert("spam");
</script>
As such, you should have a whitelist of tags and strip all the other tags after converting from markdown to html. Only then use the aptly named dangerouslySetInnerHTML.
Note that this also what Stackoverflow does. The above Markdown renders as follows (without you getting an alert thrown in your face):
This is markdown with a ghost from the past. Or
even worse:
alert("spam");
There are three reasons it's best to avoid html:
security risks (xss, etc)
performance
event listeners
The security risks are largely mitigated by markdown, but you still have to decide what you consider valid, and ensure it's disallowed (maybe you don't allow images, for example).
The performance issue is only relevant when something will change in the markup. For example if you generated html with this: "Time: <b>" + new Date() + "</b>". React would normally decide to only update the textContent of the <b/> element, but instead replaces everything, and the browser must reparse the html. In larger chunks of html, this is more of a problem.
If you did want to know when someone clicks a link in the results, you've lost the ability to do so simply. You'd need to add an onClick listener to the closest react node, and figure out which element was clicked, delegating actions from there.
If you would like to use Markdown in React, I recommend a pure react renderer, e.g. vjeux/markdown-react.

Resources