Node.js Tinymce storing rich text - node.js

I've got a form with Tinymce editor so any user can format it's text to look good.
If he writes any malicious script in the form, it is automatically escaped. But if the user cheats and posts the form using something like Postman, he can submit unescaped scripts (like iframes).
How do I validate the Tinymce input? If I use the validator plugin with the "escape" function, it removes all formatting. I tried to use some Google Caja plugins for node to sanitize the input but it's not removing any malicious code, like iframes. Any help?

I've found a pretty good Node.js module to sanitize html input.
It's called Sanitize-html and it does exactly what I want, removes dangerous html tags from the input string and you can add/remove specific tags

Related

NodeJS Jade (Pug) inline link in dynamic text

I have this NodeJS application, that uses Jade as template language. On one particular page, one text block is retrieved from the server, which reads the text from database.
The problem is, the returned text might contain line-breaks and links, and an operator might change this text at any time. How do I make these elements display correctly?
Most answers suggest using a new line:
p
| this is the start of the para
a(href='http://example.com') a link
| and this is the rest of the paragraph
But I cannot do this, since I cannot know when the a element appears. I've solved how to get newline correct, by this trick:
p
each l in line.description.split(/\n/)
= l
br
But I cannot seem to solve how to get links to render correctly. Does anyone know?
Edit:
I am open to any kind of format for links in the database, whatever would solve the issue. For example, say database contains the following text:
Hello!
We would like you to visit [a("http://www.google.com")Google]
Then we would like that to output text that looks like this:
Hello!
We would like you to visit Google
Looks like what you're looking for is unescaped string interpolation. The link does not work in the output because Pug automatically escapes it. Wrap the content you want to insert with !{} and it should stop breaking links. (Disclaimer: Make sure you don't leave user input unescaped - this only is a viable option if you know for sure the content of your DB does not have unwanted HTML/JS code in it.)
See this CodePen for illustration.
With this approach, you would need to use standard HTML tags (<a>) in your DB text. If you don't want that, you could have a look at Pug filters such as markdown-it (you will still need to un-escape the compilation output of that filter).

How to get an HTML tag as 2 strings (opening tag, closing tag), without its contents from kuchiki?

I am writing an HTML to Markdown converter in Rust, using Kuchiki to get access to the parsed tree from html5ever.
For unknown HTML tags, I want to provide the possibility to ignore them and pass them through to the output string, but still processing their children as normal. For that, I need the textual representation of the tag without its contents, but I can't figure how best to do that.
The best I can come up with is:
Clone the node
Drop its children
Call node.to_string
"parse" the string with a regular expression to separate the opening and closing tags.
I feel there must be a better way. I don't think Kuchiki provides this functionality out of the box, but I also don't know how to get access to the html5ever API through Kuchiki, and I also don't get from the html5ever API documentation whether they would provide some functionality like this.

How to index html, css, and javascript using solr

Users of CodePen submit html/css/javascript each time they save a pen. We're setting up Solr search and I'd like to know if any work has been done to properly tokenize html/css/js for optimal retrieval.
For example in javascript, we'd like code like
window.location = 'http://wufoo.com'
to produce a search hit on window, location and window.location.
Also, for html, we don't wish to strip out brackets on elements like <form> or <field>.
Before I go down the road of writing a custom field type, I'd like to know if anyone has already tackled this problem. Since we index each field individually, we'll need a separate tokenizer with rules specific for css, html, and javascript.

How to add text to any html element?

I want to add text to body element but I don't know how. Which method will work on the body tag?
Sorry for my english and thanks for replies.
In Watir, you can manipulate a web page (DOM) using JS, just like that:
browser.execute_script("document.getElementById('pageContent').appendChild(document.createTextNode('Great Success!'));")
I assume that the point of the question is:
All users are not just interacting by just clicking buttons and links on the web app, some of them are doing nasty things like altering http requests to make your system do something that it is not supposed to do... or to just have some fun.
To mimic this behavior, you could write a ui-test that alters forms on the web page, so that for example, one could type in anything into any field instead of a limited dropdown.
To do that, ui test has to:
manipulate DOM to set form inputs free of limitations (replace select's with input's, etc.)
ui test has to know, which values to use, in many cases it's pointless to enter random values. Your webapp has to provide some good "unwanted" options.
Why would you want to modify the webpage in Watir? It's for automated testing, not DOM manipulation.
If you want to add something to the DOM element in javascript, you can do it like that:
var txt = document.createTextNode(" This text was added to the DIV.");
document.getElementById('myDiv').appendChild(txt);
Or use some DOM manipulation library, like jQuery.
If you have not worked your way though the watir tutorial, I would suggest you do so. It deals with things like filling in text fields etc.
Learn to use the developer tools for your browser, Firebug for Firefox, or the built in tools for IE and CHrome. They will let you look at things as you interact with the site.
If the element is not a normal HTML input field of some sort, then you are dealing with a custom control. Many exist and they are varied and there is no one set solution for dealing with them. Without knowing which control you are using, and being able ourselves to interact with a sample of it, or at least see the HTML, it is very very difficult to advise you, we basically have to just guess (which is often a waste of everyone's time)
Odds are if you have a place you can enter text, then it is some form of input control, it might not start out that way, you may need to click on some other element, to make the input area appear, but without a sample of HTML all we can do is guess.
If this is a commercial control, see if you can find a demo site that shows the control in action. Try googling things like class names for the elements and often you get lucky

What bad things can a user do in a browser without the script tag?

I have an entry form where the user can type arbitrary HTML. What do I need to filter out besides script tags? Here's what I do:
userInput.replace(/<(script)/gi, "<$1");
but the sanitizer of WMD (used here on SO) manages a white list of tags, and filters out (blanks) all other tags. Why?
I don't like white lists because I don't want to prevent the user from entering arbitrary tags if she so chooses; but I can use a more extensive black list, besides 'script', if needed. What do I need as a black list?
Short answer: anything they can do with the script tag.
The script tag is not required to run javascript. Script can also be placed in almost every HTML tag. Script can appear in a number of places additional to the script tag including, but not limited to, src and href attributes that are used for URLs, event handlers and the style attribute.
The ability for a user to put unwanted script into your page is a security vulnerability known as cross-site scripting. Read around this topic and read the XSS prevention cheat sheet.
You may not want to let users add HTML to your pages. If you need this feature, consider other formats such as Markdown that allows you to disable the use of any embedded HTML; or another less secure option is to use a filtering library that tries to remove all script, such as HTMLPurifier. If you choose the filtering option, be sure to subscribe to announcements of new releases and always go back to your project to install the bug-fixed releases of the filter as new exploits are found and worked-around.

Resources