How to check data against XSS in Node.js? - node.js

I use Node.js as my backend and store my data in MongoDB. How should I check incoming data before saving it to the database?
I need to check both plain strings like:
"some xss test"
and objects of strings:
{
"name": "xss name",
"age": 25
}
What library should I use for my task?

It is general practice to escape data when you output it, not when you store it. That way you do not have to worry about XSS payloads that reached the database through other routes.
But your question still stands: how would a programmer deal with a string that may contain XSS? The validator module has an escape function for exactly this job:
var validator = require('validator');
var escaped_string = validator.escape(someString);
To handle an object of strings, you will have to iterate over its properties yourself.
If you actually want to output HTML but worry about XSS, then you need a more sophisticated XSS sanitizer that is kept up to date. An example would be Google Caja.
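Iterating over an object of strings could look roughly like the sketch below. The `escapeHtml` helper and its replacement table are assumptions written by hand so the example runs without dependencies; with the validator module installed, `validator.escape` would take its place. The name `escapeDeep` is also hypothetical.

```javascript
// Hand-rolled stand-in for validator.escape(); the replacement table is an assumption.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
};

function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Recursively escape every string in a value.
// Numbers, booleans and null pass through unchanged.
function escapeDeep(value) {
  if (typeof value === 'string') return escapeHtml(value);
  if (Array.isArray(value)) return value.map(escapeDeep);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = escapeDeep(inner);
    }
    return out;
  }
  return value;
}

console.log(escapeDeep('some <xss> test'));
console.log(escapeDeep({ name: '<b>xss name</b>', age: 25 }));
```

Only string values are touched, so a field such as `age` keeps its numeric type.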

Related

How to protect against mongo like SQL injection?

I've been reading up on ways to protect against 'mongo injection' in my express backend. I typically use the package express-validator to sanitize and validate my inputs.
For example:
{"username": {"$ne": null}} //possible injection
.check('username').isString() //simple validation
Do you just verify the value is a string?
From what I understand, the solution is to make sure "$" is not allowed in, or is removed from, the keys of JSON payloads passed to my endpoints. How do others protect against this? Is there a way to go through the whole POSTed payload and remove any key-value pairs that contain a "$" in the key? Just curious if I'm missing something obvious.
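Walking the whole payload and dropping operator keys could be sketched as follows. This is only an illustration of the idea in the question, not an established library API; the function name `stripOperatorKeys` is hypothetical, and it drops keys that start with "$".

```javascript
// Return a copy of the payload without any key that starts with "$".
function stripOperatorKeys(value) {
  if (Array.isArray(value)) return value.map(stripOperatorKeys);
  if (value !== null && typeof value === 'object') {
    const clean = {};
    for (const [key, inner] of Object.entries(value)) {
      if (key.startsWith('$')) continue; // drop Mongo operators such as $ne or $gt
      clean[key] = stripOperatorKeys(inner);
    }
    return clean;
  }
  return value;
}

const payload = { username: { $ne: null }, password: 'secret' };
console.log(stripOperatorKeys(payload));
```

After stripping, `username` is an empty object rather than a string, so a type check such as `isString()` is still needed on top of this.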

What is the best way to safely read user input?

Let's consider a REST endpoint which receives a JSON object. One of the JSON fields is a String, so I want to validate that no malicious text is received.
@ValidateRequest
public interface RestService {
    @POST
    @Consumes(APPLICATION_JSON)
    @Path("endpoint")
    void postData(@Valid @NotNull Data data);
}
public class Data {
    @ValidString
    private String s;
    // get,set methods
}
I'm using the bean validation framework via @ValidString to delegate the validation to the ESAPI library.
@Override
public boolean isValid(String value, ConstraintValidatorContext context) {
    return ESAPI.validator().isValidInput(
        "String validation",
        value,
        this.constraint.type(),
        this.constraint.maxLength(),
        this.constraint.nullable(),
        true);
}
This method canonicalizes the value (i.e. decodes any encoding applied to it) and then validates it against a regular expression provided in the ESAPI config. The regex is not that important to the question, but it mostly whitelists 'safe' characters.
All good so far. However, on a few occasions I need to accept 'less' safe characters like %, ", <, >, etc. because the incoming text is from an end user's free-text input field.
Is there a known pattern for this kind of String sanitization? What kind of text can cause server-side problems if SQL queries are considered safe (e.g. using bind variables)? What if the user wants to store <script>alert("Hello")</script> as his description, which at some point will be sent back to the client? Do I store that in the DB? Is that a client-side concern?
When dealing with text coming from the user, best practice is to whitelist only known character sets, as you stated. But that is not the whole solution: as you pointed out, sometimes "dangerous" characters are part of the valid character set.
When this happens you need to be very vigilant in how you handle the data. I, as well as the commenters, recommend keeping the data from the user in its original state for as long as possible. Handling the data securely then means using the proper functions for the target domain/output.
SQL
When putting free-format strings into a SQL database, best practice is to use prepared statements (in Java this is the PreparedStatement object) or an ORM that automatically parameterizes the data.
To read more on SQL injection and other forms of injection attacks (XML, LDAP, etc.), I recommend OWASP Top 10 - A1 Injection.
XSS
You also asked what to do when outputting this data to the client. In this case you want to make sure you encode the output for the proper context, also known as contextual output encoding. ESAPI has an Encoder class/interface for this. The important thing to note is which context (HTML body, HTML attribute, JavaScript, URL, etc.) the data will be output in. Each context is encoded differently.
Take for example the input: <script>alert('Hello World');<script>
Sample Encoding Outputs:
HTML: &lt;script&gt;alert(&#39;Hello World&#39;);&lt;script&gt;
JavaScript: \u003cscript\u003ealert(\u0027Hello World\u0027);\u003cscript\u003e
URL: %3Cscript%3Ealert%28%27Hello%20World%27%29%3B%3Cscript%3E
Form URL: %3Cscript%3Ealert%28%27Hello+World%27%29%3B%3Cscript%3E
CSS: \00003Cscript\00003Ealert\000028\000027Hello\000020World\000027\000029\00003B\00003Cscript\00003E
XML: &lt;script&gt;alert(&apos;Hello World&apos;);&lt;script&gt;
For more reading on XSS look at OWASP Top 10 - A3 Cross-Site Scripting (XSS)
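For readers on Node.js rather than Java, the same idea of encoding per output context can be sketched without ESAPI. The `encodeForHtml` helper below is a hand-written assumption, not a library function; `encodeURIComponent` is the built-in JavaScript function. Only two of the contexts listed above are covered.

```javascript
const input = "<script>alert('Hello World');<script>";

// HTML body context: replace the characters that have markup meaning.
function encodeForHtml(text) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

// URL component context: percent-encode with the built-in function.
// Note that encodeURIComponent leaves ( ) and ' untouched, unlike the ESAPI output.
function encodeForUrl(text) {
  return encodeURIComponent(text);
}

console.log(encodeForHtml(input));
console.log(encodeForUrl(input));
```

The same input produces different output for each context, which is why the encoder has to be chosen at the point of output rather than at storage time.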

Best practice to pass query conditions in ajax request

I'm writing a REST API in Node.js that will execute a SQL query and send back the results.
In the request I need to send the WHERE conditions, e.g.:
GET 127.0.0.1:5007/users //gets the list of users
GET 127.0.0.1:5007/users
id = 1 //gets the user with id 1
Right now the conditions are passed from the client to the rest api in the request's headers.
In the API I'm using sequelize, an ORM that needs to receive WHERE conditions in a particular form (an object); ex: having the condition:
(x=1 AND (y=2 OR z=3)) OR (x=3 AND y=1)
this needs to be formatted as a nested object:
                -- x=1
       -- AND -|           -- y=2
       |        -- OR ----|
       |                   -- z=3
-- OR -|
       |
       |        -- x=3
       -- AND -|
                -- y=1
so the object would be:
Sequelize.or(
  Sequelize.and(
    {x: 1},
    Sequelize.or(
      {y: 2},
      {z: 3}
    )
  ),
  Sequelize.and(
    {x: 3},
    {y: 1}
  )
)
Now I'm trying to pass a simple string (like "(x=1 AND (y=2 OR z=3)) OR (x=3 AND y=1)"), but then I need a function on the server that converts the string into the required object. In my opinion this method has two advantages: the developer writing the client can pass the WHERE conditions in a simple, SQL-like way, and it is independent of the ORM, so the client does not need to change if we change the server or switch to a different ORM.
The function that reads the conditions string and converts it into an object is giving me a headache (I'm trying to write one without success, so if you have some examples of how to do something like this...).
What I would like to get is a route capable of executing almost any kind of SQL query and returning the results.
Right now I have a different route for everything:
127.0.0.1:5007/users //to get all users
127.0.0.1:5007/users/1 //to get a single user
127.0.0.1:5007/lastusers //to get user registered in the last month
and so on for the other tables I need to query (one route for every kind of request I need in the client).
Instead I would like to have only one route, something like:
127.0.0.1:5007/request
(when calling this route I will pass the table name and the conditions' string)
Do you think this would be a good solution, or do you generally use other ways to handle this kind of thing?
Do you have any idea on how to write a function to convert the conditions' string into the desired object?
Any suggestion would be appreciated ;)
I would strongly advise you not to expose any part of your database model to your clients. Doing so means you can't change anything you expose without the risk of breaking the clients. One suggestion as far as what you've supplied is that you can and should use query parameters to cut down on the number of endpoints you've got.
GET /users //to get all users
GET /users?registeredInPastDays=30 //to get user registered in the last month
GET /users/1 //to get a single user
Obviously "registeredInPastDays" should be renamed to something less clumsy; it's just an example.
As far as the conditions string, there ought to be plenty of parsers available online. The grammar looks very straightforward.
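To give an idea of how small such a parser can be, here is a minimal recursive-descent sketch for strings like the one in the question. It is an illustration under assumptions, not a complete solution: it only supports "=", AND, OR and parentheses, values are limited to word characters, and it returns plain nested objects with hypothetical `and` / `or` keys rather than Sequelize calls.

```javascript
// Split the condition string into tokens: parentheses, "=", and words/numbers.
function tokenize(text) {
  const tokens = text.match(/\(|\)|=|\w+/g) || [];
  if (tokens.join('') !== text.replace(/\s+/g, '')) {
    throw new SyntaxError('Unexpected character in condition string');
  }
  return tokens;
}

function parseConditions(text) {
  const tokens = tokenize(text);
  let pos = 0;

  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const isKeyword = (tok, word) => tok !== undefined && tok.toUpperCase() === word;

  // expr := term ("OR" term)*
  function parseOr() {
    const parts = [parseAnd()];
    while (isKeyword(peek(), 'OR')) {
      next();
      parts.push(parseAnd());
    }
    return parts.length === 1 ? parts[0] : { or: parts };
  }

  // term := factor ("AND" factor)*
  function parseAnd() {
    const parts = [parseFactor()];
    while (isKeyword(peek(), 'AND')) {
      next();
      parts.push(parseFactor());
    }
    return parts.length === 1 ? parts[0] : { and: parts };
  }

  // factor := "(" expr ")" | field "=" value
  function parseFactor() {
    if (peek() === '(') {
      next();
      const inner = parseOr();
      if (next() !== ')') throw new SyntaxError('Expected ")"');
      return inner;
    }
    const field = next();
    if (!/^[A-Za-z_]\w*$/.test(field || '')) throw new SyntaxError('Expected a field name');
    if (next() !== '=') throw new SyntaxError('Expected "="');
    const raw = next();
    if (raw === undefined || !/^\w+$/.test(raw)) throw new SyntaxError('Expected a value');
    // Numeric literals become numbers; anything else stays a string.
    return { [field]: /^\d+$/.test(raw) ? Number(raw) : raw };
  }

  const tree = parseOr();
  if (pos !== tokens.length) throw new SyntaxError('Unexpected trailing input');
  return tree;
}

const tree = parseConditions('(x=1 AND (y=2 OR z=3)) OR (x=3 AND y=1)');
console.log(JSON.stringify(tree, null, 2));
```

AND binds tighter than OR here, as in SQL. Before handing the result to an ORM, the field names would still need to be checked against a whitelist of columns the client is allowed to filter on.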
IMHO the main disadvantage of your solution is that you are creating just another API for querying data. Why create something from scratch if it already exists? You should use an existing, mature query API and focus on your business logic rather than inventing something new.
For example, you can take the query syntax from OData. Many people have been developing that standard for a long time. They have already considered different use cases and obstacles for a query API.
Resources are located with a URI. You can use or mix three ways to address them:
Hierarchically with a sequence of path segments:
/users/john/posts/4711
Non hierarchically with query parameters:
/users/john/posts?minVotes=10&minViews=1000&tags=java
With matrix parameters which affect only one path segment:
/users;country=ukraine/posts
This is normally sufficient, but it has limitations such as the maximum URI length. In your case a problem is that you can't easily describe AND and OR conjunctions with query parameters. But you can use a custom or standard query syntax. For instance, if you want to find all cars or vehicles from Ford except the Capri with a price between $10000 and $20000, Google uses the search parameter
q=cars+OR+vehicles+%22ford%22+-capri+%2410000..%2420000
(the %22 is an escaped ", the %24 an escaped $).
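In Node.js the escaping of such a search parameter does not have to be done by hand. The sketch below uses the built-in URLSearchParams class; the parameter name `q` is taken from the example above.

```javascript
// URLSearchParams is a global in Node.js and in browsers.
const params = new URLSearchParams({
  q: 'cars OR vehicles "ford" -capri $10000..$20000',
});
const queryString = params.toString();
console.log(queryString);

// Parsing the string again reverses the escaping.
const decoded = new URLSearchParams(queryString).get('q');
console.log(decoded);
```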
If this does not work for your case and you want to pass data outside of the URI the format is just a matter of your taste. Adding a custom header like X-Filter may be a valid approach. I would tend to use a POST. Although you just want to query data this is still RESTful if you treat your request as the creation of a search result resource:
POST /search HTTP/1.1
your query-data
Your server should return the newly created resource in the Location header:
HTTP/1.1 201 Created
Location: /search/3
The result can still be cached and you can bookmark it or send the link. The downside is that you need an additional POST.

mongodb, node.js and encrypted data

I'm working on a project which involves a lot of encrypted data. Basically, these are JSON objects serialized into a string, then encrypted with AES-256 into a ciphertext, which then has to be stored in Mongo.
I could of course do this the way described above, which stores the ciphertext as a string in a BSON document. However, if for some reason the ciphertext isn't treated properly along the way (for instance, because of a different charset), it is altered and I cannot rebuild the original string anymore. With millions of records, that's unacceptable (it's also slow).
Is there a proper way to save the cyphertext in some kind of native binary format, retrieve it binary and then return it to the original string? I'm used to working with strings, my skills with binary format are pretty rusty. I'm very interested in hearing your thoughts on the subject.
Thanks everyone for your input,
Fabian
yes :)
var Binary = require('mongodb').Binary;
var doc = {
  // Buffer.alloc replaces the deprecated new Buffer(size) constructor
  data: new Binary(Buffer.alloc(256))
};
or with 1.1.5 of the driver you can do
var doc = {
  data: Buffer.alloc(256)
};
The data is always returned as a Binary object, however, and not as a Buffer. The link to the docs is below.
http://mongodb.github.com/node-mongodb-native/api-bson-generated/binary.html
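To show that the ciphertext never needs to become a string, here is a sketch of the round trip using only Node's built-in crypto module. The cipher mode (aes-256-cbc), the randomly generated key, and the helper names `encryptJson` / `decryptJson` are assumptions for illustration; the resulting Buffer is what would go into the `data` field above.

```javascript
const crypto = require('crypto');

// Random key for illustration only; a real system needs proper key management.
const key = crypto.randomBytes(32); // 256-bit key for AES-256

function encryptJson(obj) {
  const iv = crypto.randomBytes(16); // fresh IV per record, one AES block long
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
  // No output encoding is passed, so update() and final() return Buffers.
  const body = Buffer.concat([cipher.update(JSON.stringify(obj), 'utf8'), cipher.final()]);
  return Buffer.concat([iv, body]); // store the IV in front of the ciphertext
}

function decryptJson(stored) {
  const iv = stored.subarray(0, 16);
  const body = stored.subarray(16);
  const decipher = crypto.createDecipheriv('aes-256-cbc', key, iv);
  const plain = Buffer.concat([decipher.update(body), decipher.final()]);
  return JSON.parse(plain.toString('utf8'));
}

const ciphertext = encryptJson({ foo: 'bar', n: 42 });
console.log(Buffer.isBuffer(ciphertext));
console.log(decryptJson(ciphertext));
```

Because the bytes stay in a Buffer from encryption to storage, no charset conversion can alter them.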

How to generate CouchDB UUID with Node.js?

Is there a way to generate random UUID like the ones used in CouchDB but with Node.js?
There are different ways to generate UUIDs. If you are already using CouchDB, you can just ask CouchDB for some like this:
http://127.0.0.1:5984/_uuids?count=10
CouchDB has three different UUID generation algorithms. You can specify which one CouchDB uses in the CouchDB configuration as uuids/algorithm. There can be benefits to asking CouchDB for UUIDs, specifically if you are using the "sequential" generation algorithm: the UUIDs you get from CouchDB will fall into that sequence.
If you want to do it in node.js without relying on CouchDB, then you'll need a UUID function written in JavaScript. node-uuid is a JavaScript implementation that uses "Version 4" (random numbers) or "Version 1" (timestamp-based). It works with node.js or hosted in a browser: https://github.com/broofa/node-uuid
If you're on Linux, there is also a JavaScript wrapper for libuuid. It is called uuidjs. There is a performance comparison to node-uuid in the ReadMe of node-uuid.
If you want to do something, and it doesn't look like it's supported in node.js, be sure to check the modules available for npm.
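Newer Node.js versions can also do this with the built-in crypto module alone. The sketch below is an assumption-laden illustration: `couchStyleUuid` is a hypothetical helper that only mimics the shape of CouchDB's random IDs (32 lowercase hex characters), and `crypto.randomUUID` requires Node.js 14.17 or later.

```javascript
const crypto = require('crypto');

// 16 random bytes rendered as 32 lowercase hex characters,
// the same shape as the IDs returned by CouchDB's /_uuids endpoint.
function couchStyleUuid() {
  return crypto.randomBytes(16).toString('hex');
}

console.log(couchStyleUuid());

// RFC 4122 version 4 UUID with dashes (Node.js 14.17 or later).
console.log(crypto.randomUUID());
```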
I had the same question and found that simply passing a 'null' for the couchdb id in the insert statement also did the trick:
var newdoc = {
"foo":"bar",
"type": "my_couch_doctype"
};
mycouchdb.insert(newdoc, null /* <- let couchdb generate for you. */, function(err, body){
});
