Database Exposure: Best Practices - node.js

I'm a relatively new web programmer and I'm currently working on my first major project. I'm using angular, express (on top of node), and the graph database neo4j. Right now I'm trying to determine the best (in terms of security and speed optimization) way to set up how the web app interacts with the database.
Right now I feel like I'm going into this somewhat blindly. What I'm looking for is a guide to best practices, security issues to take into account, and any other relevant advice or pitfalls to be aware of when setting up a web app backend.
To put this into a bit more concrete terms I'll give you an idea of how I'm setting up routes right now. The following are the routes setup in the app.js file.
//match database query functions
function dataQuery(req, res) {
var func = database[req.param('query')];
func(req, res);
}
//match database create functions
function dataCreate(req, res) {
var func = database[req.param('create')];
func(req, res);
}
//handle data queries
app.get('/query/:query', dataQuery);
//handle adding new content
app.post('/create/:create', dataCreate)
Essentially I have it set up so that I POST or GET to a URL that just goes and executes a function. I'm naming the function I want to run in the URL: /query/theNameOfTheFunction. These functions then either build a Cypher query (neo4j's query language) from information in the request to interact with the database, or handle things like adding user-uploaded images.
Example: Creating Content (URL: /query/createContent)
exports.createContent = function (req, res) {
var content = JSON.parse(req.query.content);
var query = ("CREATE (n:Content {Title: {title}, URL: {url}, Description: {description}, Source: {source}, Links: {links}, Value: {valueStatement} })");
query = query.replace("{title}", "\"" + content.title + "\"");
query = query.replace("{url}", "\"" + content.url + "\"");
query = query.replace("{description}", "\"" + content.description + "\"");
query = query.replace("{source}", "\"" + content.source + "\"");
query = query.replace("{links}", "\"" + content.links + "\"");
query = query.replace("{valueStatement}", "\"" + content.valueStatement + "\"");
db.query(query, function (err, results) {
if (err) {res.send()};
res.send();
});
}
Here I've got a template for the query and just drop in user generated information using replace.
Example: Adding images to server (URL: /create/addImage)
exports.addImage = function (req,res) {
var url = req.query.url;
var fileName = req.query.fileName;
console.log(req.query);
request(url).pipe(fs.createWriteStream("./img/submittedContent/" + fileName));
res.send();
}
It seems that this approach is probably not very scalable but I'm not sure how to best organize the code on the server side.
One other specific example I would like to mention is the following case. The query itself is complicated and I've pushed creating it to the client side for now (the query looks for content related to the terms that the user has selected and varies in length accordingly). The client sends the query it has built, and the server passes it straight to the neo4j API. Obviously there are concerns here: if the user is able to define the query, they could perform any action on the database (deleting everything or whatever). I'm not clear on how someone could go about doing this exactly, but it certainly seems feasible.
exports.getContent = function (req, res) {
var query = req.query.query;
//would checking for black/white list key terms be enough security? (remove, create, set, etc)
db.query(query, function (err, results) {
if (err) {throw err};
res.send(results);
});
}
Am I going about this stuff completely wrong headed? I've never gotten a formal introduction to server side scripting and am only going off of things I've read. I would like to do it the 'right way' but I need to know what that way is first...

Just some random pointers:
I would suggest setting up a RESTful web API to handle the communication between Angular and your database; it takes the hassle out of having to invent all the routes yourself and it also means you can use great libraries like Restangular (for the client) and Restify (for the server) to handle the communications;
not sure which Neo4j driver you're using, but I'm pretty sure they all support parameterized queries, meaning that you don't need to do all those query.replace() calls;
depending on the number of images that might get uploaded, storing them in the filesystem might be okay, although you should never trust the passed filename; if you want a bit more scalability, you could consider using MongoDB's GridFS;
never trust queries being passed from the client to be performed on the server; if you can build the query on the client side, you can also build it on the server side with information passed from the client to the server (again, use parameterized queries);
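To make the parameterized-query, file-name, and "build it on the server" points concrete, here is a minimal sketch. It only builds the query text and parameter object; it does not talk to a database. The helper names, the `:TAGGED_WITH` relationship, and the `Term` label are assumptions for illustration and are not part of the question's schema. The `$name` parameter syntax is what current Neo4j versions use (older versions used `{name}`).

```javascript
const path = require('path');

// Fields copied from the createContent example in the question.
const CONTENT_FIELDS = ['title', 'url', 'description', 'source', 'links', 'valueStatement'];

// Returns fixed query text plus a separate parameter object.
// User input never becomes part of the query string itself.
function buildCreateContent(content) {
  const params = {};
  for (const field of CONTENT_FIELDS) {
    const value = content[field];
    params[field] = value === undefined || value === null ? '' : String(value);
  }
  const query =
    'CREATE (n:Content {Title: $title, URL: $url, Description: $description, ' +
    'Source: $source, Links: $links, Value: $valueStatement})';
  return { query, params };
}

// The client sends only the selected terms; the server owns the query text.
// The graph shape used here (Content -[:TAGGED_WITH]-> Term) is hypothetical.
function buildRelatedContentQuery(terms) {
  if (!Array.isArray(terms) || terms.length === 0) {
    throw new Error('terms must be a non-empty array');
  }
  const query =
    'MATCH (c:Content)-[:TAGGED_WITH]->(t:Term) WHERE t.name IN $terms RETURN DISTINCT c';
  return { query, params: { terms: terms.map(String) } };
}

// Reduces a client-supplied name to a bare file name with a conservative
// character set, so it cannot point outside the upload directory.
function safeFileName(fileName) {
  // win32.basename treats both "/" and "\" as separators.
  const base = path.win32.basename(String(fileName));
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, '_');
  if (cleaned === '' || cleaned === '.' || cleaned === '..') {
    throw new Error('Invalid file name');
  }
  return cleaned;
}
```

With the official neo4j-driver package, the result would be passed along as `session.run(query, params)`; other drivers generally accept the parameter object as a separate argument in a similar way.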

Related

Intercepting knex.js queries pre-execution

I'm working on caching strategies for an application that uses knex.js for all sql related stuff.
Is there a way to intercept the query to check if it can be fetched from a cache instead of querying the database?
Briefly looked into knex.js events, which has a query event.
Doc:
A query event is fired just before a query takes place, providing data about the query, including the connection's __knexUid / __knexTxId properties and any other information about the query as described in toSQL. Useful for logging all queries throughout your application.
Which means that it's possible to do something like (also from docs)
knex.select('*')
.from('users')
.on('query', function(data) {
app.log(data);
})
.then(function() {
// ...
});
But is it possible to make the on query method intercept and do some logic before actually executing the query towards the database?
I note that this suggestion is attached to a Knex GitHub issue (credit to Arian Santrach) which seems relevant:
knex.QueryBuilder.extend('cache', async function () {
try {
const cacheKey = this.toString()
if(cache[cacheKey]) {
return cache[cacheKey]
}
const data = await this
cache[cacheKey] = data
return data
} catch (e) {
throw new Error(e)
}
});
This would allow:
knex('tablename').where(criteria).cache()
to check for cached data for the same query. I would think a similar sort of structure could be used for whatever your caching solution was, using the query's string representation as the key.
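One thing the snippet above leaves open is expiry: entries stay in the cache forever. As a rough sketch of the same idea with a time-to-live, the helper below caches the result of any async loader under a string key. The names `cached`, `ttlMs`, and `loader` are made up for this example and are not part of Knex.

```javascript
// In-memory cache: key -> { value, expires }.
const cache = new Map();

// Returns the cached value for `key` if it has not expired; otherwise runs
// `loader`, stores its result for `ttlMs` milliseconds, and returns it.
// `now` is injectable so the expiry logic can be exercised without waiting.
async function cached(key, ttlMs, loader, now = Date.now) {
  const hit = cache.get(key);
  if (hit && hit.expires > now()) {
    return hit.value;
  }
  const value = await loader();
  cache.set(key, { value, expires: now() + ttlMs });
  return value;
}

// Possible use with a Knex query builder (assumes a configured `knex`):
//   const q = knex('tablename').where(criteria);
//   const rows = await cached(q.toString(), 30000, () => q);
```

This still does not handle invalidation on writes, so it only suits data that can safely be stale for the chosen TTL.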

What is the reason for using GET instead of POST in this instance?

I'm walking through the Javascript demos of pg-promise-demo and I have a question about the route /api/users/:name.
Running this locally works, the user is entered into the database, but is there a reason this wouldn't be a POST? Is there some sort of advantage to creating a user in the database using GET?
// index.js
// --------
app.get('/api/users/:name', async (req, res) => {
try {
// find the user by name, or add it if it does not exist yet
const data = await db.task('add-user', async (t) => {
const user = await t.users.findByName(req.params.name);
return user || t.users.add(req.params.name);
});
res.json(data);
} catch (err) {
// do something with error
res.status(500).json({ error: err.message || String(err) });
}
});
For brevity I'll omit the code for t.users.findByName(name) and t.users.add(name) but they use QueryFile to execute a SQL command.
EDIT: Update link to pg-promise-demo.
The reason is explained right at the top of that file:
IMPORTANT:
Do not re-use the HTTP-service part of the code from here!
It is an over-simplified HTTP service with just GET handlers, because:
This demo is to be tested by typing URL-s manually in the browser;
The focus here is on a proper database layer only, not an HTTP service.
I think it is pretty clear that you are not supposed to follow the HTTP implementation of the demo, rather its database layer only. The demo's purpose is to teach you how to organize a database layer in a large application, and not how to develop HTTP services.
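For comparison, if the route were written for a real service rather than for typing URLs into a browser, a POST version might look roughly like the sketch below. It assumes an Express-style `app` with a JSON body parser installed and the same `db.task` / `t.users` repository shown in the question; `registerUserRoutes` and the `/api/users` path are names chosen for this example, not part of pg-promise-demo.

```javascript
// `app` and `db` are passed in so the route does not depend on module globals.
function registerUserRoutes(app, db) {
  app.post('/api/users', async (req, res) => {
    // The name travels in the request body instead of the URL.
    const name = req.body && req.body.name;
    if (typeof name !== 'string' || name.trim() === '') {
      return res.status(400).json({ error: 'name is required' });
    }
    try {
      // Same find-or-add task as in the demo.
      const user = await db.task('add-user', async (t) => {
        const existing = await t.users.findByName(name);
        return existing || t.users.add(name);
      });
      return res.json(user);
    } catch (err) {
      return res.status(500).json({ error: 'could not create user' });
    }
  });
}
```

Using POST here keeps a state-changing request from being repeated by link prefetching, crawlers, or a browser reload of a cached URL.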

Nodejs - tips for creating multiple endpoints

I have a nodejs/express server used by both a web application and a mobile application, and for now they use the same endpoints. I want to divide my API in two: one for mobile and the other for web. The requests are going to be "exactly" the same. The solution that comes to mind is duplicating every request under a different path, so that the mobile app can use the newly created ones. But this does not seem right, as it may mean making big changes on the client side. Is there a more elegant and preferably easier solution? Any suggestion would be appreciated.
router.get('/api/snow/manuel',
function (req, res, next) {
const snowProjection = {_id: 0};
snowThick.find({}, snowProjection)
.toArray(function (err, data) {
if (err) return next(new APIError.ServerError("An error occurred" + " " + err));
return res.send(data);
})
});
Here is a sample GET request in my server.

Azure mobile apps CRUD operations on SQL table (node.js backend)

This is my first post here so please don't get mad if my formatting is a bit off ;-)
I'm trying to develop a backend solution using Azure mobile apps and node.js for server-side scripts. It is a steep learning curve, as I am new to JavaScript and node.js, coming from the embedded world. What I have made is a custom API that can add users to an MSSQL table, which is working fine using the tables object. However, I also need to be able to delete users from the same table. My code for adding a user is:
var userTable = req.azureMobile.tables('MyfUserInfo');
item.id = uuid.v4();
userTable.insert(item).then( function (){
console.log("inserted data");
res.status(200).send(item);
});
It works. The Azure node.js documentation is really not in good shape and I keep searching for good example on how to do simple things. Pretty annoying and time consuming.
The SDK documentation on delete operations says it works the same way as read, but that is not true. Or I am dumb as a wet door. My code for deleting looks like this, and it results in an exception:
query = queries.create('MyfUserInfo')
.where({ id: results[i].id });
userTable.delete(query).then( function(delet){
console.log("deleted id ", delet);
});
I have also tried this, with no success either:
userTable.where({ id: item.id }).read()
.then( function(results) {
if (results.length > 0)
{
for (var i = 0; i < results.length; i++)
{
userTable.delete(results[i].id);
}
}
});
Can somebody please point me in the right direction on the correct syntax for this and explain why it has to be so difficult doing basic stuff here ;-) It seems like there are many ways of doing the exact same thing, which really confuses me.
Thanks a lot
Martin
You could issue SQL in your api
var api = {
get: (request, response, next) => {
var query = {
sql: 'UPDATE TodoItem SET complete=@completed',
parameters: [
{ name: 'completed', value: request.params.completed }
]
};
request.azureMobile.data.execute(query)
.then(function (results) {
response.json(results);
});
}};
module.exports = api;
That is from their sample on GitHub
Here is the full list of samples to take a look at
Why are you doing a custom API for a table? Just define the table within the tables directory and add any custom authorization / authentication.

waiting for data to appear/change in DB

I am writing a REST API which has to provide a kind of real-time communication between users. Let's say I have a db.orders collection and an endpoint GET /order/{id}. This endpoint should wait for some change in the order document. For example, it should return data only when order.status is ready. I know how to do long-polling, but I have no idea how to check for data to appear/change in the db. It would be easy if there was one app instance, because then I could do this in memory, something like this:
var queue = []
// GET /order/{id}
function(req,res,next) {
var data = getDataFromDb();
if(data && data.status == 'ready') {
res.send(data);
return;
}
queue.push({id: req.params.id, req: req, res: res, next: next});
}
// POST /order/{id}
function(req,res,next) {
req.params.data.status = 'ready'
saveToDb(req.params.data);
var item = findInQueue(queue,req.params.id);
if(item) item.res.send(req.params.data);
}
The first handler waits for the data to have status ready, and the second sets the status of the data to ready. It's just pseudocode and many things are missing (a timeout, for example).
The problem is when I want to run many instances of such an app: I need some messaging mechanism that lets the instances communicate in near real time.
I read about REDIS PUB/SUB but I am not sure if I can use it in this way...
I am using node.js + restify + mongoDB for now.
You are looking for the oplog. It's a special capped collection where all operations on the database are stored. To enable it on a single server, start mongod as a one-member replica set:
mongod --dbpath=./data --oplogSize=100 --replSet test
then connect to the server using the console and write
rs.initiate()
use the console and do
use local
show collections
Notice the collection oplog.rs. It contains all the operations that have been applied to the server. If you are using node.js, you can listen to the changes in the following way:
var local = db.db("local");
var stream = local.collection("oplog.rs").find({}, {tailable:true, awaitdata:true}).stream();
stream.on('data', function(doc) {
});
For each operation on mongodb you'll receive a doc, from which you can work out whether something you are interested in changed state.
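As a rough sketch of what the 'data' handler could do, the code below inspects an oplog entry and releases any long-poll requests waiting on that order. It assumes the classic oplog entry shape (`op` of 'i' or 'u', `ns` as "database.collection", `o` holding the inserted document or the `$set` modifier, and `o2._id` identifying the updated document); newer server versions may encode updates differently, so check the entries your server actually produces. The `test.orders` namespace and all function names are made up for this example.

```javascript
// Pending long-poll callbacks, keyed by order id as a string.
const waiters = new Map();

// Registers a callback to run once the order with this id becomes ready.
function waitForOrder(id, callback) {
  const key = String(id);
  if (!waiters.has(key)) {
    waiters.set(key, []);
  }
  waiters.get(key).push(callback);
}

// Returns the id of the order that this oplog entry marks as ready, or null.
function readyOrderId(doc, namespace) {
  if (!doc || doc.ns !== namespace || !doc.o) {
    return null;
  }
  if (doc.op === 'i' && doc.o.status === 'ready') {
    return String(doc.o._id); // inserted already in the ready state
  }
  if (doc.op === 'u' && doc.o.$set && doc.o.$set.status === 'ready' && doc.o2) {
    return String(doc.o2._id); // updated with $set: { status: 'ready' }
  }
  return null;
}

// Call this from stream.on('data', ...). Returns how many waiters were notified.
function handleOplogDoc(doc, namespace = 'test.orders') {
  const id = readyOrderId(doc, namespace);
  if (id === null) {
    return 0;
  }
  const callbacks = waiters.get(id) || [];
  waiters.delete(id);
  callbacks.forEach((cb) => cb(id));
  return callbacks.length;
}
```

Because every app instance tails the same oplog, each instance only needs to track its own waiting requests; in the GET handler the callback would re-read the order and send it with `res.send`.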
