Scaling nodejs app with pm2 - node.js

I have an app that receives data from several sources in realtime using logins and passwords. After data is recieved it's stored in memory store and replaced after new data is available. Also I use sessions with mongo-db to auth user requests. Problem is I can't scale this app using pm2, since I can use only one connection to my datasource for one login/password pair.
Is there a way to use different login/password for each cluster or get cluster ID inside app?
Are memory values/sessions shared between clusters or is it separated? Thank you.

So if I understood this question, you have a node.js app, that connects to a 3rd party using HTTP or another protocol, and since you only have a single credential, you cannot connect to said 3rd party using more than one instance. To answer your question, yes it is possibly to set up your clusters to use a unique use/pw combination, the tricky part would be how to assign these credentials to each cluster (assuming you don't want to hard code it). You'd have to do this assignment when the servers start up, and perhaps use a a data store to hold these credentials and introduce some sort of locking mechanism for each credential (so that each credential is unique to a particular instance).
If I was in your shoes, however, what I would do is create a new server, whose sole job would be to get this "realtime data", and store it somewhere available to the cluster, such as redis or some persistent store. The server would then be a standalone server, just getting this data. You can also attach a RESTful API to it, so that if your other servers need to communicate with it, they can do so via HTTP, or a message queue (again, Redis would work fine there as well.

'Realtime' is vague; are you using WebSockets? If HTTP requests are being made often enough, also could be considered 'realtime'.
Possibly your problem is like something we encountered scaling SocketStream (websockets) apps, where the persistent connection requires same requests routed to the same process. (there are other network topologies / architectures which don't require this but that's another topic)
You'll need to use fork mode 1 process only and a solution to make sessions sticky e.g.:
https://www.npmjs.com/package/sticky-session
I have some example code but need to find it (over a year since deployed it)
Basically you wind up just using pm2 for 'always-on' feature; sticky-session module handles the node clusterisation stuff.
I may post example later.

Related

How to direct a user to an available websocket server when she logs in to my multi-server Node.js app?

This is more like a design question but I have no idea where to start.
Suppose I have a realtime Node.js app that runs on multiple servers. When a user logs in she doesn't know which server she will be assigned to. She will just login, do something and logout and that's it. A user won't be interacting with other users on a different server, nor will her details be stored on another server.
In the backend I assume the Node.js server will put the user's login details to some queue and then when there is space it will assign this user to an available server (A server that has the lowest ping value or is not full). Because there is a limit number of users on one physical server when the users try to login to a "full" server it will direct her to another available server.
I am using ws module of node.js. Is there any service available for this purpose or do I have to build my own? How difficult would that be?
I am not sure how websocket fits into this question. Ignoring it. I guess your actual question is about load balancing... Let me try paraphasing it.
Q: Does NodeJS has any load balancing feature that I can leverage?
Yes and it is called cluster in NodeJS. Instead of the traditional one node process listening on a single port, this module allows you to spawn a group of node processes and have them all binded to the same port.
This means is that all the user know is only the service's endpoint. He sends a request to it and 1 of the available server in the group will serve him whenever possible.
Alternatively using Nginx, the web server, as your load balancer is also a very popular approach to this problem.
References:
Cluster API: https://nodejs.org/api/cluster.html
Nginx as load balancer: http://nginx.org/en/docs/http/load_balancing.html
P.S
I guess the key word for googling solutions to your problem is load balancer.
Out of the 2 solutions I would recommend going the Nginx way as it is a much scalable approach
Example:
Your Node process could possibly be spread across multiple hosts (horizontal scaling). The former solution is more for vertical scaling, taking advantages of multi-cores machine.

If I only want to use redis pubsub to create some real time client functionality, is it ok security-wise to connect to redis direct from client?

I'm trying to create a Flash app with some real time functionality, and would like to use Redis' pubsub functionality which is a perfect fit for what I need.
I know that connecting to a data store directly from client is almost always bad. What are the security implications of this (since I'm not an expert on Redis), and are there ways to work around them? From what I read, there is a possible exploit of doing config sets and changing the rdb file location and be able to arbitrary overwrite files. Is there anything else? (If I don't use that particular redis instance for anything at all, i.e. no data being stored)
I understand the alternative is to write some custom socket server program and have it act as the mediating layer for connecting to redis and issuing commands -- that's the work I'd like to avoid having to write, if possible.
** Edit **
Just learned about the rename-command configuration to disable commands. If I disable every single command on the redis instance and leave only SUBSCRIBE and PUBLISH open, would this be good enough to run on production?
I think it would be a bad idea to connect directly your client to Redis. Redis offers an authentication system for a unique user only. It expects this user to be your server app.
From my point of view, directly exposing Redis is always a bad idea. It would allow anybody to access all of your data. This is confirmed by the Redis doc.
So you won't avoid adding or developing the server side of your app.

Nginx - Routing, Capabililties of with Node.js

I am running a Node.js server which is now getting more load and I need to start getting this running on multiple cores, as Node.js is single threaded and can only run on one.
This is a simple solution given the Node.js Cluster module and tons of NPM packages for this very thing.
I have a problem in that I need browser sessions to retain the same Node.js worker after the first request. This is because I store authentication data, etc. in a single node worker process and do not want to open the can of worms of messaging between worker processes, etc. etc.
My browsers store a session id cookie once authenticated, and I want a system to re-route requests to their correct worker based on their session cookie.
Nginx looks promising, but I know nothing about it, and while I will put the work into it, I would like to know before I spend hours diving into it, if it is capable of routing to Node.js worker processes based on arbitrary data from the request header, such as a session cookie.
Is this doable? If I know it is, I'll get down and dirty figuring out Nginx, ground up.
I assume you are storing your sessions in nodejs memory. You might want to store your sessions in redis instead. This way it is persisted outside of a single server, and can be accessed from multiple processes.
In addition to redis, you might also want to look into Amazon Elastic Beanstalk for managing your load-balancing. You can setup an nginx proxy to route your requests to multiple servers based on their load.
This link might be able to get you started http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_nodejs_express_elasticache.html

Going session-less with NodeJS

I've been doing a lot of research lately and it appears to me that going stateless serverside brings benefits to both performance & scalability.
I am although trying to figure out how to achieve session-less-ness on Node.JS. It seems to me that basically all I have to do is assign a token to a logged in user, so I would have something like this in my DB:
{ user:'foo#example.com', pass:'123456', token:'long_id_here' }
so that the token can be send with every HTTP request like this:
/set/:key/:val/:token
to be checked against aforementioned DB object. Is this what it is actually meant to be a session-less web service?
If this is the right way, then I do not understand things like token expiry, and other security issues. I would like to be pointed out to NPM package of some sort?
On a side note, is it best for a token, to use a hash of the user+password, or to assign a different one at every login?
The reason to go sessionless is that most default session implementations use an in-memory store. That means that the session information is stored in memory local to that instance. Most websites these days are scaling out as traffic increases. This means they add more servers and balance the load between the servers. The problem with in-memory session stores is your user can log into Server 1, but if their next request is routed to Server 2, they don't have a session created yet and will appear to be logged off.
You don't necessarily need to go sessionless to scale out with node or any other server side language. You just need to use a session that isn't in local memory that would be accessible to all nodes. If you're using something like Express or Connect, you can easily use a session implementation like connect-redis which will enable you to have a fast session store which is accessible to all of your node instances so it doesn't matter which one is hit.

Node.JS/Meteor on multiple servers serving the same application

When coming to deploy Node.JS/Meteor for large scale application a single CPU will not be sufficient. We also would like to have it on multiple servers for redundancy.
What is the recommended setup for such deployment ? how does the load balancing works ? will this support the push data technology for cross servers clients (one client connects to server 1, 2nd client connects to server 2 and we would like an update in client one to be seen in client 2 and vice versa).
Thanks Roni.
At the moment you just need to use a proxy between them. The paid galaxy solution should help but details are scarce at the moment as the product isn't out yet.
You can't simply proxy (normally using nginx, etc) between two servers as each server will store the user's state (i.e their login state) during the DDP Session (the raw wire protocol meteor uses to transmit data).
There is one way you could do it at the moment. Get meteorite and install a package called meteor-cluster.
The package should help you relay data between instances and relay data between the instances via Redis. A youtube video also shows this and how to set it up.
An alternative solution is to use Hipache to manage the load balancing. You can use multiple workers (backends) for one frontent, like so:
$ redis-cli rpush frontend:www.yourdomain.com http://address.of.server.2
$ redis-cli rpush frontend:www.yourdomain.com http://address.of.server.2
There is more information on how to do this in the git page I linked to above, there is a single config file to edit and the rest is done for you. You may also want to have a dedicated server for MongoDB.

Resources