Node.js and redis on same server? - node.js

We're using cloud hosting (Linode) to host a node.js based (and socket.io) chat app, with redis as the main DB. We haven't launched yet, but we're looking at hosting redis and node.js on the same machine (8 gb instance, redis limited to 5 GB for instance). All communication will be held in redis (ie straight from client to redis, no variables for dialog in node.js). To avoid network travel times amonsgt other bottle necks, we are looking at hosting redis and node.js on the same server. I can't find anything in documentation that would state this is a bad idea, but our sysops guy isn't convinced. Are there any drawbacks to going down this route?

Refer to a very similar answer I posted on a similar matter on SO: Redis deployment configuration - master slave replication where the OP was having a similar issue but his concern was more related to performance.
My main issue with your solution (a simply side note on the other answer) is the simple fact that your node.js application by design has to be cloud facing, e.g., the internet whereas your Redis or other Databases shouldn't.
It doesn't mean you'll have a security issue by all means, but in my opinion it's a best practice to only expose the hosts you really need to, e.g., the ones usually serving content directly to the user.
By not deploying Redis to an internet facing host you enforce a lot of security constrains simply by the topology design of your network.
Is it ok to host those services in the same box:
Yes, do run the benchmarks from time to time to check if you need to scale horizontally or just bump up the flooded hosts.
Check: Redis deployment configuration - master slave replication
Will I have security problems by having Redis or another service facing the internet?
If you know what you are doing - no, you won't have security issues. I still wouldn't do it though.

There is very little speed increase in cutting out the "network bottleneck"; redis is so fast that the latency is negligible. The issue with that approach is that if your box crashes then all of your data held in redis is gone unless you are replicating to a slave. If it goes down any other process not on that machince won't beable to access until it restarts. This may be fine for a staging env but I'd guard against it in production.

Related

Using Redis as a distributed internal nodeJS cache

Currently I have two load-balanced single-process nodeJS instances on Amazon beanstalk. They each just use a simple in-memory object as a cache. Because they are on different servers, their cache is not shared.
This is where Redis comes in. I want to utilize Redis to create a shared cache, but only to prime the internal memory of NodeJs.
You see, currently I am caching 4KB-10KB objects, if I solely relied on Redis then I have not only the Redis latency to retrieve the obejct but the network latency as well. I rather use Redis as a persistent cache that will prime my nodeJS instances when they are booted up, and also keep both internal caches in sync periodically (every x minutes)
A pretty basic nodeJs memory cache is https://github.com/tcs-de/nodecache
To complicate things even more, I am looking to start using nodeJS cluster ability to fork multiple processes of the application under the same server. Therefore, it is important that all clusters share 1 in-memory local server cache.
The aforementioned nodejs lib has a wrapper that aids the use in a cluster environment : https://github.com/lvx3/cluster-cache
To recap,
I will have Server A and server B who will be load balanced evenly. Each server (A&B) will have say 4 nodeJs processes who need to share just 1 cache (That is, the 4 server A nodeJs processes should all use a Server A cache, same for B)
Then I want the Server A and Server B cache to periodically sync and "persist" onto Redis. In the event of a crash or redeployment the Server cache would be primed with what is in Redis.
What are my options? Is there any well established solutions or mix of solutions that would be suitable? Such as a plugin like nodecache (simple) that has a Redis plugin? I also use express so perhaps there are express middleware that would be well suited for this.
Is it even worth the complexity to use Redis to prime a local server memory cache or should I just rely solely on Redis and take the network latency hit?
An acceptable but somewhat disappointing time for me to get back a 10KB object is 20 ms. I'd much prefer around 1ms. Redis and the nodeJs servers will be on Amazon so will be pretty close together.
I understand that if I have a redis cache of say 50MB that the same 50MB would exist on Server A and Server B. I am more than willing to spend money on hardware/ram for the benefit of speed.

I'm not sure how to correctly configure my server setup

This is kind of a multi-tiered question in which my end goal is to establish the best way to setup my server which will be hosting a website as well as a service (using Socket.io) for an iOS (and eventually an Android) app. Both the app service and the website are going to be written in node.js as I need high concurrency and scaling for the app server and I figured whilst I'm at it may as well do the website in node because it wouldn't be that much different in terms of performance than something different like Apache (from my understanding).
Also the website has a lower priority than the app service, the app service should receive significantly higher traffic than the website (but in the long run this may change). Money isn't my greatest priority here, but it is a limiting factor, I feel that having a service that has 99.9% uptime (as 100% uptime appears to be virtually impossible in the long run) is more important than saving money at the compromise of having more down time.
Firstly I understand that having one node process per cpu core is the best way to fully utilise a multi-core cpu. I now understand after researching that running more than one per core is inefficient due to the fact that the cpu has to do context switching between the multiple processes. How come then whenever I see code posted on how to use the in-built cluster module in node.js, the master worker creates a number of workers equal to the number of cores because that would mean you would have 9 processes on an 8 core machine (1 master process and 8 worker processes)? Is this because the master process usually is there just to restart worker processes if they crash or end and therefore does so little it doesnt matter that it shares a cpu core with another node process?
If this is the case then, I am planning to have the workers handle providing the app service and have the master worker handle the workers but also host a webpage which would provide statistical information on the server's state and all other relevant information (like number of clients connected, worker restart count, error logs etc). Is this a bad idea? Would it be better to have this webpage running on a separate worker and just leave the master worker to handle the workers?
So overall I wanted to have the following elements; a service to handle the request from the app (my main point of traffic), a website (fairly simple, a couple of pages and a registration form), an SQL database to store user information, a webpage (probably locally hosted on the server machine) which only I can access that hosts information about the server (users connected, worker restarts, server logs, other useful information etc) and apparently nginx would be a good idea where I'm handling multiple node processes accepting connection from the app. After doing research I've also found that it would probably be best to host on a VPS initially. I was thinking at first when the amount of traffic the app service would be receiving will most likely be fairly low, I could run all of those elements on one VPS. Or would it be best to have them running on seperate VPS's except for the website and the server status webpage which I could run on the same one? I guess this way if there is a hardware failure and something goes down, not everything does and I could run 2 instances of the app service on 2 different VPS's so if one goes down the other one is still functioning. Would this just be overkill? I doubt for a while I would need multiple app service instances to support the traffic load but it would help reduce the apparent down time for users.
Maybe this all depends on what I value more and have the time to do? A more complex server setup that costs more and maybe a little unnecessary but guarantees a consistent and reliable service, or a cheaper and simpler setup that may succumb to downtime due to coding errors and server hardware issues.
Also it's worth noting I've never had any real experience with production level servers so in some ways I've jumped in the deep end a little with this. I feel like I've come a long way in the past half a year and feel like I'm getting a fairly good grasp on what I need to do, I could just do with some advice from someone with experience that has an idea with what roadblocks I may come across along the way and whether I'm causing myself unnecessary problems with this kind of setup.
Any advice is greatly appreciated, thanks for taking the time to read my question.

Server architecture for a scalable web application

we're planing to deploy a web-application with Amazon OpsWork and I just wanted to check with you, if our architecture might have any design flaws.
We've 4 components:
A load balanacer (Amazon preferably)
Express based on Node.js
MongoDB
ElasticSearch
Here's a communication diagram of our components:
At the front is a load balancer which distributes http requests to multiple web servers.
The web server is stateless and therefore can be cloned each time the load requires it. All web server instances are equal. Session information is saved in the MongoDB.
In the "backend" we're planing to use the build-in cluster functionalities from MongoDB and ElasticSearch. Therefore each web server instance only connects to a single MongoDB and ElasticSearch master instance. MongoDB and ElasticSearch are then scaling accordingly. Furthermore the the ElasticSearch master speaks to the MongoDB master to retrieve data for building the index.
How we see it, the most challenging task to setup such a system, is to configure OpsWorks with the MongoDB and ElasticSearch cluster.
Many thanks in advance!
if our architecture might have any design flaws.
Well, keep in mind that we can't tell much from a generic diagram. But here are some notes:
1) MongoDB isn't as easy to scale as other databases such as DynamoDB, Riak or Cassandra. For example, if you ever exceed the capacity of a single master (no matter how many slaves you have, all writes go to the single master), you'll have to shard. But switching to sharding is very disruptive and very tedious to set up.
If you don't expect to exceed the write capacity of one node, then you'll be fine on MongoDB.
2) What will you do for async tasks such as sending emails, creating long reports, etc?
It's possible to do these things in the request loop, and that's probably a fine way to get started. But as you have more boxes, the chances of failure go up. When a box dies, all the async tasks go away and nobody will know what they were. You also can have problems where one box gets heavily loaded with async tasks (using too much CPU or memory), and the problem will get worse and worse as it gets more tasks and completes them more slowly.
Also, a front-end like ELB will have a 60-second limit, which can cause problems if some of your requests could take longer. (Spin them off into async jobs with polling or something.)
3) ELB doesn't support web sockets. Consider that if you think you might want websockets down the road.
There's no such thing as a master in elastic search. You have master copies of shards and replicas of shards but they are basically moved around through your cluster by elastic search. Nodes might be master for one shard and a replica for another. So, you could simply put a load balancer in front of it.
However, you can specialize nodes to be data nodes or routing nodes as explained here: http://www.elasticsearch.org/guide/reference/modules/node/
The routing nodes effectively become load balancers. You could have a few of those (redundancy) and distribute load between those. Alternatively, you could run a dedicated router node on each web server. Basically routing nodes are pretty light and you save a bit of bandwidth/latency since your web server talks to localhost and from there it is all elastic search internal cluster traffic.
I'd recommend to replace MongoDB with Amazon Dynamo DB (it has node.js SDK).

How to Scale Node.js WebSocket Redis Server?

I'm writing a chat server for Acani, and I have some questions about Scaling node.js and websockets with load balancer scalability.
What exactly does it mean to load balance Node.js? Does that mean there will be n independent versions of my server application running, each on a separate server?
To allow one client to broadcast a message to all the others, I store a set of all the webSocketConnections opened on the server. But, if I have n independent versions of my server application running, each on a separate server, then will I have n different sets of webSocketConnections?
If the answers to 1 & 2 are affirmative, then how do I store a universal set of webSocketConnections (across all servers)? One way I think I could do this is use Redis Pub/Sub and just have every webSocketConnection subscribe to a channel on Redis.
But, then, won't the single Redis server become the bottleneck? How would I then scale Redis? What does it even mean to scale Redis? Does that mean I have m independent versions of Redis running on different servers? Is that even possible?
I heard Redis doesn't scale. Why would someone say that. What does that mean? If that's true, is there a better solution to for pub/sub and/or storing a list of all broadcasted messages?
Note: If your answer is that Acani would never have to scale, even if each of all seven billion people (and growing) on Earth were to broadcast a message every second to everyone else on earth, then please give a valid explanation.
Well, few answers for your question:
To load balance Node.js, it means exactly what you thought about what it is, except that you don't really need separate server, you can run more then one process of your node server on the same machine.
Each server/process of your node server will have it's own connections, the default store for websockets (for example Socket.IO) is MemoryStore, it means that all the connections will be stored on the machine memory, it is required to work with RedisStore in order to work with redis as a connection store.
Redis PUB/SUB is a good way to achieve this task
You are right about what you said here, redis doesn't scale at this moment and running a lot of processes/connections connected to redis can make redis to be a bottleneck.
Redis doesn't scale, that is correct, but according to this presentation you can see that a cluster development is in top priority at redis and redis do have a cluster, it's just not stable yet: (taken from http://redis.io/download)
Where's Redis Cluster?
Redis development is currently focused on Redis 2.6 that will bring you support for Lua scripting and many other improvements. This is our current priority, however the unstable branch already contains most of the fundamental parts of Redis Cluster. After the 2.6 release we'll focus our energies on turning the current Redis Cluster alpha in a beta product that users can start to seriously test.
It is hard to make forecasts since we'll release Redis Cluster as stable only when we feel it is rock solid and useful for our customers, but we hope to have a reasonable beta for summer 2012, and to ship the first stable release before the end of 2012.
See the presentation here: http://redis.io/presentation/Redis_Cluster.pdf
2) Using Redis might not work to store connections: Redis can store data in string format, and if the connecion object has circular references (ie, Engine.IO) you won't be able serialise them
3) Creating a new Redis client for each client might not be a good approach so avoid that trap if you can
Consider using ZMQ node library to have processes communicate with each other through TCP (or IPC if they are clustered as in master-worker)

Scaling Node.JS across multiple cores / servers

Ok so I have an idea I want to peruse but before I do I need to understand a few things fully.
Firstly the way I think im going to go ahead with this system is to have 3 Server which are described below:
The First Server will be my web Front End, this is the server that will be listening for connection and responding to clients, this server will have 8 cores and 16GB Ram.
The Second Server will be the Database Server, pretty self explanatory really, connect to the host and set / get data.
The Third Server will be my storage server, this will be where downloadable files are stored.
My first questions is:
On my front end server, I have 8 cores, what's the best way to scale node so that the load is distributed across the cores?
My second question is:
Is there a system out there I can drop into my application framework that will allow me to talk to the other cores and pass messages around to save I/O.
and final question:
Is there any system I can use to help move the content from my storage server to the request on the front-end server with as little overhead as possible, speed is a concern here as we would have 500+ clients downloading and uploading concurrently at peak times.
I have finally convinced my employer that node.js is extremely fast and its the latest in programming technology, and we should invest in a platform for our Intranet system, but he has requested detailed documentation on how this could be scaled across the current hardware we have available.
On my front end server, I have 8
cores, what's the best way to scale
node so that the load is distributed
across the cores?
Try to look at node.js cluster module which is a multi-core server manager.
Firstly, I wouldn't describe the setup you propose as 'scaling', it's more like 'spreading'. You only have one app server serving the requests. If you add more app servers in the future, then you will have a scaling problem then.
I understand that node.js is single-threaded, which implies that it can only use a single core. Not my area of expertise on how to/if you can scale it, will leave that part to someone else.
I would suggest NFS mounting a directory on the storage server to the app server. NFS has relatively low overhead. Then you can access the files as if they were local.
Concerning your first question: use cluster (we already use it in a production system, works like a charm).
When it comes to worker messaging, i cannot really help you out. But your best bet is cluster too. Maybe there will be some functionality that provides "inter-core" messaging accross all cluster workers in the future (don't know the roadmap of cluster, but it seems like an idea).
For your third requirement, i'd use a low-overhead protocol like NFS or (if you can go really crazy when it comes to infrastructure) a high-speed SAN backend.
Another advice: use MongoDB as your database backend. You can start with low-end hardware and scale up your database instance with ease using MongoDB's sharding/replication set features (if that is some kind of requirement).

Resources