Make Node/MEANjs Highly Available

I'm probably opening up a can of worms with regard to how many hundreds of directions this can be taken in, but I want high availability / disaster recovery for my MEANjs servers.
Right now, I have 3 servers:
MongoDB
App (Grunt'ing the main application; this is the front-end server)
A third server for other processing on the back-end
So at the moment, if I reboot my MongoDB server (or more realistically, it crashes for some reason), I suddenly see this in my App server terminal:
MongoDB connection error: Error: failed to connect to [172.30.3.30:27017]
[nodemon] app crashed - waiting for file changes before starting...
After MongoDB is back online, nothing happens on the app server until I re-grunt.
What's the best practice for this situation? You can see in the error that I'm using nodemon to monitor changes to the app. I suppose that on startup I could have my MongoDB server touch a file on the app server within nodemon's view to force a restart? Or is there some other tool I can use for this? Or should I be handling my connections to the db server more gracefully so the app doesn't "crash"?
Is there a way to redirect to a secondary MongoDB instance in case the primary isn't available? That would be more in line with HA/DR.

I would like to start with a side note: Given the description in the question and the comments to it, I am not convinced that using AWS is a wise option. A PaaS provider like Heroku, OpenShift or AppFog seems to be more suitable, especially when combined with a MongoDB service provider. Running MongoDB on EBS can be quite a challenge when you are new to MongoDB. And pretty expensive, too, as soon as you need provisioned IOPS.
Note: In the following paragraphs, I have simplified a few things for the sake of comprehensibility.
If you insist on running it on your own, however, you have an option: MongoDB itself comes with a means of automatic, transparent failover, called a replica set.
A minimal replica set consists of two data-bearing nodes and a so-called arbiter. Write operations go only to the node currently elected "primary", and reads do, too, unless you explicitly allow or request reads to be performed on the current "secondary". The secondary constantly syncs from the primary. If the current primary goes down for some reason, the former secondary is elected primary.
The arbiter is there so that there is always a quorum (a qualified majority would be an equivalent term) of members to elect the current secondary to be the new primary. This quorum mainly matters in edge cases, but since you cannot rule those edge cases out, an odd number of members is a hard requirement for a MongoDB replica set (setting aside some special cases).
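For reference, here is a minimal sketch (hostnames and the set name rs0 are placeholders, not from the question) of initiating such a three-member set from the mongo shell, assuming each mongod was started with --replSet rs0:

// Run once in the mongo shell against the member that should become the first primary.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },                       // data-bearing
    { _id: 1, host: "db2.example.com:27017" },                       // data-bearing
    { _id: 2, host: "arbiter.example.com:27017", arbiterOnly: true } // arbiter, holds no data
  ]
});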
The beauty of this is that almost all drivers, and the Node.js driver for sure, are replica-set aware and deal with the failover procedure pretty gracefully. They simply send the reads and writes to the new primary, without any change needed anywhere else.
You only need to deal with a few cases during the failover process itself. Without going into much detail: you check for certain errors in the corresponding callbacks and, if you encounter one of them and redoing the operation is feasible, you redo it.
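To make that concrete, here is a minimal sketch with mongoose (the hostnames, set name, retry policy and the Article model are assumptions, not MEAN.js code): connect to the replica set as a whole, log connection errors instead of letting the process crash, and retry a failed operation once.

var mongoose = require('mongoose');

// Connect to the replica set, not to a single host; the driver tracks the primary.
var uri = 'mongodb://db1.example.com:27017,db2.example.com:27017/mean?replicaSet=rs0';
mongoose.connect(uri);

// Handle connection-level events instead of letting the process crash;
// the driver keeps reconnecting on its own while a new primary is elected.
mongoose.connection.on('error', function (err) {
  console.error('MongoDB connection error:', err);
});
mongoose.connection.on('disconnected', function () {
  console.warn('MongoDB disconnected, waiting for the driver to reconnect...');
});

// Naive retry helper: redo an operation once after a short delay,
// which covers the brief window while the failover happens.
function withRetry(op, callback) {
  op(function (err, result) {
    if (!err) return callback(null, result);
    setTimeout(function () { op(callback); }, 2000);
  });
}

// Example usage with a hypothetical mongoose model:
// withRetry(function (cb) { Article.create({ title: 'hello' }, cb); }, function (err, doc) { /* ... */ });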
As you might have noticed, the third member, the arbiter, does not hold any data. It is a very lightweight process and can basically run on the cheapest instance you can find.
So you get data replication and automatic, transparent failover with relative ease, at the additional cost of only the cheapest VM you can find, since you would need two data-bearing nodes anyway with any other approach.

Related

Which MongoDB scaling strategy (Sharding, Replication) is suitable for concurrent connections?

Consider this scenario:
I have multiple devclouds (remote workspaces for developers); they are all virtual machines running on the same bare-metal server.
In the past, each used its own MongoDB container running on Docker, so the number of MongoDB containers can add up to over 50 instances across devclouds.
The problem is that while 50 instances are running at the same time, only about 5 people actually perform read/write operations against their own instances, so the other 45 running instances waste the server's resources.
Should I instead use a single MongoDB cluster (a combined set of MongoDB instances) for everyone, so that they all connect to one endpoint only (via the internal network) and avoid wasting resources?
I am considering a sharding strategy, but there is a chance that a node gets taken down (one VM shut down); is that OK for availability (redundancy)?
I am pretty new to sharding and replication and look forward to your solutions. Thank you.
If each developer expects to have full control over their database deployment, you can't combine the deployments. Otherwise one developer can delete all data in the deployment, etc.
If each developer expects to have access to one database, you can deploy a single replica set serving all developers and assign one database per developer (via authentication).
Sharding in the MongoDB sense (a sharded cluster) is not really going to help in this scenario, since an application generally uses all of the shards. You can of course "shard manually" by setting up multiple replica sets.
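As a sketch of the one-database-per-developer approach mentioned above (user and database names are placeholders; this assumes the replica set runs with authentication enabled and the commands are run in the mongo shell by a user administrator):

use devdb_alice
db.createUser({
  user: "alice",
  pwd: "change-me",
  roles: [ { role: "dbOwner", db: "devdb_alice" } ]  // full control over this one database only
});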

Azure Redis Cache data loss?

I have a Node.js application that receives data via a WebSocket connection and pushes each message to an Azure Redis cache. It stores a persistent array of messages in a variable for downstream use, and at regular intervals syncs that array from the cache. A bit convoluted, but at a later point I want to separate the half of the application that writes to the cache from the half that reads from it.
At around 02:00 GMT, based on the Azure portal stats, I appear to have started getting "cache misses" on that sync, which last for a couple of hours before I started getting "cache hits" again sometime around 05:00.
The cache misses correspond to a sudden increase in CPU usage, which peaks at around 05:00. And when I say peaks, I mean it hits 81%, vs a previous max of about 6%.
So sometime around 05:00, the CPU peaks, then drops back to normal, and the "cache misses" go away, but looking at the cache memory usage, I drop from about 37.4 MB used to about 3.85 MB used (which I suspect is the "empty" state), and the list being used by this application had been emptied.
The only functions that the application runs against the cache are LPUSH and LRANGE; there's nothing with any capability to remove data, and in case anybody was wondering, when the CPU ramped up the memory usage did not, so there's nothing to suggest that rogue additions of data cropped up.
It's only on the Basic plan, so I'm not expecting it to be invulnerable or anything, but even without the replication features of the Standard plan I had expected that it wouldn't be in a position to completely wipe itself - I was under the impression that Redis periodically writes itself to disk and restores from that when it recovers from an error.
All of which is my way of asking:
Does anybody have any idea what might have happened here?
If this is something that others have been able to accidentally trigger themselves, are there any gotchas I should be looking out for that I might have in other applications using the same cache that could have caused it to fail so catastrophically?
I would welcome a chorus of people telling me that the Standard plan won't suffer from this sort of issue, because I've already forked out for it and it would be nice to feel like that was the right call.
Many thanks in advance.
Here are my thoughts:
Azure Redis Cache stores information in memory. By default, it won't save a "backup" to disk, so you had information in memory only; for some reason the server got restarted, and you lost your data.
PS: See this feedback item; there is no option yet to persist information to disk with Azure Redis Cache: http://feedback.azure.com/forums/169382-cache/suggestions/6022838-redis-cache-should-also-support-persistence
Make sure you don't use the Basic plan. The Basic plan doesn't come with an SLA, and in my experience it loses data quite often.
The Standard plan provides an SLA and uses two instances of Redis Cache. It's quite stable and it hasn't lost our data, although such a case is still possible.
Now, if you're going to use Azure Redis as a database rather than as a cache, you need to use the data persistence feature, which is already available in the Azure Redis Cache Premium tier: https://azure.microsoft.com/en-us/documentation/articles/cache-premium-tier-intro (see "Redis data persistence").
James, using the Standard tier should give you much improved availability.
With the Basic tier, any Azure Fabric update to the master node (or a hardware failure) will cause you to lose all data.
Azure Redis Cache does not support persistence (writing to disk/blob) yet, even in the Standard tier. But the Standard tier does give you a replicated slave node that can take over if your master goes down.

Regarding process-related issues in Node.js and MongoDB

I am a novice Node.js programmer. I have a few queries regarding process-related issues like locking and race conditions in Node.js and MongoDB.
My code works perfectly in the local environment, but when I move to production and face a large number of requests, I might encounter certain issues.
How do we avoid write-level race conditions for Mongo slaves located in different regions? I.e., say one piece of data is being written locally, but the true value for it is being written remotely, with a delay.
Suppose we have Node processes located regionally; would they need to hit a Mongo master located in another region, which then routes the request to a regional slave? This considerably increases the latency of each write. How do we avoid this? Can we have direct writes to regional slaves from local processes, with some kind of replication to maintain data consistency?
I use a Node REST API and mongoose as the MongoDB driver. Any help would be deeply appreciated. Thank you.
MongoDB's automatic failover and high availability features are provided by what's called replication. The standard MongoDB terms are "primary" for master and "secondary" for slave, so I'll use those terms to be consistent with the documentation and the user base at large. I think both of your questions are answered by one fact: in a replica set, the primary is the only member that accepts writes from clients, ever. The secondaries get the data replicated to them asynchronously a short time later. To answer the questions directly:
No writes to slaves except internal replication of writes from the primary, so no "race condition" with writes can arise.
All writes must go to the primary. The replication system will distribute the data to the secondaries asynchronously. You can read from secondaries, but it isn't a best practice despite its occasional utility. I'd suggest reading about replica set read preference and reading Asya Kamsky's blog post about scaling with replica sets before deciding to read from secondaries.
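A small sketch with mongoose to make the distinction concrete (the hosts, set name, and Event model are made up): writes are always routed to the primary by the driver, and reads only hit a secondary if you explicitly opt in per query.

var mongoose = require('mongoose');

// Connect to the replica set; the driver discovers and tracks the current primary.
mongoose.connect('mongodb://node1.example.com,node2.example.com,node3.example.com/app?replicaSet=rs0');

var Event = mongoose.model('Event', new mongoose.Schema({
  region: String,
  payload: mongoose.Schema.Types.Mixed
}));

// Writes: always sent to the primary.
Event.create({ region: 'eu', payload: { ok: true } }, function (err) {
  if (err) console.error('write failed:', err);
});

// Reads: secondary reads are opt-in per query and may return slightly stale
// data, because replication to the secondaries is asynchronous.
Event.find({ region: 'eu' }).read('secondaryPreferred').exec(function (err, docs) {
  if (err) return console.error('read failed:', err);
  console.log('found', docs.length, 'events');
});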

Server architecture for a scalable web application

We're planning to deploy a web application with Amazon OpsWorks, and I just wanted to check with you whether our architecture might have any design flaws.
We have 4 components:
A load balancer (Amazon's, preferably)
Express based on Node.js
MongoDB
ElasticSearch
Here's a communication diagram of our components:
At the front is a load balancer which distributes http requests to multiple web servers.
The web server is stateless and therefore can be cloned each time the load requires it. All web server instances are equal. Session information is saved in the MongoDB.
In the "backend" we're planing to use the build-in cluster functionalities from MongoDB and ElasticSearch. Therefore each web server instance only connects to a single MongoDB and ElasticSearch master instance. MongoDB and ElasticSearch are then scaling accordingly. Furthermore the the ElasticSearch master speaks to the MongoDB master to retrieve data for building the index.
How we see it, the most challenging task to setup such a system, is to configure OpsWorks with the MongoDB and ElasticSearch cluster.
Many thanks in advance!
if our architecture might have any design flaws.
Well, keep in mind that we can't tell much from a generic diagram. But here are some notes:
1) MongoDB isn't as easy to scale as other databases such as DynamoDB, Riak or Cassandra. For example, if you ever exceed the capacity of a single master (no matter how many slaves you have, all writes go to the single master), you'll have to shard. But switching to sharding is very disruptive and very tedious to set up.
If you don't expect to exceed the write capacity of one node, then you'll be fine on MongoDB.
2) What will you do for async tasks such as sending emails, creating long reports, etc?
It's possible to do these things in the request loop, and that's probably a fine way to get started. But as you have more boxes, the chances of failure go up. When a box dies, all the async tasks go away and nobody will know what they were. You also can have problems where one box gets heavily loaded with async tasks (using too much CPU or memory), and the problem will get worse and worse as it gets more tasks and completes them more slowly.
Also, a front-end like ELB will have a 60-second limit, which can cause problems if some of your requests could take longer. (Spin them off into async jobs with polling or something.)
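One minimal sketch of that pattern (not from the question; the routes, names and timings are made up): accept the request, kick the slow work off asynchronously, respond immediately, and let the client poll for the result. In production you would back this with a real queue rather than an in-memory map, precisely because of the "box dies and the tasks vanish" problem described above.

var express = require('express');
var crypto = require('crypto');

var app = express();
var jobs = {}; // jobId -> { status, result } -- in-memory only, for illustration

app.post('/reports', function (req, res) {
  var id = crypto.randomBytes(8).toString('hex');
  jobs[id] = { status: 'pending' };

  // Run the long task outside the request/response cycle.
  setImmediate(function () {
    buildReport(function (err, result) {
      jobs[id] = err ? { status: 'failed' } : { status: 'done', result: result };
    });
  });

  // Respond well within the ELB timeout; the client polls /reports/:id.
  res.status(202).json({ jobId: id });
});

app.get('/reports/:id', function (req, res) {
  res.json(jobs[req.params.id] || { status: 'unknown' });
});

function buildReport(callback) {
  // Placeholder for a slow task (report generation, bulk email, ...).
  setTimeout(function () { callback(null, { rows: 42 }); }, 5000);
}

app.listen(3000);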
3) ELB doesn't support web sockets. Consider that if you think you might want websockets down the road.
There's no such thing as a master in ElasticSearch. You have primary copies of shards and replicas of shards, but they are basically moved around your cluster by ElasticSearch. A node might hold the primary of one shard and a replica of another. So, you could simply put a load balancer in front of it.
However, you can specialize nodes to be data nodes or routing nodes as explained here: http://www.elasticsearch.org/guide/reference/modules/node/
The routing nodes effectively become load balancers. You could have a few of those (for redundancy) and distribute load between them. Alternatively, you could run a dedicated routing node on each web server. Routing nodes are pretty light, and you save a bit of bandwidth/latency since your web server talks to localhost and from there it is all ElasticSearch-internal cluster traffic.
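For illustration, a sketch of the web-server side of that setup, assuming the legacy elasticsearch client for Node.js and a co-located routing node (the index and query are made up):

var elasticsearch = require('elasticsearch');

// Talk to the routing node running on the same box; it forwards requests
// to the data nodes over the cluster's internal transport.
var client = new elasticsearch.Client({ host: 'localhost:9200' });

client.search({
  index: 'products',
  body: { query: { match: { name: 'phone' } } }
}, function (err, response) {
  if (err) return console.error('search failed:', err);
  console.log(response.hits.total, 'hits');
});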
I'd recommend replacing MongoDB with Amazon DynamoDB (it has a Node.js SDK).

Keeping replicas of Redis instances in sync?

Is it possible to create replicas of a Redis instance? If yes, what is the overhead of keeping them in sync (apart from the network traffic)?
See the Redis documentation on setting up replication scenarios.
Since there is a delay in synchronisation, you'll probably need additional application-side logic to pin accesses that use the same data to the same server instance. In some cases you may also need to issue additional 'slaveof' commands to one instance when another goes down or comes back up.
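As a sketch of issuing such a command from Node (hosts and ports are placeholders; this assumes the classic node_redis client):

var redis = require('redis');

// Connect to the instance whose replication role we want to change.
var client = redis.createClient(6379, 'replica.example.com');

// SLAVEOF <host> <port> makes this instance replicate from the given master;
// SLAVEOF NO ONE would promote it back to a standalone master.
client.send_command('SLAVEOF', ['master.example.com', '6379'], function (err, reply) {
  if (err) return console.error('Failed to reconfigure replication:', err);
  console.log('Replication reconfigured:', reply); // typically "OK"
});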
If you need more concrete info, you should elaborate a bit on your use case, i.e. in what environment you're using Redis (i.e. Rails app cluster, custom client...).

Resources