Node.js (& MongoDB) server crashes, database operations halfway?

I have a Node.js app with a MongoDB backend going to production in a week, and I have a few doubts about how to handle app crashes and restarts.
Say I have a simple route /followUser in which I have two database operations:
/followUser
----->Update User1 Document.followers = User2
----->Update User2 Document.followers = User1
----->Some other MongoDB (via Mongoose) operation
What happens if there is a server crash (due to a power failure, or maybe the remote MongoDB server being down) in a scenario like this:
----->Update User1 Document.followers = User2
SERVER CRASHED, FOREVER RESTARTS NODE
What happens to the operations below? The system is now in an inconsistent state, and I may get an error every time I ask for User2's followers:
----->Update User2 Document.followers = User1
----->Some other MongoDB (via Mongoose) operation
Also, please recommend good logging and restart/monitoring modules for apps running on Linux.
Right now I'm using domains to catch exceptions and calling server.close, but before process.exit() I want to make sure all database operations are done. Can I check this by testing whether the event loop is empty (and how?), and only then call process.exit(1)?

You need transactions for this, and since MongoDB doesn't have them, here is a workaround: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
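A rough sketch of that two-phase commit pattern applied to the /followUser example (collection and field names are illustrative, and the rollback path for a failed step is omitted):

// `db` is a connected mongodb driver Db instance.
async function followUser(db, user1Id, user2Id) {
  const txns = db.collection('transactions');
  const users = db.collection('users');

  // 1. Record the intent before touching any user document.
  const { insertedId: txnId } = await txns.insertOne({
    state: 'initial', user1: user1Id, user2: user2Id
  });

  // 2. Mark it pending, then apply both updates, tagging each document
  //    with the transaction id so a crash leaves evidence behind.
  await txns.updateOne({ _id: txnId }, { $set: { state: 'pending' } });
  await users.updateOne({ _id: user1Id },
    { $addToSet: { followers: user2Id, pendingTransactions: txnId } });
  await users.updateOne({ _id: user2Id },
    { $addToSet: { followers: user1Id, pendingTransactions: txnId } });

  // 3. Both writes succeeded: clean up and mark the transaction done.
  await txns.updateOne({ _id: txnId }, { $set: { state: 'applied' } });
  await users.updateMany({ _id: { $in: [user1Id, user2Id] } },
    { $pull: { pendingTransactions: txnId } });
  await txns.updateOne({ _id: txnId }, { $set: { state: 'done' } });
}

On restart, any transaction still in the 'pending' state tells you exactly which user documents may be half-updated, so you can roll the operation forward or back.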

One way to address this problem is to add cleanup code to your application that runs whenever the application starts. You write the cleanup code to perform sanity checks on any portions of your data that can be updated in multiple steps, like your example, and then repair that data in whatever way makes sense for your application.
Depending on the complexity of your application/data, this may also require that you keep a log of actions the app was trying to perform, but that gets complicated really fast. Ideally it's more a matter of refreshing denormalized data and deleting partial data.
You want to do this during startup rather than shutdown, as there's no guarantee your shutdown code will fully run, and if you're shutting down because of an exception, you don't know what state your system is in at that point.
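One hypothetical shape for such a startup repair pass on the follower example (a full scan is shown for simplicity; a real app would scope it to suspect data):

// `db` is a connected mongodb driver Db instance.
async function repairFollowers(db) {
  const users = db.collection('users');
  const cursor = users.find({}, { projection: { followers: 1 } });
  for await (const user of cursor) {
    for (const followerId of user.followers || []) {
      // Make the relationship symmetric again; $addToSet is idempotent,
      // so re-running this repair after another crash is safe.
      await users.updateOne({ _id: followerId },
        { $addToSet: { followers: user._id } });
    }
  }
}

// Run it before the server starts accepting traffic:
// await repairFollowers(db); then server.listen(...)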

The solution given by vkurchatkin in that link is a workaround for the case where your app server crashes, because you will be able to know which transactions were pending at that moment. If you implement this in your code, you can add cleanup code that runs when your system restarts, as suggested by JohnnyHK. The code you mention (catching exceptions, testing when closing, etc.) will not work because... well... your server crashed! ;-)
That said, all of this relies on the database, so you will have to guarantee, to a certain point, that your database does not crash. I would suggest you use replication for that: it is basically a cluster of servers that recovers itself if one node fails, and you can also add checks to make sure that the data reached the servers and is safe.
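For example, a Mongoose connection against a three-node replica set with a majority write concern might look like this (host names and the set name rs0 are placeholders):

const mongoose = require('mongoose');

// w=majority makes a write acknowledge only once it has reached a
// majority of the replica set, so an acknowledged write survives the
// loss of a single node.
mongoose.connect(
  'mongodb://db1.example.com,db2.example.com,db3.example.com/mydb' +
  '?replicaSet=rs0&w=majority'
);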
Hope this helps.

Related

Nodejs stuck on processing whenever the app is restarted

I have a Node.js application running on Linux. As we all know, whenever I restart the app it gets a new PID. Suppose that while the app is running, a client connects to it and kicks off some work whose status is "processing". If the app restarts (on the server side) at that point, how can we make sure the client reconnects to the previous processing state?
What is happening now is that whenever the server restarts, the work gets stuck in "processing" forever.
Please just direct me to a sample of how this scenario is handled in real life.
Thank You.
If I'm understanding you correctly, then the answer is: you can't.
The reason for this is that when you restart the process, the event loop is restarted too, meaning any tasks that were running or waiting in the event loop are gone. You are essentially clearing out the event loop when you restart.
I would say, though, that if you know the work is crashing Node, then you probably want to look into it and see why it's crashing, and place it in a try/catch so it won't kill the server.
Now, with that said (and without knowing what "processing state" really means), you could set a flag in your DB for, say, 'job1' with a status column of 'running' when it's kicked off. When the Node server restarts, it can read the job statuses; if a job is still in the 'running' state, it can fire the job off again and, once complete, update the record to 'completed'. Something like the sketch below.
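For example (a hypothetical jobs collection; runJob stands in for your real work):

// On startup, re-fire any jobs a previous crash left in 'running'.
async function resumeInterruptedJobs(db) {
  const jobs = db.collection('jobs');
  const interrupted = await jobs.find({ status: 'running' }).toArray();
  for (const job of interrupted) {
    await runJob(job); // placeholder for the actual processing
    await jobs.updateOne({ _id: job._id },
      { $set: { status: 'completed', finishedAt: new Date() } });
  }
}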
This is probably not the most efficient way, as it's much better to figure out why the process is crashing, but as a fallback it could work. In a clustered environment, though, it could cause issues, because server 1 may fail while server 2 is processing, and server 1 does not know what server 2 is doing. More details about the use case, environment, etc. would probably allow for a better answer.

Do I really need to call client.shutdown() when finished with Cassandra in Node.js script?

I've been trying to find information about Cassandra sessions as they relate to the Node.js cassandra-driver by DataStax. I read something that said cassandra-driver automatically manages a session and that I don't need to call client.shutdown().
I'm looking for general information about how cassandra-driver manages sessions: how can I see all active Cassandra sessions, and do I need to call shutdown(), or is that counterproductive given that a session would have to be reopened every time the script runs?
Based on "pm2 info" I don't see a ton of active handles, so I don't think anything wrong is going on, but I may be mistaken. RAM usage does seem a bit high for a small script (85 MB).
In the DataStax drivers, a Session is a stateful object handling a pool of connections, and it is aware of the status of the nodes in the cluster at any time (avoiding sending requests to unavailable nodes). TCP sockets are opened, and it is a best practice to close them when you don't need them anymore. See here for more info: https://docs.datastax.com/en/developer/nodejs-driver-dse/2.1/features/connection-pooling/
Now, session.connect() may take a bit of time: the more nodes you have in your cluster, the longer it takes to open connections to every single one. This is the reason why it is better to init connections in a "cold start" when you work with FaaS (avoiding an open/close for each request).
So:
Always close your connections (shutdown()) when you don't need them anymore (e.g. a shutdown hook in your application).
Keep your connections alive as long as you need them; do not shut down for each request. This is NOT stateless.
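A minimal sketch of that pattern with cassandra-driver (the contact point, data center, and keyspace are placeholders):

const cassandra = require('cassandra-driver');

// One long-lived client (session) for the whole process.
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'my_keyspace'
});

async function main() {
  await client.connect();
  const result = await client.execute('SELECT now() FROM system.local');
  console.log(result.rows[0]);
}

// Shutdown hook: close the connection pool once, when the process exits.
process.on('SIGINT', async () => {
  await client.shutdown();
  process.exit(0);
});

main();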
Yes, it is "better" to connect the client outside of the handler function, to keep it stateful.
However, with AWS Lambda and Node.js, by default function execution continues until the event loop is empty or the function times out.
So: create the client outside of the handler, set context.callbackWaitsForEmptyEventLoop = false, and don't call client.shutdown().
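Putting that together, a sketch of such a handler (the query and cluster settings are illustrative):

const cassandra = require('cassandra-driver');

// Created once per container, outside the handler, so warm invocations
// reuse the same session instead of reconnecting.
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'my_keyspace'
});

exports.handler = async (event, context) => {
  // Return as soon as the response is ready, even though the
  // connection pool keeps sockets open in the event loop.
  context.callbackWaitsForEmptyEventLoop = false;
  const result = await client.execute('SELECT now() FROM system.local');
  return result.rows[0];
};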

No Mongo Query gets result when cron is running in background

We have been using NodeJS on the server side and MongoDB as our database. They really work great together.
Now I have added the node-schedule library into our system to call a function periodically, like a cron job.
The process takes hours to complete.
My issue is that whenever the cron is running, all users of my site get no response from the server, i.e. the database gets locked.
I've been stuck on this issue for a week and need a good way to run the cron without affecting the users of the site.
Typically you will want to write a worker and run it from a different entry point that is not part of your server. There are multiple ways you could achieve this:
1) Write a worker on another server that interacts with your database
2) Write a service worker on another server that interacts with your API
3) Use the same server, but set up a cronjob that executes the file that does the work at a specified time
But you should not do this from the same entry point your server is running on. You need a different execution file.
There is one thing you can do to run this where it will not bog down your server: have your node-schedule trigger spawn a child process (https://nodejs.org/api/child_process.html), as in the sketch below.
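For example (the worker path and schedule are illustrative):

const schedule = require('node-schedule');
const { fork } = require('child_process');

// Every night at 2:00, run the heavy job in a separate Node process so
// the server's event loop stays free to answer user requests.
schedule.scheduleJob('0 2 * * *', () => {
  const worker = fork('./jobs/heavy-job.js'); // path is illustrative
  worker.on('exit', (code) => {
    console.log('batch job finished with code', code);
  });
});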

How can I "break up" a long running server side function in a Meteor app?

I have, as part of a Meteor application, a server side that gets POST messages of information to feed to the web client via inserts/updates to a Collection. So far so good. However, sometimes these updates can be rather large (50K records at a go, every 5 seconds). I was having a hard time keeping up with this until I started using the batch-insert package and then the low-level batch.find.update() and batch.execute() from Mongo.
However, there is still a good amount of processing going on even with 50K records (it does some calculations, analytics, etc.). I would LOVE to be able to "thread" that logic so the main event loop can continue along. However, I am not sure there is a real easy way to create "real" threads for this within Meteor. So barring that, I would like to know the best/proper way of at least "batching" the work so that every N (say 1K or so) records I can release the event loop back to process other events (like some client-side DDP messages and the like), then do another 1K records, and so on until however many records I need are done.
I am THINKING the solution lies in using Fibers/Futures, which appear to be the Meteor way, but I am not positive that is correct, or whether lower-level ideas like setTimeout() and/or setImmediate() are more appropriate.
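Roughly what I have in mind with setImmediate (processRecord is a stand-in for the real per-record work):

// Process `records` in chunks of `batchSize`, yielding the event loop
// between chunks so other events (DDP messages, etc.) can run.
function processInBatches(records, batchSize, done) {
  let index = 0;
  function next() {
    records.slice(index, index + batchSize).forEach(processRecord);
    index += batchSize;
    if (index < records.length) {
      setImmediate(next); // release the event loop before the next chunk
    } else {
      done();
    }
  }
  next();
}

(I assume that inside Meteor the callback would need a Meteor.bindEnvironment wrapper so Fiber-bound code keeps working, but that is part of the question.)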
TIA!
Meteor is not a one-size-fits-all tool. I think you should decouple your Meteor application from your batch processing. Set up a separate Meteor instance, or better yet a pure Node.js server, to handle these requests and batch processes. It would look like this:
Create a Node.js instance that connects to the same Mongo database using the mongodb driver (https://www.npmjs.com/package/mongodb).
Use Express, if you're using Node.js, to handle the POST requests (https://www.npmjs.com/package/express).
Do the batch processing/inserts/updates in this instance.
The updates in Mongo will be reflected in Meteor very quickly. I had a similar situation and used a Node server to do some batch data collection, then passed it into a Cassandra database. I then used Pig Latin to run some batch operations on that data and inserted the results into Mongo. My Meteor application would reactively display the new data pretty much instantaneously.
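A skeletal version of that setup (the connection string, database name, and route are placeholders):

const express = require('express');
const { MongoClient } = require('mongodb');

const app = express();
app.use(express.json({ limit: '50mb' })); // large record batches

async function start() {
  // Same database the Meteor app uses; the URL is a placeholder.
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const records = client.db('meteor').collection('records');

  // Heavy POSTs land here instead of in the Meteor process.
  app.post('/ingest', async (req, res) => {
    await records.insertMany(req.body); // plus any calculations/analytics
    res.sendStatus(204);
  });

  app.listen(4000);
}

start();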
You can call this.unblock() inside a server method so that subsequent method calls from the same client don't have to wait for the current one to finish; the long-running work then effectively continues in the background as far as the client is concerned. See the example below.
Meteor.methods({
  longMethod: function() {
    this.unblock();
    Meteor._sleepForMs(1000 * 60 * 60);
  }
});

How to connect pinoccio to apache couchdb

Is there anyone using the nice Pinoccio from www.pinocc.io?
I want to use it to post data into an Apache CouchDB using Node.js, so I'm trying to poll data from the Pinoccio API, but I'm a little lost about whether to:
schedule the polls
do long polls
take a completely different approach
Any ideas are welcome.
Pitt
Sure. I wrote the Pinoccio API; here's how you do it:
https://gist.github.com/soldair/c11d6ae6f4bead140838
This example depends on the pinoccio npm module ~0.1.3, so make sure to npm install again to pick up the newest version.
You don't need to poll, because Pinoccio will send you changes as they happen if you have an open connection to either "stats" or "sync". If you want to poll you can, but it's not "real time".
sync gives you the current state + streams changes as they happen, so it's perfect if you only need to save the changes to your troop while your script is running, or show the current and last known state on a web page.
The solution that replicates every data point we store is stats, and that is what the example uses. Stats lets you read everything that has happened to a scout. Digital pins, for example, are the "digital" report. You can ask for data from a specific point in time or just from the current time (the default). Changes to this "digital" report will continue streaming live as they happen until the "end" time is reached, or if "tail" equals 0 in the options passed to stats.
Hope this helps. I tested the script on my local couch and it worked well. You would need to modify it to copy more stats from each scout. I hope that soon you will be able to request multiple reports from multiple scouts in the same stream; I just have some bugs to sort out ;)
You need to look into two dimensions:
Node.js talking to CouchDB. This is well understood, and there are some questions about it you can find here.
Getting the data from the Pinoccio. The API suggests that as long as the connection is open, you get data, so use a short timeout and a loop. You might want to run your own Node.js instance for that; a sketch of the CouchDB half follows below.
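For the CouchDB half, a minimal sketch with the nano client (the database name and fetchLatestStats are placeholders; the actual Pinoccio call would come from their module or HTTP API):

const nano = require('nano')('http://localhost:5984');
const db = nano.db.use('pinoccio'); // database name is a placeholder

// Poll on a short interval and store each reading as a document.
// fetchLatestStats() stands in for whatever the Pinoccio API returns.
setInterval(async () => {
  const reading = await fetchLatestStats();
  if (reading) {
    await db.insert({ ...reading, ts: new Date().toISOString() });
  }
}, 5000);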
Interesting fact: the CouchDB team seems to be working on replacing their internal JS engine with Node.js.
