I have several applications working with Node on the back-end and React on the front-end, and it works great: I make axios GET and POST requests from React to Express and get data back and forth. In production I use pm2 to keep everything up and running.
My question is: when two users access the same application at the same time, how does Node treat this, as two separate instances or just one?
I am considering using socket.io to notify the front-end of changes happening on the Node side, and I wonder whether those notifications will be emitted from the back-end regardless of what any other user might be doing.
Thanks.
As you have probably heard, node.js is often described as a "single-threaded" runtime. This is only partially true. Your JavaScript runs on a single thread, but node delegates certain operations (file system access, DNS lookups, some crypto) to a libuv thread pool, which by default can process up to 4 such tasks at the same time.
If you want to know more about this, you might want to look into the node event loop, which describes the phases node goes through on each "tick".
So as you see, node can often process not just one but several actions per loop cycle. But there is more: to solve the performance issues that can occur in big applications, you can run node in cluster mode. This spawns multiple node processes (each with its own event loop and thread pool) and therefore lets you handle high demand efficiently.
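To see the thread pool described above in action, here is a minimal sketch (crypto.pbkdf2 is one of the calls node dispatches to the pool; the inputs are arbitrary):

// Sketch: crypto.pbkdf2 runs on libuv's thread pool (default size 4,
// tunable via the UV_THREADPOOL_SIZE environment variable).
const crypto = require('crypto');

const start = Date.now();
for (let i = 0; i < 8; i++) {
  crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', () => {
    console.log(`task ${i} done after ${Date.now() - start} ms`);
  });
}
// With the default pool size, the 8 tasks typically finish in two waves of 4.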
One note on your socket.io question: as you see, under high demand tasks get queued until the node event loop can handle them, so sometimes a notification has to wait. Fortunately, we are in a race between big tech companies to create the fastest JS runtime, so this is all pretty fast.
In Node.js cluster mode, if multiple jobs exist in the event loop for one process, should the current job crash the process, what happens to the remaining jobs?
I'm assuming the remaining jobs in the event loop would go unfulfilled or return a server error. My question is, why is this an acceptable risk? Why would someone opt to use Node.js cluster mode in production then, rather than use something like PHP in production, where there is no risk of this, because PHP handles each request in its own process.
Edit:
Obviously this doesn't just apply to Node.js cluster mode. It can happen on a single instance, in which case obviously the end user would just get a server error. Cluster mode just happens to be my personal use case.
I'm looking for a way to pick a job back up from the queue should a previous job crash the process before the subsequent job gets a chance to be fulfilled. I am currently reading about how you can use a tool like RabbitMQ to keep your job queue outside of the node.js cluster, so that each cluster instance just pulls jobs from the RabbitMQ queue. If anyone has any input on that, it would also be greatly appreciated.
If multiple jobs exist in the event loop for one process, what happens to the remaining jobs if the current job crashes the process?
If a node.js process crashes, the same thing happens to it that happens to any other process. All open sockets get automatically disconnected and the client will receive an immediate close on their socket (socket connection dropped essentially).
If you were using a Java server that was in the middle of handling 10 requests (perhaps in threads) and it crashed, the consequences would be the same. All 10 socket connections would get dropped.
If process isolation from one request to another is your #1 criterion for selecting a server environment, then I guess you wouldn't pick any environment that ever serves multiple requests from the same process. But you would give up a lot to get that. One of the reasons for the node.js design is that it scales really, really well to a high number of concurrent connections that are all doing mostly I/O things (disk, networking, database stuff, etc...), which happens to describe most web servers. Whereas a design that fires up a new process for every incoming connection does not scale as well to a large number of concurrent connections, because a process is a much more heavyweight thing in the eyes of the operating system (memory usage, other system resource usage, task switching overhead, etc...) than the way node.js does things.
And, there are obviously hundreds of other considerations too when choosing a server environment. So, you kind of have to look at the whole picture of what you're designing for and make the best set of tradeoffs.
In general, I wouldn't put this issue anywhere on the radar for why you should choose one over the other unless you expect to be running risky code (perhaps out of your control) that crashes a lot and this issue is therefore more important in your deployment than all the other differences. And, if that was the case, I'd probably isolate the risky code to its own process (even when using nodejs) to alleviate any pain from that crash. You could have a process pool waiting to process risky things. For example, if you were running code submitted by a user, I might run that code in its own isolated VM.
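As an illustration of that isolation idea, here is a minimal sketch, assuming a hypothetical risky-worker.js script that does the dangerous work and replies via process.send():

const { fork } = require('child_process');

// Run untrusted or crash-prone work in a disposable child process so a
// crash cannot take down the main server process.
function runRisky(payload) {
  return new Promise((resolve, reject) => {
    const child = fork('./risky-worker.js'); // hypothetical worker script
    child.once('message', (result) => {
      resolve(result);
      child.kill();
    });
    child.once('exit', (code) => {
      // if the worker died before replying, reject; if the promise already
      // resolved above, this reject is a harmless no-op
      if (code !== 0) reject(new Error('worker crashed with code ' + code));
    });
    child.send(payload);
  });
}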
If you're just worried about your own code crashing a lot, then you probably have bigger problems and need more extensive unit testing, more robust error handling, and tools such as a linter and other code analysis tools to find potential problem areas. With proper design, implementation and error handling, you should be able to keep a single incoming request from harming anything other than itself. That's certainly the philosophy that every server environment serving multiple requests from the same process advises, and that the people/companies deploying those servers follow.
I am using Node.js to create a REST API.
In my scenario I have two APIs:
API 1: has to fetch 10,000 records and iterate over them to modify some of the data.
API 2: a simple GET method.
When I open Postman and hit the first API and the second API in parallel, the second API is slow to respond because Node.js is single-threaded.
My Expectation:
Even though the 1st API takes time, it should not hold up the 2nd API for long.
From the Node.js docs I found the clustering concept:
https://nodejs.org/dist/latest-v6.x/docs/api/cluster.html
So I implemented clustering, and it created 4 server instances.
Now when I hit API 1 in one tab and API 2 in a second tab, it worked fine.
But when I opened API 1 in 4 tabs and API 2 in a 5th tab, the slowness appeared again.
What would be the best way to solve this issue?
Because of the single threaded nature of node.js, the only way to make sure your server is always responsive to quick requests such as you describe for API2 is to make sure that you never have any long running operations in your server.
When you do encounter some operation in your code that takes a while to run and would affect the responsiveness of your server, your options are as follows:
Move the long running work to a new process. Start up a new process and run the lengthy operation there. This allows your server process to stay active and responsive to other requests, even while the other long running process is still crunching on its data.
Start up enough clusters. Using the clustering you've investigated, start up more clustered processes than the number of simultaneous calls you expect to your long running operation. This ensures there is always at least one clustered process available to be responsive. Sometimes, you cannot predict how many that will be, or it would be more than you can practically create.
Redesign your long running process to execute its work in chunks, returning control to the system between chunks so that node.js can interleave other work it is trying to do with the long running work. Here's an example of processing a large array in chunks. That post was written for the browser, but the concept of not blocking the event loop for too long is the same in node.js.
Speed up the long running task. Find a way to speed up the long running job so it doesn't take so long (use caching, don't return so many results at once, find a faster way to do it, etc...).
Create N worker processes (probably one less worker process than the number of CPUs you have) and create a work queue for the long running tasks. Then, when a long running request comes in, you insert it in the work queue. Each worker process is then free to work on items in the queue. When more than N long tasks are being requested, the first ones will get worked on immediately and the later ones will wait in the queue until a worker process is available. But, most importantly, your main node.js process will stay free and responsive for regular requests (see the sketch after this list).
This last option is the most foolproof, because it stays effective for any number of long running requests, though all of the schemes can help you.
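Here is a minimal sketch of that last scheme, assuming a hypothetical long-task-worker.js script that receives jobs via process.on('message') and replies with process.send():

const { fork } = require('child_process');
const os = require('os');

const queue = [];   // pending long running jobs
const workers = []; // fixed pool of worker processes

// one less worker than CPUs, so the main process keeps a core for itself
for (let i = 0; i < Math.max(1, os.cpus().length - 1); i++) {
  const worker = fork('./long-task-worker.js');
  worker.busy = false;
  worker.on('message', (result) => {
    worker.busy = false; // job done; deliver `result` to whoever asked for it
    dispatch();          // then pick up the next queued job, if any
  });
  workers.push(worker);
}

function dispatch() {
  const idle = workers.find((w) => !w.busy);
  if (idle && queue.length > 0) {
    idle.busy = true;
    idle.send(queue.shift());
  }
}

// called from a request handler; the main event loop stays free
function enqueue(job) {
  queue.push(job);
  dispatch();
}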
Node.js actually is not multi-threaded, so all of these requests are just being handled in the event loop of a single thread.
Each Node.js process runs in a single thread, and by default its heap is limited to roughly 512 MB on 32-bit systems and 1 GB on 64-bit systems (you can raise this with the --max-old-space-size flag).
However, you can split a single process into multiple processes or workers. This can be achieved through a cluster module. The cluster module allows you to create child processes (workers), which share (or not) all the server ports with the main Node process.
You can invoke the Cluster API directly in your app, or you can use one of the many abstractions over the API:
https://nodejs.org/api/cluster.html
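As a rough sketch of what the linked docs describe (the master forks one worker per CPU, and the workers share the listening port):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log('worker ' + worker.process.pid + ' died, forking a new one');
    cluster.fork(); // simple self-healing
  });
} else {
  // each worker runs its own event loop but shares port 3000
  http.createServer((req, res) => {
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(3000);
}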
I have a simple nodejs webserver running, it:
Accepts requests
Spawns a separate thread to perform background processing
Background thread returns results
App responds to client
Using Apache benchmark "ab -r -n 100 -c 10", performing 100 requests with 10 at a time.
Average response time of 5.6 seconds.
My logic for using nodejs is that it is typically quite resource efficient, especially when the bulk of the work is being done by another process. It seems like the most lightweight webserver option for this scenario.
The Problem
With 10 concurrent requests my CPU was maxed out, which is no surprise since there is CPU intensive work going on in the background.
Scaling horizontally is an easy thing to do, although I want to get the most out of each server for obvious reasons.
So, with nodejs, either raw or with some framework, how can one keep the CPU load under control so it doesn't get maxed out?
Potential Approach?
Could I accept the request, store it in a db or some persistent storage, and have a separate process that uses an async library to process x jobs at a time?
In your potential approach, you're basically describing a queue. You can store incoming messages (jobs) there and have each process take one job at a time, only getting the next one when processing the previous job has finished. You could spawn a number of processes working in parallel, for example one per core in your system. Spawning more won't help performance, because multiple processes sharing a core will just run slower. Keeping one core free may be preferable, to keep the system responsive for administrative tasks.
Many different queues exist. A node-based one using redis for persistence that seems to be well supported is Kue (I have no personal experience using it). I found a tutorial for building an implementation with Kue here. Depending on the software your environment is running in though, another choice might make more sense.
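For a feel of what that looks like, here is a rough sketch in Kue's style (the 'email' job type and payload are made up; Redis must be running locally):

var kue = require('kue');
var queue = kue.createQueue(); // connects to Redis on localhost by default

// producer: store an incoming job
var job = queue.create('email', { to: 'user@example.com' }).save(function (err) {
  if (!err) console.log('queued job', job.id);
});

// consumer: each worker process handles one job at a time
queue.process('email', function (job, done) {
  // do the actual work with job.data here, then signal completion
  done();
});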
Good luck and have fun!
I am in the process of beginning to write a worker queue for node using node's cluster API and mongoose.
I noticed that a lot of libs exist that already do this but using redis and forking. Is there a good reason to fork versus using the cluster API?
Edit: and now I also found this: https://github.com/xk/node-threads-a-gogo -- too many options!
I would rather not add redis to the mix since I already use mongo. Also, my requirements are very loose: I would like persistence, but could go without it for the first version.
Part two of the question:
What are the most stable/used nodejs worker queue libs out there today?
Wanted to follow up on this. My solution ended up being a roll-your-own cluster implementation where some of my cluster workers are dedicated job workers (i.e. they just have code to work on jobs).
I use agenda for job scheduling.
Cron-type jobs are scheduled by the cluster master. The rest of the jobs are created in the non-worker clusters as they are needed (verification emails, etc.).
Before that I was using kue, but dropped it because the rest of my app uses mongodb and I didn't like having to run redis just for job scheduling.
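As a rough sketch of that split using agenda (the connection string, job names and intervals below are made up):

var cluster = require('cluster');
var Agenda = require('agenda');
var agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/agenda-jobs' } });

// job definitions live in the dedicated job workers
agenda.define('nightly cleanup', function (job, done) {
  done(); // the actual work would go here
});
agenda.define('send verification email', function (job, done) {
  // job.data carries whatever was passed in when the job was created
  done();
});

agenda.on('ready', function () {
  if (cluster.isMaster) {
    agenda.every('1 day', 'nightly cleanup'); // cron-type job, master only
  }
  agenda.start();
});

// a non-worker process creates ad-hoc jobs as needed, e.g.:
// agenda.now('send verification email', { userId: 123 });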
Have you tried https://github.com/rvagg/node-worker-farm?
It is very lightweight and doesn't require a separate server.
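A rough sketch of its usage, assuming a child.js module sitting next to the main script:

// main.js
var workerFarm = require('worker-farm');
var workers = workerFarm(require.resolve('./child'));

workers('#1', function (err, output) {
  console.log(output);
  workerFarm.end(workers); // shut the farm down when all work is done
});

// child.js -- runs in a separate process, one invocation per call
module.exports = function (input, callback) {
  callback(null, 'hello from worker, got: ' + input);
};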
I personally am partial to cluster-master.
https://github.com/isaacs/cluster-master
The reason I like cluster-master is that it does very little besides add logic for forking your process, give you the ability to manage the number of processes you're running, and throw in a little bit of logging/recovery to boot! I find overly bloated process management libraries tend to be unstable, and sometimes even slow things down.
This library will be good for you if the following are true:
Your module is largely asynchronous
You don't have a huge number of different types of events triggering
The events that fire have small amounts of work to do, but you have lots of similar events firing (things like web servers)
The reason for the above list is also the reason why threads-a-gogo may be good for you, for the opposite reasons. If you have a few spots in your code where there is a lot of work to do within your event loop, something like threads-a-gogo, which launches a "thread" specifically for that work, is awesome, because you aren't determining ahead of time how many workers to spawn, but rather spawning them to do work when needed. Note: this can also be bad if there is the potential for a lot of them to spawn; if you start launching too many processes, things can actually bog down. But I digress.
To summarize: if your module is largely asynchronous already, what you really want is a worker pool, to minimize the downtime when your process is not listening for events and to maximize the amount of processor you can use. Unless you have a very busy synchronous call, a single node event loop will have trouble taking advantage of even a single core of a processor. Under this circumstance, you are best off with cluster-master. What I recommend is doing a little benchmarking to see how much of a single core your program can use under the "worst case scenario". Let's say this is 33% of one core. If you have a quad core machine, you then tell cluster-master to launch 12 workers.
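From its README, usage is roughly like this (worker.js and the env values are placeholders; the size is the 12 workers from the estimate above):

// master.js
var clusterMaster = require('cluster-master');

clusterMaster({
  exec: 'worker.js',               // the script each worker process runs
  size: 12,                        // number of workers to keep alive
  env: { NODE_ENV: 'production' }  // extra env vars passed to the workers
});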
Hope this helped!
So the first app that people usually build with Socket.IO and Node is a chat app. This chat app basically has 1 Node server that broadcasts to multiple clients. In the Node code, you would have something like:
// Pseudocode
for (const client of clients) {
  if (client !== messageSender) {
    client.send(message);
  }
}
This is great for a low number of users, but I see a problem with this. First of all, there is a single point of failure which is the Node server. Second of all, the app will slow down as the number of clients grow. What is there to do then when we reach this bottleneck? Is there an architecture (horizontal/vertical scaling) that can be used to alleviate this problem?
For that "one day" when your chat app needs multiple, fault-tolerant node servers, and you want to use socket.io to cross communicate between the server and the client, there is a node.js module that fits the bill.
https://github.com/hookio/hook.io
It's basically an event emitting framework to cross communicate between multiple "things" -- such as multiple node servers.
It's relatively complicated to use, compared to most modules, which is understandable since this is a complex problem to solve.
That being said, you'd probably have to have a few thousand simultaneous users and lots of other problems before you begin to have problems with this.
Another thing you can do is try to develop your application in such a way that if a connection is lost (which happens all the time anyway), e.g. the server goes down or the client has network issues (say, a mobile user), your application can handle that and recover gracefully.
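On the client side, that graceful recovery might look roughly like this (the URL is a placeholder, and the event names follow the classic socket.io client API):

var socket = io('https://chat.example.com'); // hypothetical server

socket.on('disconnect', function () {
  // keep outgoing messages in a local buffer instead of dropping them
});

socket.on('reconnect', function () {
  // ask the server for anything missed, then flush the local buffer
});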
Since Node.js has a single event-loop thread, this single point of failure is written into its DNA. Even reloading a server after code changes requires this thread to be stopped.
There are, however, a lot of tools available to handle such failures gracefully. You could use forever, a simple CLI tool for ensuring that a given script runs continuously. Other options include distribute and up. Distribute is a load balancing middleware for Node. Up builds on top of Distribute to offer zero-downtime reloads, using either a JavaScript API or a command line interface.
I find you just need to use the Redis store with Socket.IO to maintain connection references between two or more processes/servers. These options have already been discussed extensively here and here.
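For reference, wiring Socket.IO to Redis looks roughly like this with the socket.io-redis adapter (the host/port and event names are assumptions; every Node process runs this same code):

var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');

// all processes/servers publish and subscribe through the same Redis instance
io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));

io.on('connection', function (socket) {
  socket.on('chat message', function (msg) {
    io.emit('chat message', msg); // reaches clients on every process/server
  });
});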
There's also the option of using socket.io-clusterhub if you don't intend to use the Redis store.