I can't imagine how Node.js can execute two scripts with different code simultaneously in one single thread. For example, I have two different scripts, A and B. What will happen if several clients request A and B almost simultaneously? For PHP it is understandable: for example, five threads will be created to handle A and five threads to handle B, and the script executes again for each request. But what happens in Node.js? Thank you!
It uses the so-called event loop, implemented by libuv.
A very simple explanation would be: when a new event occurs, it is put into a queue. Whenever the JavaScript that is currently running finishes, the node process takes the next event from the queue and processes it.
The main difference between PHP and node is that a node.js process is essentially a stand-alone web server (single threaded), while PHP is an interpreter that runs within a web server (e.g. Apache), which is responsible for creating new threads for each request.
Node.js is very good for network applications (like web sites) because in these applications most of the work is I/O which node.js handles asynchronously.
Even if two requests arrive at the same time and Node.js only has one single thread of execution, each one of the requests (in sequence) will be handed off to the operating system for I/O (via libuv as mihai pointed out) and the fact that there is only one JavaScript thread of execution becomes irrelevant. As the I/O completes, the JavaScript thread picks up the result and returns a response.
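To make the hand-off concrete, here is a minimal sketch of two handlers, A and B, sharing one JavaScript thread. The names `fakeIo`, `handle` and `interleaveDemo` are made up for illustration, and a timer stands in for real I/O such as a database or network call.

```javascript
// Stand-in for non-blocking I/O: resolves after `ms` milliseconds without
// occupying the JavaScript thread while it waits.
const fakeIo = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A hypothetical request handler: a little JavaScript, some I/O, a little more JavaScript.
async function handle(name, ioMs, log) {
  log.push(`${name}: start`);
  await fakeIo(ioMs); // the thread is released here and can run other handlers
  log.push(`${name}: done`);
}

// Start A and B back to back, as if two clients had connected at the same moment.
async function interleaveDemo() {
  const log = [];
  await Promise.all([handle('A', 50, log), handle('B', 10, log)]);
  return log;
}

interleaveDemo().then((log) => console.log(log));
// [ 'A: start', 'B: start', 'B: done', 'A: done' ]
```

B starts before A has finished, and finishes first because its simulated I/O is shorter, even though no second thread is involved.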
I have a very simple express server with 1 endpoint listening on a certain port.
I'd like a separate scheduled script to make an API call every 10 minutes and save the data locally (in memory).
When the endpoint on my server is hit, I'd like to serve the locally cached data from memory.
How can I make sure that the scheduled cron job never blocks the processing of incoming requests, i.e. that both continue to happen at the same time?
I've read about child processes and worker threads, but I'm not sure if this would be a good use case for them.
JavaScript is single-threaded, so worker threads and forking notwithstanding, every line of JavaScript code that executes blocks other lines of code from executing. Only one can execute at a time. So forgetting about the scheduled job for a moment, in an express server, multiple requests happening concurrently will compete with each other for CPU resources and block each other.
However, Node.js has asynchronous I/O. A typical request often looks a bit like this:
1. Receive request and run a little bit of JavaScript code (maybe 1ms).
2. Query an external REST API (let's say 50ms).
3. Run a little more JavaScript code and send a response (maybe 1ms).
In steps 1 and 3, all other requests are blocked from executing JavaScript code. They have to wait until the current request is done executing its code. In step 2 however, other requests are free to execute JavaScript code, one after the other, since the network request is non-blocking.
I explain all this because it sounds like your scheduled job might go through the same three steps mentioned above. If that's the case, then functionally it's not going to block anything any more than simultaneous requests to the server are already blocking each other, and there's little reason to worry about threading or multi-processing unless your server is under such heavy load that it cannot serve all incoming requests.
So this is not a good use case for child processes and worker threads unless the scheduled job has different characteristics than I'm assuming (for example if it spends multiple seconds crunching heavy computations in JavaScript, that would noticeably impact request response time when it runs, and might make sense to break out into a separate process or worker thread).
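A rough sketch of the scheduled job as described in the question, kept in the same process as the server. Everything named here (`cache`, `refresh`, `startSchedule`, and the `fetchData` function that is passed in) is hypothetical; `fetchData` stands for whatever asynchronous call retrieves the data from the external API.

```javascript
// Hypothetical in-memory cache shared by the scheduled job and the request handlers.
const cache = { data: null, updatedAt: null };

// One refresh cycle. The await is the slow part (the external API call);
// while it is pending, the event loop is free to serve incoming requests.
async function refresh(fetchData) {
  try {
    cache.data = await fetchData();
    cache.updatedAt = Date.now();
  } catch (err) {
    // Keep serving the previous data if the upstream call fails.
    console.error('refresh failed:', err.message);
  }
}

// Refresh once now, then again every `intervalMs` (10 minutes by default).
function startSchedule(fetchData, intervalMs = 10 * 60 * 1000) {
  refresh(fetchData);
  return setInterval(() => refresh(fetchData), intervalMs);
}
```

An Express route could then answer straight from memory, for example `app.get('/data', (req, res) => res.json(cache.data))`, assuming `app` is the Express application. Only the few lines that assign the result run on the JavaScript thread; the waiting happens outside it.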
I've gone through some introductory articles on Node.js and the event loop, and one thing is not clear: if there are multiple concurrent requests, are the responses always sent sequentially, in the order the requests were made? Say 20 requests complete simultaneously; will the 20th response have to wait for the other 19 to be cleared (responded back to the client)?
Update: What I was wondering is whether this is similar to how multiple setTimeouts get queued?
node.js runs Javascript as single threaded. Thus, only one piece of Javascript is running at any given time.
But, almost all I/O (e.g. networking, file access, etc...) is asynchronous and non-blocking. So, if 20 requests are made of your server in a very short period of time, the first request to reach the server will start executing its request handler and the other requests will be queued. But, as soon as the first request hits an asynchronous operation (such as reading from the local file system), that request will be suspended while the non-blocking asynchronous I/O is taking place and the next request in line will start to run.
This second request will then run until it either finishes or until it also hits a piece of asynchronous I/O. When that second request is waiting on the async I/O, then another request will get to run. The event loop will determine whether the next operation is the completion of the async I/O from the first request or the start of the third request that was waiting in the queue.
The various requests will continue this way until all are done. Multiple requests may be "in-flight" at the same time (meaning they've been started, but have not completed yet), but only one is ever actually executing code at any given moment.
This is sometimes referred to as cooperative tasking. There is no pre-emptive multi-tasking among the different requests where each automatically gets a time slice of the host CPU. But, any time a request hits an asynchronous I/O operation, then that tells the scheduler that other requests waiting to run can run.
This is all managed from an event queue in node.js. A piece of Javascript runs until it completes. If it makes an asynchronous I/O request and then completes, then another piece of Javascript that is also waiting to run can start to run. When it is done, the JS engine pulls the next item out of the event queue and runs it. That might be a new incoming request or it might be the completion of some asynchronous I/O operation on some other request.
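The run-to-completion rule also covers the setTimeout case from the question. In this small sketch (the function name `runToCompletionDemo` is made up), two timers are already due, but neither callback can run until the synchronous code that is currently executing has finished.

```javascript
function runToCompletionDemo() {
  const order = [];

  // Both timers become due almost immediately.
  setTimeout(() => order.push('timer 1'), 0);
  setTimeout(() => order.push('timer 2'), 0);

  // Occupy the single JavaScript thread for about 20ms. The timers are
  // overdue by the end of this loop, but nothing can interrupt it.
  const end = Date.now() + 20;
  while (Date.now() < end) { /* busy wait */ }
  order.push('sync code finished');

  // Report a little later, after the queued timer callbacks have run.
  return new Promise((resolve) => setTimeout(() => resolve(order), 10));
}

runToCompletionDemo().then((order) => console.log(order));
// [ 'sync code finished', 'timer 1', 'timer 2' ]
```

The queued callbacks then run one after another, in the order they were queued, which is the same behaviour described above for completed requests.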
The advantages of this type of system are:
It scales really well, particularly for I/O bound server operations, because you can have many requests "in-flight" at the same time with only a single Javascript thread. The cooperative tasking is very lightweight and fast.
Programming a system like this has far fewer "race conditions" to watch out for because no two pieces of Javascript are ever running at the actual same time. This means you can often share state between requests without ever having to use mutexes (like you would in a multi-threaded environment). Since thread-safety bugs are often very difficult to avoid and to test for, it's a major advantage to eliminate these types of bugs.
The cooperative model is conceptually simple and easier to learn and to program safely.
The disadvantages of this type of system are:
It does not share the CPU among tasks that are CPU-bound. A node.js programmer with lots of heavy CPU-bound computations often has to use clustering or child processes to handle the heavy CPU computations so as to not over-burden the main request processing Javascript thread with that work and make it unresponsive.
Clustering of processes is required to maximize the use of multiple processors and then any shared data must be shared across those processes. People often use an in-memory database like Redis to share data between processes.
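You can't just willy nilly fire up another Javascript thread that shares your variables to go off and do something. Offloading work means an explicit child process or worker thread that communicates by passing messages.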
I have a site that makes the standard data-bound calls, but it also has a few CPU-intensive tasks which are run a few times per day, mainly by the admin.
These tasks include grabbing data from the db, running a few time-consuming different algorithms, then reuploading the data. What would be the best method for making these calls and having them run without blocking the event loop?
I definitely want to keep the calculations on the server so web workers wouldn't work here. Would a child process be enough here? Or should I have a separate thread running in the background handling all /api/admin calls?
The basic answer to this scenario in Node.js land is to use the core cluster module - https://nodejs.org/docs/latest/api/cluster.html
It is an acceptable API to:
easily launch worker node.js instances on the same machine (each instance will have its own event loop)
keep a live communication channel for short messages between instances
This way, any work done in the child instance will not block your master event loop.
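The same idea (a second event loop plus a message channel) can be sketched in a single file. Note the substitution: this example uses the built-in worker_threads module rather than cluster, only so that it stays self-contained. `fibInWorker` and the naive Fibonacci function are placeholders for the real time-consuming algorithms.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, kept inline so the example fits in one file.
// The worker has its own event loop and shares no variables with the main thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.postMessage(fib(workerData.n));
`;

// Run the CPU-heavy calculation off the main thread and resolve with its result.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

fibInWorker(20).then((result) => console.log(result)); // 6765
```

While the worker is computing, the main thread keeps handling requests; it only runs a short callback when the result message arrives. With cluster or a child process the shape is similar, with messages exchanged between processes instead of threads.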
I know Node.js is good at keeping a large number of simultaneous persistent connections, for example, a chat room for many, many chatters.
I am wondering how it achieves this. After all, it is using TCP/IP, which is encapsulated by the underlying OS, so why can it handle persistent connections so well when others cannot?
What magic does it have?
Node.js makes all I/O asynchronous. It only runs in a single thread, but will do other requests or operations while waiting on I/O.
In contrast, a classical web server process will not serve another request until the previous one is fully done. For this reason, Apache runs several processes at the same time; let's say there are 10 httpd processes, that normally means 10 requests can be served at any one time (*). If the requests take more time to complete, you will serve fewer requests, or will have to spawn more processes, even if a process is doing nothing but waiting for the database to chew up and return data.
A node.js process, faced with a request that will go to the database, leaves the database to work while it goes to serve another request.
(*) MPM makes this not quite true, but true enough for all intents and purposes.
Well, the thing is that most web servers (Apache, for example) work by thread spawning: they spawn a new thread for every incoming HTTP request. These threads are synchronous and blocking in nature, which means they execute the code in the order it is written, and any further computation is blocked by the current I/O or compute task. For example, if you want to listen for an event such as a chat submission by a chatter, you need a dedicated thread per user listening for this event (per user is necessary for maintaining a persistent connection; there are a few possible optimization techniques, but you can still assume one thread per user), and this thread will be blocked waiting for the event to happen. So for any thread spawning and blocking web-server
Javascript, on the other hand, is non-blocking (and conducive to asynchronous code) by nature: here you register a callback for an event, and whenever the event occurs the callback function will be executed. It will not block at any point waiting for this event.
You can find more about this by reading about non-blocking or asynchronous servers.
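A toy version of the chat example, to show what "register a callback" looks like. This is only a sketch: `room`, `join` and `say` are invented names, and Node's built-in EventEmitter stands in for the sockets a real chat server would listen on.

```javascript
const { EventEmitter } = require('events');

const room = new EventEmitter();

// Joining only registers a callback; nothing blocks or waits for messages here.
function join(user) {
  const inbox = [];
  room.on('message', (from, text) => {
    if (from !== user) inbox.push(`${from}: ${text}`);
  });
  return inbox;
}

// Posting a message fires the event, which runs every registered callback.
function say(from, text) {
  room.emit('message', from, text);
}

const alice = join('alice');
const bob = join('bob');
say('alice', 'hi');
say('bob', 'hello');
console.log(alice); // [ 'bob: hello' ]
console.log(bob);   // [ 'alice: hi' ]
```

Each connected user costs one small callback rather than one blocked thread, which is why a large number of idle connections is cheap in this model.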
I don't understand several things about nodejs. Every information source says that node.js is more scalable than standard threaded web servers due to the lack of thread locking and context switching, but I wonder: if node.js doesn't use threads, how does it handle concurrent requests in parallel? What does the event I/O model mean?
Your help is much appreciated.
Thanks
Node is completely event-driven. Basically the server consists of one thread processing one event after another.
A new request coming in is one kind of event. The server starts processing it and when there is a blocking IO operation, it does not wait until it completes and instead registers a callback function. The server then immediately starts to process another event (maybe another request). When the IO operation is finished, that is another kind of event, and the server will process it (i.e. continue working on the request) by executing the callback as soon as it has time.
So the server never needs to create additional threads or switch between threads, which means it has very little overhead. If you want to make full use of multiple hardware cores, you just start multiple instances of node.js.
Update
At the lowest level (C++ code, not Javascript), there actually are multiple threads in node.js: there is a pool of IO workers whose job it is to receive the IO interrupts and put the corresponding events into the queue to be processed by the main thread. This prevents the main thread from being interrupted.
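A minimal sketch of this callback registration using a real file system call (`statDemo` is a made-up name). `fs.stat` returns immediately, the actual work happens outside the JavaScript thread, and the callback runs later as a separate event.

```javascript
const fs = require('fs');

function statDemo() {
  return new Promise((resolve) => {
    const order = [];

    // The call only registers the callback and returns at once.
    fs.stat(process.cwd(), (err) => {
      // Runs later, as its own event, once the result is available.
      order.push(err ? 'callback: stat failed' : 'callback: stat finished');
      resolve(order);
    });

    // This line runs first: the main thread did not wait for the disk.
    order.push('main thread: fs.stat returned, free for other events');
  });
}

statDemo().then((order) => console.log(order));
```

The first entry in `order` is always the "main thread" line, because Node never invokes an I/O callback before the code that started the operation has run to completion.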
Although this question was answered a long time ago, I'd like to add my thoughts on it.
Node.js is a single-threaded JavaScript runtime environment. Basically, the concern of its creator, Ryan Dahl, was that parallel processing using multiple threads is not the right way, or is too complicated.
If Node.js doesn't use threads, how does it handle concurrent requests in parallel?
Ans: It is simply wrong to say that it doesn't use threads. Node.js does use threads, but in a smart way: it uses a single thread to serve all the HTTP requests, and multiple threads in a thread pool (in libuv) to handle any blocking operation.
Libuv: A library to handle asynchronous I/O.
What does the event I/O model mean?
Ans: The right term is non-blocking I/O. It almost never blocks, as the official Node.js site says. When a request reaches the Node server, it is not held up behind another request's blocking work. The server takes the request and starts executing it; if it hits a blocking operation, that operation is sent to the worker-thread area and a callback is registered for it. As soon as the operation finishes, the callback is placed on the event queue and processed by the event loop, which then creates the response and sends it to the respective client.
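One operation that Node.js documents as running on the libuv thread pool is the asynchronous `crypto.pbkdf2`. The sketch below starts two key derivations while a timer keeps ticking on the main thread; `deriveKey`, `threadPoolDemo`, the salt and the iteration count are arbitrary choices made for the example.

```javascript
const crypto = require('crypto');

// Wrap the callback API in a promise. The hashing itself runs on a libuv
// worker thread; only the callback runs on the JavaScript thread.
function deriveKey(password) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2(password, 'demo-salt', 100000, 32, 'sha256', (err, key) => {
      if (err) reject(err);
      else resolve(key.toString('hex'));
    });
  });
}

async function threadPoolDemo() {
  let ticks = 0;
  // If the event loop stays free, this timer keeps firing while the keys are derived.
  const timer = setInterval(() => { ticks += 1; }, 1);
  const keys = await Promise.all([deriveKey('first'), deriveKey('second')]);
  clearInterval(timer);
  return { keys, ticks };
}

threadPoolDemo().then(({ keys, ticks }) => {
  console.log(`derived ${keys.length} keys; timer fired ${ticks} times meanwhile`);
});
```

How many times the timer fires depends on the machine, so no fixed number is shown. The point is that the count can grow while the derivations are in progress, which would not happen if the hashing ran on the JavaScript thread.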
Node JS is a JavaScript runtime environment. Both the Chrome browser and Node JS run on the V8 JavaScript engine. Node JS uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. Node JS applications use a single-threaded event loop architecture to handle concurrent clients. Actually, its main event loop is single threaded, but most of the I/O works on separate threads, because the I/O APIs in Node JS are asynchronous/non-blocking by design, in order to accommodate the main event loop.
Consider a scenario where we request the details of user1 and user2 from a backend database and then print them on the screen/console. The response to each request takes time, but both of the user data requests can be carried out independently and at the same time.
When 100 people connect at once, rather than having different threads, Node will loop over those connections and fire off any events your code should know about. If a connection is new, it will tell you. If a connection has sent you data, it will tell you. If the connection isn't doing anything, it will skip over it rather than taking up precious CPU time on it. Everything in Node is based on responding to these events. As a result, the CPU stays focused on that one process and doesn't have a bunch of threads competing for attention. A Node JS application also does not have to buffer its output; it can simply send the data in chunks.
Though it's been answered, I would like to share my understanding in simple terms.
Node.js uses a library called libuv, which is written in C and uses the concept of threads. These threads are called workers, and these workers take care of the blocking work behind the multiple requests from clients.
Parallel processing in nodejs is achieved with the help of 2 concepts
Asynchronous
Non-blocking I/O