My backend has a few endpoints. Most of them return some JSON to the client and are pretty fast, but one of them takes a very long time to process.
It takes an image URL from the request body, manipulates that image to get a new one, and once the image is processed it uploads it to a server in order to get back a URL,
and only then can it use that URL to make an order.
Getting the enhanced image and uploading it to the server (to get back the URL) take a long time, like a good 3 seconds each if not more. I don't want the "order" endpoint to block the other endpoints, if that is something that would happen.
Each order is independent from the previous or the next one, and I don't care how long it takes to process one,
as long as it doesn't disrupt and block the event loop.
For now this is my code:
app.post("/order", async (req,res) => {
AIEnhancedImage = await enhance(req.body.image)
url = await uploadImageToServer(AIEnhancedImage)
order(url)
}
app.get("/A"), async (req,res) => {
...
}
app.get("/B"), async (req,res) => {
...
}
app.get("/C"), async (req,res) => {
...
}
My question is, if another endpoint is hit, will that endpoint be blocked by the "order" one if there is one processing?
If it does, what is a better implementation to make sure the order endpoint is processed bit by bit instead of all at once?
This doubt probably arises from my lack of knowledge about the event loop. What I hope is that
the code from the order endpoint will be added to the event loop but be processed independently and at the same time as other
requests from other endpoints. The blocking part would only be within that endpoint, so it wouldn't significantly affect the performance of the other endpoints.
The answer is it depends.
Is the code below CPU intensive or IO intensive?
AIEnhancedImage = await enhance(req.body.image)
url = await uploadImageToServer(AIEnhancedImage)
order(url)
Only one active user action can run inside an event loop callback. So if you are doing some CPU-intensive task, then nothing else can run until that task finishes.
Think of it like this: of whatever custom code you write, only one piece can run at a time.
But if you are doing an IO-based task, then Node.js will use a special worker pool to process and wait for the IO. So while Node.js waits for the IO, it will pick up something else from the event loop and try to process it.
https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
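If enhance() and uploadImageToServer() are genuinely asynchronous I/O (for example, network calls to external services), the awaits do not block the event loop at all. Below is a minimal sketch of one way to use that, reusing the three functions from the question and responding to the client before the slow work finishes (a sketch under those assumptions, not the one true implementation):

const express = require("express");
const app = express();
app.use(express.json());

app.post("/order", (req, res) => {
  // Acknowledge immediately so the client is not kept waiting.
  res.status(202).json({ status: "processing" });

  // Let the slow pipeline run "in the background". While these awaits are
  // pending, the event loop is free to serve /A, /B and /C.
  (async () => {
    const enhanced = await enhance(req.body.image);   // assumed async I/O
    const url = await uploadImageToServer(enhanced);  // assumed async I/O
    order(url);
  })().catch((err) => console.error("order pipeline failed:", err));
});

app.listen(3000);

Even without the early response, the awaits themselves only suspend this one request; they do not stop other handlers from running, as long as no CPU-heavy work happens synchronously.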
I have a websocket server in node.js which allows users to solve a given puzzle.
I also have code that generates a random puzzle, which takes about 20 seconds. In the meantime I still want to handle new connections/disconnects, but this synchronous code blocks the event loop.
Here's the simplified code:
io.on('connection', socket => {
//
});
io.listen(port);
setInterval(function() {
if (game.paused)
game.loadRound();
}, 1000);
loadRound runs for about 20 seconds, which blocks all connections and the setInterval itself.
What would be the correct way to run this code without blocking event loop?
You have three basic choices:
Redesign loadRound() so that it doesn't block the event loop. Since you've shared none of the code for it, we can't advise on the feasibility of that, but if it's doing any I/O, then it does not need to block the event loop. Even if it's all just CPU work, it could be designed to do its job in small chunks to allow the event loop some cycles, but often that's more work to redesign it that way than options 2 and 3 below.
Move loadRound() to a worker thread (newer in node.js) and communicate the result back via messaging; a minimal sketch of this option follows after this list.
Move loadRound() to a separate node.js process using the child_process module and communicate the result back via any number of means (stdio, messaging, etc...).
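A minimal sketch of option 2, assuming the expensive computation can be moved into its own file; round-worker.js, generateRound() and the game.seed/game.round fields are hypothetical names used only for illustration:

// round-worker.js
const { parentPort } = require("worker_threads");
parentPort.once("message", (seed) => {
  const round = generateRound(seed); // the ~20 second CPU-bound work
  parentPort.postMessage(round);
});

// main server file
const { Worker } = require("worker_threads");

function loadRoundAsync(seed) {
  return new Promise((resolve, reject) => {
    const worker = new Worker("./round-worker.js");
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.postMessage(seed);
  });
}

setInterval(async () => {
  if (game.paused) {
    game.round = await loadRoundAsync(game.seed); // event loop stays free
  }
}, 1000);

The main thread keeps accepting socket connections while the worker grinds through the puzzle generation; only the postMessage hand-off touches the event loop.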
I'm working on a project where I have a Feeds page which shows all types of posts by all users, a type-specific list page, and a post detail page.
Pages-
1. Feeds
2. List (Type specific)
3. Detail (detail of a post)
So I have following Mongo collection -
1. Feed
2. type1 post
3. type2 post
4. type3...
Now when a user posts a new Post, I save it to the respective collection, let's say to 'type1 post', and return success to the browser. But I also want to update my 'Feed' collection with the same data. I don't want that to be done before the response is sent, because that would increase the user's wait time. Hence I have used Events. Here's my code -
const event = require('events');
const emitter = new event.EventEmitter();

function savePost(req, res) {
  // Code to save data to the Mongo collection (producing `data`)
  emitter.emit('addToFeeds', data);
  console.log('emit done');
  return res.json(data);
}

emitter.on('addToFeeds', function (data) {
  // code to save data to the Feeds collection
  console.log('emitter msg - ', data);
});
Now when I check the console.log output, it shows "emitter msg -" first and then "emit done". That's why I'm assuming the emitter.on code is executing before res.json(data).
Now I'm wondering, are events blocking code? If I have to update Feeds in the background or after the response is sent, what is the right way? In the future I also want to implement caching, so I will also have to update the cache whenever a post is added; that too I want to do after the response is sent or in the background.
Yes, events are synchronous and blocking. They are implemented with simple function calls. If you look at the eventEmitter code, to send an event to all listeners, it literally just iterates through an array of listeners and calls each listener callback, one after the other.
Now I'm wondering, are events blocking code?
Yes. In the doc for .emit(), it says this: "Synchronously calls each of the listeners registered for the event named eventName, in the order they were registered, passing the supplied arguments to each."
And, further info in the doc in this section Asynchronous vs. Synchronous where it says this:
The EventEmitter calls all listeners synchronously in the order in which they were registered. This is important to ensure the proper sequencing of events and to avoid race conditions or logic errors. When appropriate, listener functions can switch to an asynchronous mode of operation using the setImmediate() or process.nextTick() methods:
If I have to update Feeds in the background or after the response is sent, what is the right way?
Your eventListener can schedule when it wants to actually execute its code with a setTimeout() or a setImmediate() or process.nextTick() if it wants the other listeners and other synchronous code to finish running before it does its work. So, you register a normal listener (which will get called synchronously) and then inside that, you can use a setTimeout() or setImmediate() or process.nextTick() and put the actual work inside that callback. This will delay running your code until after the current Javascript that triggered the initial event is done running.
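As a small sketch of that pattern, assuming a hypothetical saveToFeeds(data) function that writes to the Feeds collection and returns a promise:

const events = require('events');
const emitter = new events.EventEmitter();

emitter.on('addToFeeds', function (data) {
  // The listener itself is still called synchronously by .emit(), but the
  // real work is deferred until the current call stack (including the
  // res.json(data) in savePost) has finished.
  setImmediate(function () {
    saveToFeeds(data) // hypothetical async write to the Feeds collection
      .then(() => console.log('feed updated'))
      .catch((err) => console.error('feed update failed:', err));
  });
});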
There is no actual "background processing" in node.js for pure Javascript code. node.js is single threaded so while you're running some Javascript, no other Javascript can run. Actual background processing would have to be done either with existing asynchronous operations (that use native code to run things in the background) such as network I/O or disk I/O) or by running another process to do the work (that other process can be any type of code including another node.js process).
Events are synchronous and will block. This is done so that you can bind events in a specific order and cascade them in that order. You can make each item asynchronous, and if you're making HTTP requests at those events, that's going to happen async, but the events themselves are started synchronously.
See: https://nodejs.org/api/events.html#events_emitter_emit_eventname_args
And: https://nodejs.org/api/events.html#events_asynchronous_vs_synchronous
In my Meteor application to implement a turnbased multiplayer game server, the clients receive the game state via publish/subscribe, and can call a Meteor method sendTurn to send turn data to the server (they cannot update the game state collection directly).
var endRound = function (gameRound) {
  // check if gameRound has already ended /
  // if round results have already been determined
  //   --> yes: do nothing
  //   --> no:
  //     determine round results
  //     update collection
  //     create next gameRound
};
Meteor.methods({
sendTurn: function(turnParams) {
// find gameRound data
// validate turnParams against gameRound
// store turn (update "gameRound" collection object)
// have all clients sent in turns for this round?
// yes --> call "endRound"
// no --> wait for other clients to send turns
}
});
To implement a time limit, I want to wait for a certain time period (to give clients time to call sendTurn), and then determine the round result - but only if the round result has not already been determined in sendTurn.
How should I implement this time limit on the server?
My naive approach to implement this would be to call Meteor.setTimeout(endRound, <roundTimeLimit>).
Questions:
What about concurrency? I assume I should update collections synchronously (without callbacks) in sendTurn and endRound (?), but would this be enough to eliminate race conditions? (Reading the 4th comment on the accepted answer to this SO question about synchronous database operations also yielding, I doubt that)
In that regard, what does "per request" mean in the Meteor docs in my context (the function endRound called by a client method call and/or in server setTimeout)?
In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node.
In a multi-server / clustered environment, (how) would this work?
Great question, and it's trickier than it looks. First off I'd like to point out that I've implemented a solution to this exact problem in the following repos:
https://github.com/ldworkin/meteor-prisoners-dilemma
https://github.com/HarvardEconCS/turkserver-meteor
To summarize, the problem basically has the following properties:
Each client sends in some action on each round (you call this sendTurn)
When all clients have sent in their actions, run endRound
Each round has a timer that, if it expires, automatically runs endRound anyway
endRound must execute exactly once per round regardless of what clients do
Now, consider the properties of Meteor that we have to deal with:
Each client can have exactly one outstanding method to the server at a time (unless this.unblock() is called inside a method). Following methods wait for the first.
All timeout and database operations on the server can yield to other fibers
This means that whenever a method call goes through a yielding operation, values in Node or the database can change. This can lead to the following potential race conditions (these are just the ones I've fixed, but there may be others):
In a 2-player game, for example, two clients call sendTurn at exactly the same time. Both call a yielding operation to store the turn data. Both methods then check whether 2 players have sent in their turns, find the affirmative, and then endRound gets run twice.
A player calls sendTurn right as the round times out. In that case, endRound is called by both the timeout and the player's method, resulting in it running twice again.
Incorrect fixes to the above problems can result in starvation where endRound never gets called.
You can approach this problem in several ways, either synchronizing in Node or in the database.
Since only one Fiber can actually change values in Node at a time, if you don't call a yielding operation you are guaranteed to avoid possible race conditions. So you can cache things like the turn states in memory instead of in the database. However, this requires that the caching is done correctly and doesn't carry over to clustered environments.
Move the endRound code outside of the method call itself, using something else to trigger it. This is the approach I've taken which ensures that only the timer or the final player triggers the end of the round, not both (see here for an implementation using observeChanges).
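As a rough sketch of that second approach (not the exact code from the repos above), assuming a GameRounds collection whose documents carry hypothetical turnCount, playerCount and ended fields, plus a roundTimeLimit constant:

GameRounds.find({ ended: false }).observeChanges({
  added(id, fields) {
    // one timer per round: ends the round even if clients are too slow
    Meteor.setTimeout(() => endRound(id), roundTimeLimit);
  },
  changed(id, fields) {
    // fired when sendTurn updates the round document
    if (fields.turnCount === undefined) return;
    const round = GameRounds.findOne(id);
    if (round.turnCount >= round.playerCount) {
      endRound(id);
    }
  }
});

endRound itself must still be idempotent (for example via a conditional update like the one shown below), since both the timer and the observer may try to end the same round.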
In a clustered environment you will have to synchronize using only the database, probably with conditional update operations and atomic operators. Something like the following:
var currentVal;
while(true) {
currentVal = Foo.findOne(id).val; // yields
if( Foo.update({_id: id, val: currentVal}, {$inc: {val: 1}}) > 0 ) {
// Operation went as expected
// (your code here, e.g. endRound)
break;
}
else {
// Race condition detected, try again
}
}
The above approach is primitive and probably results in bad database performance under high loads; it also doesn't handle timers, but I'm sure with some thinking you can figure out how to extend it to work better.
You may also want to see this timers code for some other ideas. I'm going to extend it to the full setting that you described once I have some time.
Is the Node.js I/O event loop single- or multithreaded?
If I have several I/O operations, node puts them in the event loop. Are they processed in sequence (fastest first), or does the event loop process them concurrently (...and within which limitations)?
Event Loop
The Node.js event loop runs under a single thread; this means the application code you write is evaluated on a single thread. Node.js itself uses many threads underneath through libuv, but you never have to deal with those when writing Node.js code.
Every call that involves I/O requires you to register a callback. The call also returns immediately, which allows you to do multiple I/O operations in parallel without using threads in your application code. As soon as an I/O operation is completed, its callback will be pushed onto the event loop. It will be executed as soon as all the other callbacks that were pushed onto the event loop before it have been executed.
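A minimal illustration of that (the file names are placeholders):

const fs = require('fs');

// Both reads are started back to back; neither blocks the other, and
// whichever finishes first has its callback run first.
fs.readFile('a.txt', 'utf8', (err, data) => {
  if (err) throw err;
  console.log('a.txt done');
});

fs.readFile('b.txt', 'utf8', (err, data) => {
  if (err) throw err;
  console.log('b.txt done');
});

console.log('both reads started'); // printed before either callback runs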
There are a few methods to do basic manipulation of how callbacks are added to the event loop.
Usually you shouldn't need these, but every now and then they can be useful.
setImmediate
process.nextTick
At no point will there ever be two true parallel paths of execution, so all operations are inherently thread safe. There usually will be several asynchronous concurrent paths of execution that are being managed by the event loop.
Read More about the event loop
Limitations
Because of the event loop, node doesn't have to start a new thread for every incoming tcp connection. This allows node to service hundreds of thousands of requests concurrently, as long as you aren't calculating the first 1000 prime numbers for each request.
This also means it's important not to do CPU-intensive operations, as these will keep a lock on the event loop and prevent other asynchronous paths of execution from continuing.
It's also important not to use the sync variants of the I/O methods, as these will keep a lock on the event loop as well.
If you want to do CPU-heavy things, you should either delegate them to a different process that can execute the CPU-bound operation more efficiently, or write them as a node native add-on.
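As a rough illustration of the "different process" option, using child_process.fork; prime-worker.js and firstNPrimes() are hypothetical names:

// prime-worker.js
process.on('message', (n) => {
  process.send(firstNPrimes(n)); // the CPU-bound part
});

// main file
const { fork } = require('child_process');

function primesInBackground(n) {
  return new Promise((resolve, reject) => {
    const child = fork('./prime-worker.js');
    child.once('message', (primes) => { resolve(primes); child.kill(); });
    child.once('error', reject);
    child.send(n);
  });
}

// usage: the event loop stays responsive while the child crunches numbers
primesInBackground(1000).then((primes) => console.log(primes.length));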
Read more about use cases
Control Flow
In order to manage writing many callbacks you will probably want to use a control flow library.
I believe this is currently the most popular callback based library:
https://github.com/caolan/async
I've used callbacks and they pretty much drove me crazy; I've had a much better experience using Promises. Bluebird is a very popular and fast promise library:
https://github.com/petkaantonov/bluebird
I've found this to be a pretty sensitive topic in the node community (callbacks vs promises), so by all means, use what you feel will work best for you personally. A good control flow library should also give you async stack traces; this is really important for debugging.
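For instance, a small sketch using native promises (bluebird exposes a similar API); the file names are placeholders:

const fs = require('fs');
const { promisify } = require('util');
const readFile = promisify(fs.readFile);

async function loadConfig() {
  // Start both reads concurrently and wait for both, with one catch block
  // instead of nested error-first callbacks.
  const [a, b] = await Promise.all([
    readFile('a.json', 'utf8'),
    readFile('b.json', 'utf8'),
  ]);
  return { ...JSON.parse(a), ...JSON.parse(b) };
}

loadConfig()
  .then((cfg) => console.log(cfg))
  .catch((err) => console.error(err));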
The Node.js process will finish when the last callback in the event loop finishes its path of execution and doesn't register any other callbacks.
This is not a complete explanation; I advise you to check out the following thread, it's pretty up to date:
How do I get started with Node.js
From Willem's answer:
The Node.js event loop runs under a single thread. Every I/O call requires you to register a callback. Every I/O call also returns immediately, which allows you to do multiple I/O operations in parallel without using threads.
I would like to start by explaining with the above quote, which touches on one of the common misunderstandings of the Node.js framework that I see everywhere.
Node.js does not magically handle all those asynchronous calls with just one thread and still keep that thread unblocked. It internally uses Google's V8 engine and a library called libuv (written in C++) that enables it to delegate some potentially asynchronous work to other worker threads (kind of like a pool of threads waiting there for any work to be delegated from the master node thread). Later, when those threads finish their execution, they call their callbacks, and that is how the event loop becomes aware that the execution of a worker thread has completed.
The main point and advantage of Node.js is that you never need to care about those internal threads, and they stay away from your code! All the nasty synchronization work that would normally happen in multi-threaded environments is abstracted away by the Node.js framework, and you can happily work on your single thread (the main node thread) in a more programmer-friendly environment (while benefiting from all the performance enhancements of multiple threads).
Below is a good post if anyone is interested:
When is the thread pool used?
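As a small demo of that worker pool: crypto.pbkdf2 is CPU-heavy, but Node runs it on the libuv pool (4 threads by default), so several calls overlap while the main thread stays free:

const crypto = require('crypto');

const start = Date.now();
for (let i = 1; i <= 4; i++) {
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => {
    console.log(`hash ${i} done after ${Date.now() - start} ms`);
  });
}
console.log('main thread not blocked'); // printed immediately

All four hashes finish at roughly the same time because they run on separate pool threads; a fifth call would have to wait for a free thread.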
You first have to know about the Node.js implementation in order to understand the event loop.
The node js core implementation actually uses two components:
the V8 JavaScript runtime engine
libuv, for handling non-blocking I/O operations and managing threads and concurrent operations for you
With JavaScript you write code for a single thread, but this does not mean that everything executes on that one thread, and you can even run on multiple threads using clusters in node js.
Now, when you want to execute some code like:
let fs = require('fs');
fs.stat('path',(err,stat)=>{
//do something with the stat;
console.log('second');
});
console.log('first');
The execution of this code, at a high level, goes like this:
First the V8 engine runs this code and, if there is no error, everything is good.
It runs the code line by line. When it gets to fs.stat — a node js API, very similar to the web APIs like setTimeout that the browser handles for us — it passes the operation to the libuv component with a flag, along with your callback. libuv carries out the operation, and when it's done it sends a signal and your callback is placed on the event queue; the V8 engine then executes that callback. But it always checks that the stack is empty before going to your code on the queue — always remember that!
Well, to understand how nodejs handles I/O events, you must understand the nodejs event loop properly.
From the name "event loop" we understand that it's a loop that runs cycle after cycle, on a round-robin basis, until there are no events remaining in the loop or the app is closed.
The event loop is one of the most important features of nodejs; it is what makes async programming possible in nodejs.
When the program starts, we are in a node process, in the single thread where the event loop runs. Now, the most important thing we need to know is that the event loop is where all the application code that is inside callback functions is executed.
So, basically all code that is not top-level code will run in the event loop. Some parts (mostly heavy duties) might get offloaded to the thread pool
(When is the thread pool used?); the event loop takes care of offloading those heavy duties and getting the results back into the event loop.
It is the heart of the node architecture, and nodejs is built around callback functions: callbacks are triggered as soon as some piece of work is finished sometime in the future, because node uses an event-driven architecture.
When an application receives an HTTP request on a node server, or a timer expires, or a file finishes being read, all of these emit events as soon as they are done with their work, and our event loop will then pick up these events and call the callback functions associated with each event. It is usually said that the event loop does the orchestration, which simply means that it receives events, calls their callback functions, and offloads the more expensive tasks to the thread pool.
Now, how does all this actually work behind the scenes? In what order are these callbacks executed?
Well, when we start our node application, the event loop starts running right away. The event loop has multiple phases, and each phase has its own callback queue. The four most important phases are: 1. expired timer callbacks, 2. I/O polling and callbacks, 3. setImmediate callbacks, and 4. close callbacks. There are other phases that are used internally by Node.
So, the first phase takes care of callbacks of expired timers, for example, from the setTimeout() function. So, if there are callback functions from timers that just expired, these are the first ones to be processed by the event loop.
The most important thing is: if a timer expires later, while one of the other phases is being processed, then the callback of that timer will only be called once the event loop comes back to this first phase. And it works like this in all four phases.
So callbacks in each queue are processed one by one until there are none left in the queue, and only then does the event loop enter the next phase. For example, suppose 1000 setTimeout timers have expired and the event loop is in the first phase; then all of these 1000 setTimeout callbacks will execute one by one before it goes to the next phase (I/O polling and callbacks).
Next up, we have I/O polling and the execution of I/O callbacks. Here I/O stands for input/output, and polling basically means looking for new I/O events that are ready to be processed and putting them into the callback queue.
In the context of a Node application, I/O means mainly stuff like networking and file access, so this phase is where probably 99% of general application code gets executed.
The next phase is for setImmediate callbacks; setImmediate is a special kind of timer that we can use if we want to process callbacks immediately after the I/O polling and execution phase.
And finally, the fourth phase is the close callbacks. In this phase, all close events are processed, for example when a server or a WebSocket shuts down.
These are the four phases in the event loop, but besides these four callback queues there are actually also two other queues:
1. the nextTick() queue
2. the microtasks queue (which is mainly for resolved promises)
If there are any callbacks in one of these two queues to be processed, they will be executed right after the current phase of the event loop finishes instead of waiting for the entire loop/cycle to finish.
In other words, after each of these four phases, if there are any callbacks in these two special queues, they will be executed right away. Now imagine that a promise resolves and returns some data from an API call while the callback of an expired timer is running. In this case, the promise callback will be executed right after the one from the timer finishes.
The same logic also applies to the nextTick() queue. The nextTick() is a function that we can use when we really, really need to execute a certain callback right after the current event loop phase. It's a bit similar to setImmediate, with the difference that setImmediate only runs after the I/O callback phase.
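A tiny demo of that difference, with both scheduled from top-level code:

setImmediate(() => console.log('setImmediate callback'));
process.nextTick(() => console.log('nextTick callback'));
console.log('synchronous code');

// prints:
// synchronous code
// nextTick callback
// setImmediate callback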
All of the above can happen in one tick/cycle of the event loop. In the meantime, new events could have arisen in a particular phase, or old events could have expired; the event loop will handle those events in another new cycle.
So now it's time to decide whether the loop should continue to the next tick or whether the program should exit. Node simply checks whether there are any timers or I/O tasks that are still running in the background; if there aren't any, then it will exit the application. But if there are any pending timers or I/O tasks, then node will continue running the event loop and start the next cycle.
For example, in a node application, when we are listening for incoming HTTP requests we are basically running an infinite I/O task, and that runs in the event loop; that is why Node.js keeps running and keeps listening for new HTTP requests coming in instead of just exiting the application.
Also, when we are writing or reading a file in the background, that's also an I/O task, and it makes sense that the app doesn't exit while it's working with that file, right?
Now, the event loop in practice:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
});
setImmediate(()=>console.log('Immediate 1 finished'))
console.log('Hello from the top level code');
Output:
Well, the first line is Hello from the top level code; yes, this is expected because this is code that gets executed immediately. Then after that we have three outputs. Timer 1 finished is expected because of phase one, as we discussed before, but after that you might expect I/O finished to be printed, because we discussed that setImmediate only runs after the I/O callback phase. However, this code is actually not in an I/O cycle, so it is not running inside of the event loop, because it's not running inside of any callback function.
Now let's do another test:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 0);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
});
console.log('Hello from the top level code')
Output:
The output is as expected right? Now let's add some delay:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 3000);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
});
console.log('Hello from the top level code')
Output:
In the first cycle, everything inside the I/O callback executed, except that because of the delay, Timer 2's callback executed in the second cycle.
Now let's add nextTick() and see how nodejs behaves:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 3000);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
process.nextTick(()=>console.log('Process Next Tick'));
});
console.log('Hello from the top level code')
Output:
Well, the first callback executed is the one inside process.nextTick(), as expected, right? That is because nextTick callbacks stay in the microtask queue and they are executed right after each phase.
If you run this simple node code
console.log('starting')
setTimeout(()=>{
console.log('0sec')
}, 0)
setTimeout(()=>{
console.log('2sec')
}, 2000)
console.log('end')
What do you expect output to be?
If it's,
starting
0sec
end
2sec
it's a wrong guess; we will get
starting
end
0sec
2sec
because node will never print code in event loop before exiting main()
So basically, First main() will go in stack, then console.log('starting ') so you will see it printed first, after that come setTimeout(()=>{console.log('0sec')}, 0) will go in a stack and then in nodeAPI (node uses multi-threads (lib written in c++) to execute setTimeout to finish, even tho above code is single thread code) after time is up it moves to the event loop, now node can't print it unless stack is not empty. So, next line i.e setTimeout of 2sec will be first pushed to stack,then nodeAPI which will wait for 2 sec to finish, and then to even loop, in mean while next code line will be executed that is console.log('end') and so we see end msg before 0sec, because if nodes non blocking nature. After end code is over and hence main is poped out and its turn of event loop code to be executed that is first 0sec and after that 2sec msg will be printed.