Node.js: How to handle a very large number of timers?

I am using socket.io to send packets via websockets. They seem to disappear from time to time, so I have to implement some sort of acknowledge system. My idea was to immediately respond to a packet with an ACK packet. If the server does not receive this ACK packet within a given time, it will resend the packet (up to 3 times, then disconnect the socket).
My first thought was to start a timer (setTimeout) after sending a packet. If the timeout event occurs, the packet has to be sent again. If the ACK arrives, the timeout gets cleared. Quite easy and short.
var io = require('socket.io').listen(80);
// ... connection handling ...
function sendData(someData, socket) {
    // TODO: Some kind of counter to stop after 3 tries.
    socket.emit("someEvent", someData);
    var timeout = setTimeout(function(){ sendData(someData, socket); }, 2000);
    socket.on("ack", function(){
        // Everything went ok.
        clearTimeout(timeout);
    });
}
But I will have 1k-3k clients connected, with a lot of traffic. I can't imagine that Node.js can handle 10k timers running at the same time. Even worse: I read that Node.js will not fire the event if there is no time for it.
How to implement a good working and efficient packet acknowledge system?

If socket.io is not reliable enough for you, you might want to consider implementing your own websocket interface instead of adding a layer on top of socket.io. But to answer your question, I don't think running 10k timers is going to be a big deal. For example, the following code ran in under 3 seconds for me and printed out the expected result of 100000:
var x = 0;
for (var i = 0; i < 100000; i++) {
    setTimeout(function() { x++; }, 1000);
}
setTimeout(function() { console.log(x); }, 2000);
There isn't actually that much overhead for a timeout; it essentially just gets put in a queue until it's time to execute it.
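Since timers are that cheap, the ack-and-retry approach from the question is workable as is. Here is a minimal sketch of one way to fill in the missing retry counter; the event names, the 2-second timeout and the 3-try limit come from the question, while the tries parameter and the use of once() are assumptions:
function sendData(someData, socket, tries) {
    tries = tries || 0;
    if (tries >= 3) {
        // Give up after 3 attempts, as described in the question.
        socket.disconnect();
        return;
    }
    socket.emit("someEvent", someData);
    var timeout = setTimeout(function() {
        sendData(someData, socket, tries + 1);
    }, 2000);
    // Listen for the ACK once, so repeated sends don't pile up listeners.
    socket.once("ack", function() {
        clearTimeout(timeout);
    });
}
A real implementation would also need to match each ACK to the specific packet it acknowledges (for example with a sequence number), but the timer bookkeeping itself stays this simple.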

I read that Node.js will not fire the event if there is no time for it.
This is a bit of an exaggeration; Node.js timers are reliable. A timer set by setTimeout will fire at some point. It may be delayed if the process is busy at the exact scheduled time, but the callback will be called eventually.
Quoted from Node.js docs for setTimeout:
The callback will likely not be invoked in precisely delay milliseconds. Node.js makes no guarantees about the exact timing of when callbacks will fire, nor of their ordering. The callback will be called as close as possible to the time specified.

Related

Do I need to clear server side intervals?

I have a server-side function to check if a player is idle:
socket.idle = 0;
socket.cached = {};
socket.checkIdle = function(){
    socket.idle++;
    console.log(socket.idle);
    if(socket.cached.x != players[id].x || socket.cached.y != players[id].y){
        socket.idle = 0;
    }
    socket.cached = players[id];
    if(socket.idle > 12){
        socket.disconnect();
    }
}
socket.interval = setInterval(socket.checkIdle, 1000);
I've noticed that even after the player gets booted/disconnected for being idle too long, the server still logs socket.idle for it.
Am I going about this the wrong way? Also, should I then clear the interval when the player disconnects?
socket.on('disconnect', function(){
    clearInterval(socket.interval);
});
You certainly shouldn't leave a setInterval() running for a player that is no longer connected. Your server will just have more and more of these running, wasting CPU cycles and possibly impacting your server's responsiveness or scalability.
I've noticed that even after the player gets booted/disconnected for being idle too long, the server still logs socket.idle for it.
Yeah, that's because the interval is still running and, in fact, it even keeps the socket object from getting garbage collected. All of that is bad.
Also should I then clear the interval for when the player disconnects?
Yes, when a socket disconnects, you must clear the interval timer associated with that socket.
Am I going about this the wrong way?
If you keep with the architecture of a polling interval timer for each separate socket, then this is the right way to clear the timer when the socket disconnects.
But I would think that maybe you could come up with another design that doesn't need to regularly "poll" for idleness at all. It appears you want a 12-second timeout such that if the player hasn't moved within 12 seconds, you disconnect the user. There's really no reason to check for this every second. You could just set a single timer with setTimeout() for 12 seconds from when the user connects, and then each time you get notified of player movement (which your server must already be receiving, since you're referencing players[id].x and players[id].y), you clear the old timer and set a new one. When the timer fires, you must have gone 12 seconds without motion and you can then disconnect. This is more typically how a timeout-type timer would work.
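A minimal sketch of that reset-on-movement approach could look like the following; the 12-second limit comes from the question, while the resetIdleTimer name and the 'playerMoved' event are assumptions standing in for however your server learns about movement:
var IDLE_LIMIT_MS = 12 * 1000;

function resetIdleTimer(socket) {
    clearTimeout(socket.idleTimer);
    socket.idleTimer = setTimeout(function() {
        // No movement for 12 seconds: boot the player.
        socket.disconnect();
    }, IDLE_LIMIT_MS);
}

// Start the timer when the player connects...
resetIdleTimer(socket);

// ...reset it whenever the server hears that the player moved...
socket.on('playerMoved', function() {
    resetIdleTimer(socket);
});

// ...and clean up when the socket goes away.
socket.on('disconnect', function() {
    clearTimeout(socket.idleTimer);
});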

Close a Redis connection based on Node.js Event loop

Issue/ How I ran into it
I'm writing a batch-processing Lambda function in Node.js that makes calls to Redis in two different places. Some batch items may never reach the second Redis call. This is all happening asynchronously, so if I close the connection as soon as the batch queue is empty, any future Redis calls would fail. How do I close the connection?
What I've tried
process.on('beforeExit', callback) -- doesn't get called, as the event loop still contains the Redis connection
client.unref() -- closes the connection if no commands are pending, but doesn't handle future calls
client.on('idle', callback) -- works, but is deprecated and may still miss future calls
What I'm currently doing
Once the batch queue is empty, I call:
intervalId = setInterval(closeRedis, 1000);
I close the Redis connection and clear the interval in the callback after a timeout:
function closeRedis() {
    redis.client('list', (err, result) => {
        var idle = parseClientList(result, 'idle');
        if (idle > timeout) {
            redis.quit();
            clearInterval(intervalId);
        }
    });
}
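The parseClientList helper isn't shown in the question. One possible sketch of it, assuming the standard CLIENT LIST reply format (one line per client, space-separated key=value pairs) and that the first entry is the connection we care about, could be:
function parseClientList(list, field) {
    // e.g. "id=3 addr=127.0.0.1:51234 ... idle=5 ..." -> 5
    var firstLine = list.split('\n')[0];
    var pairs = firstLine.split(' ');
    for (var i = 0; i < pairs.length; i++) {
        var parts = pairs[i].split('=');
        if (parts[0] === field) {
            return parseInt(parts[1], 10);
        }
    }
    return 0;
}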
This approach mostly works, but if I'm only checking for a timeout, there is still a chance that other processing is going on and a Redis call may be made later. I'd like to close the connection when only an idle connection remains in the event loop. Is there a way to do this?
I ended up using process._getActiveHandles(). Once the batch queue is empty, I set an interval to check every half second whether only the minimum number of handles remain. If so, I unref the Redis client.
redisIntervalId = setInterval(closeRedis, 500);

// close redis client connection if it's the last required process
function closeRedis() {
    // 2 core processes plus Redis and interval
    var minimumProcesses = 4;
    if (process._getActiveHandles().length > minimumProcesses)
        return;
    clearInterval(redisIntervalId);
    redisClient.unref();
}
The advantage of this approach is that I can be sure the Redis client will not close the connection while other important processes are running. I can also be sure that the client won't keep the event loop alive after all the important processes have completed.
The downside is that _getActiveHandles() is an undocumented Node function, so it may be changed or removed later. Also, unref() is experimental and doesn't consider some Redis commands when closing the connection.

Async callback blocks NodeJS

I have a server-client based Node.js application.
server.js
...
socket.on('message', function(message) {
    if(message.code == 103)
    {
        process_some_data();
    }
    else
    {
        console.log("UNKNOWN MESSAGE");
    }
});
...
client.js
.. sending responses back to server.js
The process_some_data() function takes about 4 seconds to complete. When I have just one client it is not a problem, but if I have 10, they all choke and wait until the last one finishes.
I found out that the socket event handler waits until the current job finishes; for example, if I comment out process_some_data(), nothing freezes.
I have tried two tweaks, but they didn't work:
...
socket.on('message', function(message) {
    if(message.code == 103)
    {
        setTimeout(function() {
            process_some_data();
            console.log("FINISH");
        }, 1);
    }
    else
    {
        console.log("UNKNOWN MESSAGE");
    }
});
...
I even used http://caolan.github.io/async/, but to no avail:
...
socket.on('message', function(message) {
    if(message.code == 103)
    {
        // Array to hold async tasks
        var asyncTasks = [];
        async.series([
            setTimeout(function() {
                process_some_data();
                console.log("FINISH");
            }, 1)
        ], function (err, results) {
            console.log(results);
        });
    }
    else
    {
        console.log("UNKNOWN MESSAGE");
    }
});
...
How to make this ASYNC? Really need this.
Thank you.
You need multiple processes to solve this with Javascript, because Javascript engines are single-threaded.
What?
When it comes to handling I/O events, such as reading a socket, writing to a file or waiting for a signal, Javascript engines give the appearance of doing multiple things at the same time.
They are actually not: it's just that, under most conditions, processing these events takes so little computation, and the events themselves occur with so much time in between (a microsecond is an eternity for a CPU), that the engine can just process them one after another with plenty of time to spare.
In human time-scale, it looks like the engine is doing a lot of stuff in parallel, but it's just working serially at great speed.
No matter how you schedule your code to run, using setTimeout or Promise, it will still block other events from being processed during the time it's actively computing. Long-running computations (in the scale of seconds, instead of milliseconds) expose the single-threaded nature of the engine: it cannot actually do multiple things at the same time.
Multiple processes
Your computer, however, probably has multiple CPU cores. Unlike the Javascript engine, your hardware is capable of tackling multiple tasks at the same time, at least 1 per core. Even with a single core, your operating system can solve the problem if you run multiple processes.
Since a single Javascript process is single-threaded, you need multiple Javascript processes for this. An easy and time-proven architecture to solve your problem is this:
One Javascript program, running in one process, reads from the socket. Instead of calling process_some_data(), however, it puts all incoming messages in a queue.
This program then sends items from the queue to another Javascript program, running in a different process, that performs the computation using another CPU core. There may be multiple copies of this second process. In a modern computer, it makes sense to have twice as many active processes as you have CPU cores.
A simple approach for Node is to write an HTTP server, using express, that runs the computationally-intensive task. The main program can then use HTTP to delegate tasks to the workers, while still being able to read from the socket.
This is a good article on the topic of multi-processing with Node, using the cluster API.
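For illustration, a minimal sketch of that worker-over-HTTP idea might look like the following. The port, the /process route and processSomeData() (standing in for the question's process_some_data) are assumptions, not part of the original answer.
// worker.js - runs the CPU-heavy task in its own process
const express = require('express');
const app = express();
app.use(express.json());

app.post('/process', function (req, res) {
    const result = processSomeData(req.body); // hypothetical heavy computation
    res.json({ result: result });
});

app.listen(3001);

// server.js - instead of computing inside the socket handler, delegate over HTTP
const http = require('http');

function delegate(message, done) {
    const body = JSON.stringify(message);
    const req = http.request({
        host: 'localhost',
        port: 3001,
        path: '/process',
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Content-Length': Buffer.byteLength(body)
        }
    }, function (res) {
        let data = '';
        res.on('data', function (chunk) { data += chunk; });
        res.on('end', function () { done(null, JSON.parse(data)); });
    });
    req.on('error', done);
    req.end(body);
}

socket.on('message', function (message) {
    if (message.code == 103) {
        delegate(message, function (err, result) {
            if (err) return console.error(err);
            console.log('FINISH', result);
        });
    } else {
        console.log('UNKNOWN MESSAGE');
    }
});
Several copies of worker.js on different ports would then give you roughly one worker per core, as the answer suggests.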

Parallel Request at different paths in NodeJS: long running path 1 is blocking other paths

I am trying out a simple Node.js app so that I can understand its async nature.
But my problem is that as soon as I hit "/home" from the browser, it waits for the response, and when "/" is hit at the same time, it waits for the "/home" response first and only then responds to the "/" request.
My concern is that if one of the requests needs heavy processing, we can't serve another request in parallel. Is this correct?
app.get("/", function(request, response) {
console.log("/ invoked");
response.writeHead(200, {'Content-Type' : 'text/plain'});
response.write('Logged in! Welcome!');
response.end();
});
app.get("/home", function(request, response) {
console.log("/home invoked");
var obj = {
"fname" : "Dead",
"lname" : "Pool"
}
for (var i = 0; i < 999999999; i++) {
for (var i = 0; i < 2; i++) {
// BS
};
};
response.writeHead(200, {'Content-Type' : 'application/json'});
response.write(JSON.stringify(obj));
response.end();
});
Good question.
Now, although Node.js has its asynchronous nature, this piece of code:
for (var i = 0; i < 999999999; i++) {
    for (var j = 0; j < 2; j++) {
        // BS
    }
}
is not asynchronous; it is actually blocking the Node main thread. Therefore, all other requests have to wait until this big for loop ends.
In order to do heavy calculations without blocking other requests, I recommend using setTimeout or setInterval to achieve your goal:
var i = 0;
var interval = setInterval(function() {
    if (i++ >= 999999999) {
        clearInterval(interval);
    }
    // do stuff here
}, 5);
For more information I recommend searching for "Node.js event loop"
As Stasel stated, code like this will block the event loop. Basically, whenever JavaScript is running on the server, nothing else runs. Asynchronous I/O events such as disk I/O might be processing in the background, but their handlers/callbacks won't be called until your synchronous code has finished running. As soon as it has finished, Node will check for pending events and call their handlers.
You actually have a couple of choices to fix this problem.
Break the work into pieces and let the pending events be executed in between (see the sketch after this list). This is almost the same as Stasel's recommendation, except that 5ms between single iterations is huge; for something like 999999999 items, that takes forever. I suggest batch-processing the loop for some amount of time, then scheduling the next batch with setImmediate. setImmediate schedules the callback after the pending I/O events are handled, so if there is no new I/O event to handle (like a new http request) it will be executed immediately, which is fast enough. The question is then how much processing to do per batch. I suggest first measuring manually how long the work takes on average, and scheduling about 50ms of work per batch. For example, if you have measured that 1000 items take 100ms, then let each batch process 500 items, so it will be about 50ms. You can break it down further, but the more you break it down, the more time it takes in total, so be careful. Also, since you are processing a huge number of items, try not to create too much garbage, so the garbage collector won't block you much. In this not-so-similar question, I've explained how to insert 10000 documents into MongoDB without blocking the event loop.
Use threads. There are actually a couple of nice thread implementations that won't let you shoot yourself in the foot. This is really a good idea for this case if you are looking for performance on huge processing jobs, since, as I said above, it is tricky to make a CPU-bound task play nicely with everything else happening in the same process; asynchronous events are perfect for I/O-bound tasks, not CPU-bound tasks. There's the nodejs-threads-a-gogo module you can use. You can also use node-webworker-threads, which is built on threads-a-gogo but exposes the web worker API. There's also nPool, which looks a bit nicer but is less popular. They all support thread pools, and it should be straightforward to implement a work queue with them.
Make several processes instead of threads. This might be slower than threads, but for huge jobs it is still way better than iterating in the main process. There are different ways to do it. Using processes also gives you a design that you can extend to multiple machines instead of just multiple CPUs. You can use a job queue (basically, each worker pulls the next job from the queue whenever it finishes a task), a multi-process map-reduce or AWS Elastic MapReduce, or the Node.js cluster module. Using the cluster module you can listen on a unix domain socket in each worker and, for each job, just make a request to that socket. Whenever the worker finishes processing the job, it writes back to that particular request. You can search for this; there are many implementations and modules out there already. You can use 0MQ, RabbitMQ, Node's built-in IPC, unix domain sockets, or a Redis queue for multi-process communication.
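As a concrete sketch of the first option, assuming the heavy work can be expressed as a loop over an items array and that processItem() is a hypothetical stand-in for one unit of work:
function processInBatches(items, batchSize, done) {
    var index = 0;
    function runBatch() {
        var end = Math.min(index + batchSize, items.length);
        for (; index < end; index++) {
            processItem(items[index]); // hypothetical CPU-bound work for one item
        }
        if (index < items.length) {
            // Yield back to the event loop so pending I/O (new requests, etc.)
            // is handled before the next batch runs.
            setImmediate(runBatch);
        } else {
            done();
        }
    }
    runBatch();
}
In the /home handler you would call processInBatches(...) and only send the response from the done callback, so other routes stay responsive while the batches run.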

How to have heavy processing operations done in node.js

I have a heavy data-processing operation that I need to get done for 10-12 simultaneous requests. I have read that Node.js is a good platform for higher levels of concurrency, and that it achieves this with a non-blocking event loop.
What I know is that for things like querying a database, I can hand the work off to a separate process (like mongod or mysqld) and have a callback handle the result from that process. Fair enough.
But what if I want a heavy piece of computation to be done within a callback? Won't it block other requests until the code in that callback has finished executing? For example, I want to process a high-resolution image, and the code I have is in JavaScript itself (no separate process to do the image processing).
The way I think of implementing it is like this:
get_image_from_db(image_id, function(imageBitMap) {
    heavy_operation(imageBitMap); // Can take 5 seconds.
});
Will that heavy_operation stop Node from taking in any requests for those 5 seconds? Or am I thinking about this task the wrong way? Please guide me, I am a JS newbie.
UPDATE
Or could it be that I process part of the image, let the event loop go back to handling other callbacks, and then return to processing that partial image (something like prioritising events)?
Yes, it will block it, as callback functions are executed in the main loop. Only asynchronously called functions do not block the loop. It is my understanding that if you want the image processing to execute asynchronously, you will have to use a separate process to do it.
Note that you can write your own asynchronous process to handle it. To start you could read the answers to How to write asynchronous functions for Node.js.
UPDATE
how do i create a non-blocking asynchronous function in node.js? may also be worth reading. This question is actually referenced in the one I linked above, but I thought I'd include it here for simplicity.
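For illustration, a minimal sketch of pushing the heavy work into a separate process with child_process.fork could look like this; the worker filename and the assumption that the image data can be serialized over IPC are mine, while get_image_from_db and heavy_operation are the placeholders from the question:
// image-worker.js (hypothetical worker file)
process.on('message', function (imageBitMap) {
    var result = heavy_operation(imageBitMap); // CPU-heavy, but in its own process
    process.send(result);
});

// main process
var fork = require('child_process').fork;

get_image_from_db(image_id, function (imageBitMap) {
    var worker = fork('./image-worker.js');
    worker.on('message', function (result) {
        // The main event loop stayed free while the worker was computing.
        worker.kill();
    });
    worker.send(imageBitMap);
});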
Unfortunately, I don't yet have enough reputation points to comment on Nick's answer, but have you looked into Node's cluster API? It's currently still experimental, but it would allow you to spawn multiple worker processes.
When a heavy piece of computation is done in a callback, the event loop is blocked until the computation finishes. That means the callback will block the event loop for those 5 seconds.
My solution
It's possible to use a generator function to yield control back to the event loop. I will use a while loop that runs for 3 seconds to act as a long-running callback.
Without a Generator function
let start = Date.now();
setInterval(() => console.log('resumed'), 500);

function loop() {
    while ((Date.now() - start) < 3000) { // while the difference between Date.now() and start is less than 3 seconds
        console.log('blocked');
    }
}

loop();
The output would be:
// blocked
// blocked
//
// ... would not return to the event loop while the loop is running
//
// blocked
//...when the loop is over then the setInterval kicks in
// resumed
// resumed
With a Generator function
let gen;
let start = Date.now();
setInterval(() => console.log('resumed'), 500);

function *loop() {
    while ((Date.now() - start) < 3000) { // while the difference between Date.now() and start is less than 3 seconds
        console.log(yield output());
    }
}

function output() {
    setTimeout(() => gen.next('blocked'), 500);
}

gen = loop();
gen.next();
The output is:
// resumed
// blocked
//...returns control back to the event loop even though the loop is still running
// resumed
// blocked
//...end of the loop
// resumed
// resumed
// resumed
Using JavaScript generators can help run heavy computational functions by yielding control back to the event loop while the computation is still in progress.
To learn more about generators, see:
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/function*
https://davidwalsh.name/es6-generators
