How to set a timeout on a sync function in nodejs - node.js

I want to protect against users who create Liquid templates in our system that could cause a lot of processing (eg, infinite loop or very large loops). I'm using LiquidJS.(https://github.com/harttle/liquidjs).
I currently have perl code that uses Template::Liquid and Sys::SigAction::timeout_call to accomplish calling the Liquid render function with a 100 ms timeout as follows:
use Template::Liquid;
use Sys::SigAction qw( timeout_call );
my $retval = "";
$data = {
name => 'foo',
title => 'bar'
};
my $template = 'Hi, {{name | upcase}} {{title}}!';
if ( timeout_call( 0.1 ,sub { $retval = Template::Liquid->parse($template)->render(%$data); } ) )
{
print "Liquid template timed out\n" ;
}
print "retval=$retval";
Is there a NodeJS module that would help me accomplish the same thing in a similar code control flow?

The short answer is you can't easily put a timeout on a synchronous function call in node.js. Because your Javascript is executed in a single-threaded event-driven system, while a synchronous call is running, no other events can get processed, including timers.
If you really wanted to protected yourself from this synchronous call, you would have to move it out of the main thread, either by putting it into a child process or into a WorkerThread. You could then have a timer in the main thread that, if it doesn't get a response before your timeout, you can kill the child or worker.
Now, if you control the code inside the synchronous call, you can code in your own protections within the synchronous call. It could not the time at start of execution and multiple places during processing, it could check how much time has elapsed and then abort if too much time has passed. But, this would have to be done from inside the code for the synchronous call, not from the outside and would typically involve checking the execution time during a loop or in some code that is regularly called as part of the operation.

Related

Single thread synchronous and asynchronous confusion

Assume makeBurger() will take 10 seconds
In synchronous program,
function serveBurger() {
makeBurger();
makeBurger();
console.log("READY") // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for NodeJs lets say we make an async version of makeBurgerAsync() which also takes 10 seconds.
function serveBurger() {
makeBurgerAsync(function(count) {
});
makeBurgerAsync(function(count) {
});
console.log("READY") // Assume takes 5 seconds to log.
}
Since it is a single thread. I have troubling imagine what is really going on behind the scene.
So for sure when the function run, both async functions will enter event loops and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really done for both async function right? Since single thread is hogging console.log for 5 seconds.
After console.log is done. CPU will have time to switch between both async so that it can run a bit of each function each time.
So according to this, the function doesn't necessarily result in faster execution, async is probably slower due to switching between event loop? I imagine that, at the end of the day, everything will be spread on a single thread which will be the same thing as synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are like query DB etc. Basically nodejs will just say "Hey DB handle this for me while I'll do something else". However, the case I am not understanding is the self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
return count;
}
function makeBurgerAsync(callback) {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
function doSomething() {
fs.readFile(fname, function(err, data) {
console.log("file read");
});
setTimeout(function() {
console.log("timer fired");
}, 100);
http.get(someUrl, function(err, response, body) {
console.log("http get finished");
});
console.log("READY");
}
doSomething();
console.log("AFTER");
Here's what happens step-by-step:
fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
http.get() is called. This will send the desired http request and then immediately return to further execution.
console.log("READY") will run.
The three asynchronous operations will complete in an indeterminate order (whichever one completes it's operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using it's own thread, it will insert an event in the node.js event queue.
Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readfile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).

How do I make function to running on background?

I have this code periodically calls the load function which does very load work taking 10sec. Problem is when load function is being executed, it's blocking the main flow. If I send a simple GET request (like a health check) while load is being executed, the GET call is blocked until the load call is finished.
function setLoadInterval() {
var self = this;
this.interval = setInterval(function doHeavyWork() {
// this takes 10 sec
self.load();
self.emit('reloaded');
}, 20000);
I tried async.parallel but still the GET call was blocked. I tried setTimeout but got the same result. How do I make load to running on background so that it doesn't block the main flow?
this.interval = setInterval(function doHeavyWork() {
async.parallel([function(cb) {
self.load();
cb(null);
}], function(err) {
if (err) {
// log error
}
self.emit('reloaded');
})
}, 20000);
Node.js is an event driven non-blocking IO model
Anything that is IO is offloaded as a separate thread in the underlying engine and hence parallelism is achieved.
If the task is CPU intensive there is no way you can achieve parallelism as by default Javascript is a blocking sync language
However there are some ways to achieve this by offloading the CPU intensive task to a different process.
Option1:
exec or spawn a child process and execute the load() function in that spawned node app. This is okay if the interval fired is for every 20000 ms as by the time another one fired, the 10sec process will be completed.
Otherwise it is dangerous as it can spawn too many node applications eating up your Systems resources
Option2:
I dont know how much data self.load() accepts and returns. If it is trivial and network overhead is acceptable, make that task a load-balanced web service (may be 4 webservers running in parallel) which accepts (rather point to) 1M records and returns back filtered records.
NOTE
It looks like you are using node async parallel function. But keep a note of this description from the documentation.
Note: parallel is about kicking-off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.

How to have heavy processing operations done in node.js

I have a heavy data processing operation that I need to get done per 10-12 simulatenous request. I have read that for higher level of concurrency Node.js is a good platform and it achieves it by having an non blocking event loop.
What I know is that for having things like querying a database, I can spawn off an event to a separate process (like mongod, mysqld) and then have a callback which will handle the result from that process. Fair enough.
But what if I want to have a heavy piece of computation to be done within a callback. Won't it block other request until the code in that callback is executed completely. For example I want to process an high resolution image and code I have is in Javascript itself (no separate process to do image processing).
The way I think of implementing is like
get_image_from_db(image_id, callback(imageBitMap) {
heavy_operation(imageBitMap); // Can take 5 seconds.
});
Will that heavy_operation stop node from taking in any request for those 5 seconds. Or am I thinking the wrong way to do such task. Please guide, I am JS newbie.
UPDATE
Or can it be like I could process partial image and make the event loop go back to take in other callbacks and return to processing that partial image. (something like prioritising events).
Yes it will block it, as the callback functions are executed in the main loop. It is only the asynchronously called functions which do not block the loop. It is my understanding that if you want the image processing to execute asynchronously, you will have to use a separate processes to do it.
Note that you can write your own asynchronous process to handle it. To start you could read the answers to How to write asynchronous functions for Node.js.
UPDATE
how do i create a non-blocking asynchronous function in node.js? may also be worth reading. This question is actually referenced in the other one I linked, but I thought I'd include it here to for simplicity.
Unfortunately, I don't yet have enough reputation points to comment on Nick's answer, but have you looked into Node's cluster API? It's currently still experimental, but it would allow you to spawn multiple threads.
When a heavy piece of computation is done in the callback, the event loop would be blocked until the computation is done. That means the callback will block the event loop for the 5 seconds.
My solution
It's possible to use a generator function to yield back control to the event loop. I will use a while loop that will run for 3 seconds to act as a long running callback.
Without a Generator function
let start = Date.now();
setInterval(() => console.log('resumed'), 500);
function loop() {
while ((Date.now() - start) < 3000) { //while the difference between Date.now() and start is less than 3 seconds
console.log('blocked')
}
}
loop();
The output would be:
// blocked
// blocked
//
// ... would not return to the event loop while the loop is running
//
// blocked
//...when the loop is over then the setInterval kicks in
// resumed
// resumed
With a Generator function
let gen;
let start = Date.now();
setInterval(() => console.log('resumed'), 500);
function *loop() {
while ((Date.now() - start) < 3000) { //while the difference between Date.now() and start is less than 3 seconds
console.log(yield output())
}
}
function output() {
setTimeout(() => gen.next('blocked'), 500)
}
gen = loop();
gen.next();
The output is:
// resumed
// blocked
//...returns control back to the event loop while though the loop is still running
// resumed
// blocked
//...end of the loop
// resumed
// resumed
// resumed
Using javascript generators can help run heavy computational functions that would yield back control to the event loop while it's still computing.
To know more about the event loop visit
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/function*
https://davidwalsh.name/es6-generators

Perl parallel HTTP requests - out of memory

First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked just fine for the first time I ran the program, the second time I ran 'out of memory' after the 40th request.
Here's the code: (#urls contains the 160 URLs for the requests)
while(#urls) {
my #threads;
for (my $j = 0; $j < 10 and #urls; $j++) {
my $url = shift(#urls);
push #threads, async { $ua->get($url) };
}
for my $thread (#threads) {
my $response = $thread->join;
print "$response\n";
}
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get a OOM error on a second run when you don't get one on the first run; when you run a Perl script and the perl binary exits, it'll release all of it's memory back to the OS. Nothing is kept between executions. Is the exact same data being returned by the REST service each time? Maybe there's more data the second time you run and it's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new();
my #thr = map {
threads->create(sub {
my #responses = ();
while (defined (my $url = $q->dequeue())) {
push #responses, $ua->get($url);
}
return #responses;
});
} 1..10;
$q->enqueue($_) for #urls;
$q->enqueue(undef) for 1..10;
foreach (#thr) {
my #responses_of_this_thread = $_->join();
print for #responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
Whenever I need an asynchronous solution Perl, I first look at the POE framework. In this particular case I used POE HTTP Request module that will allow us to send multiple requests simultaneously and provide a callback mechanism where you can process your http responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single threaded POE solution would work beautifully.
Sometimes though, we have to a rely on threading because application gets blocked due to long running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then with Thread::Queue I pass the data from the main thread to these workers AND never join/detach them; always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (which POE provides nice unblocking interfaces).

Locking on an object?

I'm very new to Node.js and I'm sure there's an easy answer to this, I just can't find it :(
I'm using the filesystem to hold 'packages' (folders with a status extensions 'mypackage.idle') Users can perform actions on these which would cause the status to go to something like 'qa', or 'deploying' etc... If the server is accepting lots of requests and multiple requests come in for the same package how would I check the status and then perform an action, which would change the status, guaranteeing that another request didn't alter it before/during the action took place?
so in c# something like this
lock (someLock) { checkStatus(); performAction(); }
Thanks :)
If checkStatus() and performAction() are synchronous functions called one after another, then as others mentioned earlier: their exectution will run uninterupted till completion.
However, I suspect that in reality both of these functions are asynchoronous, and the realistic case of composing them is something like:
function checkStatus(callback){
doSomeIOStuff(function(something){
callback(something == ok);
});
}
checkStatus(function(status){
if(status == true){
performAction();
}
});
The above code is subject to race conditions, as when doSomeIOStuff is being perfomed instead of waiting for it new request can be served.
You may want to check https://www.npmjs.com/package/rwlock library.
This is a bit misleading. There are many script languages that are suppose to be single threaded, but when sharing data from the same source this creates a problem. NodeJs might be single threaded when you are running a single request, but when you have multiple requests trying to access the same data, it just behaves as it creates kind of the same problem as if you were running a multithreaded language.
There is already an answer about this here : Locking on an object?
WATCH sentinel_key
GET value_of_interest
if (value_of_interest = FULL)
MULTI
SET sentinel_key = foo
EXEC
if (EXEC returned 1, i.e. succeeded)
do_something();
else
do_nothing();
else
UNWATCH
One thing you can do is lock on an external object, for instance, a sequence in a database such as Oracle or Redis.
http://redis.io/commands
For example, I am using cluster with node.js (I have 4 cores) and I have a node.js function and each time I run through it, I increment a variable. I basically need to lock on that variable so no two threads use the same value of that variable.
check this out How to create a distributed lock with Redis?
and this https://engineering.gosquared.com/distributed-locks-using-redis
I think you can run with this idea if you know what you are doing.
If you are making asynchronous calls with callbacks, this means multiple clients could potentially make the same, or related requests, and receive responses in different orders. This is definitely a case where locking is useful. You won't be 'locking a thread' in the traditional sense, but merely ensuring asynchronous calls, and their callbacks are made in a predictable order. The async-lock package looks like it handles this scenario.
https://www.npmjs.com/package/async-lock
warning, node.js change semantic if you add a log entry beucause logging is IO bound.
if you change from
qa_action_performed = false
function handle_request() {
if (check_status() == STATUS_QA && !qa_action_performed) {
qa_action_performed = true
perform_action()
}
}
to
qa_action_performed = false
function handle_request() {
if (check_status() == STATUS_QA && !qa_action_performed) {
console.log("my log stuff");
qa_action_performed = true
perform_action()
}
}
more than one thread can execute perform_action().
You don't have to worry about synchronization with Node.js since it's single threaded with an event loop. This is one of the advantage of the architecture that Node.js use.
Nothing will be executed between checkStatus() and performAction().
There are no locks in node.js -- because you shouldn't need them. There's only one thread (the event loop) and your code is never interrupted unless you perform an asynchronous action like I/O. Hence your code should never block. You can't do any parallel code execution.
That said, your code could look something like this:
qa_action_performed = false
function handle_request() {
if (check_status() == STATUS_QA && !qa_action_performed) {
qa_action_performed = true
perform_action()
}
}
Between check_status() and perform_action() no other thread can interrupt because there is no I/O. As soon as you enter the if clause and set qa_action_performed = true, no other code will enter the if block and hence perform_action() is never executed twice, even if perform_action() takes time performing I/O.

Resources