Node.js performance question for outgoing requests

I remember the default thread pool size for Node is 4 (or based on CPU count). That leads to my question.
For a very basic, simplified case: I'm writing service1 in Node, which sends a request to service2, waits until it finishes its computation, and then continues. Service2, on another server, can handle 1000 requests at the same time; it takes a while, and it's a blocking call (which is out of my control).
If I do it the Java way, I can create 1000 threads from GlassFish, so the first burst of 1000 requests can be processed at the same time. The 1001st may need to wait a little.
1000 incoming req -> Java server1 -> 1000 threads -> 1000 outgoing req -> server2
But in Node, if the thread pool size is 4 (given it's a 4-core CPU machine), does that mean the Node app will be slower than Java in this case? What happens if I increase the pool size to 1000? Can I even increase it to 1000?
1000 incoming req -> Node server1 -> ~4 threads -> 1000 outgoing req -> server2
I don't see an easy way for Node here. Or can I let Node handle most things and, for the blocking call above, add a small Java server and dispatch the outgoing requests to that? Any suggestions?
UPDATE: I found this: "We use setTimeout( function(){} , 0 ); to create asynchronous functions in JavaScript!"
https://medium.com/from-the-scratch/javascript-writing-your-own-non-blocking-asynchronous-functions-60091ceacc79
I guess that if I convert the blocking call into an async function it might solve my issue. I hope, praying!!!

Node hands its I/O tasks off to the operating system, whose I/O facilities are generally multi-threaded. It takes the approach of not waiting for requests to finish (by blocking a thread), because a blocked thread just wastes time sitting there. So Node hands these tasks off and asks the OS to poke it when they're done. There is a very good related question:
How, in general, does Node.js handle 10,000 concurrent requests?
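To make this concrete for the original question (service1 calling a slow service2): outgoing socket I/O in Node does not occupy a JS thread while it waits, and it does not go through the libuv thread pool either (that pool, default size 4, is used for things like file system work and DNS lookups). So 1000 outgoing requests can be in flight at once from a single JS thread. Here is a minimal sketch, assuming Node 18+ (built-in fetch) and a hypothetical service2 URL:
// Sketch: fire many outgoing requests concurrently from one JS thread.
// The waiting happens in the OS/libuv, not in per-request threads.
// SERVICE2_URL is a hypothetical endpoint.
const SERVICE2_URL = 'http://service2.example.com/compute';

async function callService2(payload) {
  // This await suspends only this function; the event loop keeps
  // accepting and dispatching other requests in the meantime.
  const res = await fetch(SERVICE2_URL, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.json();
}

// 1000 requests in flight at the same time, no extra threads on the Node side.
Promise.all(Array.from({ length: 1000 }, (_, i) => callService2({ id: i })))
  .then((results) => console.log('done:', results.length))
  .catch((err) => console.error(err));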

Related

Akka HTTP and long running requests

We have an API implemented in bare-bones Scala Akka HTTP: a couple of routes fronting a heavy computation (CPU and memory intensive). No clustering; everything runs on one beefy machine. The computation is properly heavy: it can take more than 60s to complete for a single isolated request, and we don't care that much about speed. There's no blocking IO, just a lot of CPU processing.
When I started performance-testing it, an interesting pattern showed up: say requests A1, A2, ..., A10 come through. They use resources quite heavily, and Akka returns HTTP 503 for requests A5-A10, which overran. The problem is that the computation is still running even though there's no one left to pick up the result.
From there we see a cascading performance collapse: requests A11-A20 arrive at a server still working on requests A5-A10. Clearly these new requests also have a chance of overrunning, an even higher one given that the server is busier. So some of them will still be running after Akka has triggered their timeout, making the server even busier and slower, and then the next batch of requests comes through... After running the system for a bit you see that nearly all requests past a certain point start failing with timeouts, and after you stop the load you can still see some requests being worked on in the logs.
I've tried running the computation in a separate ExecutionContext as well as on the system dispatcher, trying to make it fully asynchronous (via Future composition), but the result is still the same. Lingering jobs make the server so busy that eventually almost every request fails.
A similar case is described in https://github.com/zcox/spray-blocking-test but the focus there is different: /ping doesn't matter to us as much as more or less stable responsiveness on the endpoint that handles the long-running requests.
The question: how do I design my application to be better at interrupting hanging requests? I can tolerate some small percentage of failed requests under heavy load, but grinding the entire system to a halt after several seconds is unacceptable.
Akka HTTP does not automatically terminate processing for requests which have timed out. Usually the extra bookkeeping which would be needed to do that would not pay off, so it's not on by default. I think it's something of an oversight, TBH, and I've had similar problems with Akka HTTP myself.
I think you need to manually abort the processing on request timeout, otherwise the server will not recover when it is overloaded, as you have seen.
There isn't a standard mechanism with which you can implement this (see "How to cancel Future in Scala?"). If the thread is doing CPU work with no I/O, then Thread.interrupt() will not be useful. Instead you should create a Deadline or Promise or similar that records whether the request is still open, pass it around, and periodically check it for timeout during your computation:
// in the HTTP server class:
import java.util.concurrent.TimeoutException
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import akka.http.scaladsl.model.HttpResponse
import akka.http.scaladsl.server.Directives._

val responseTimeout: FiniteDuration = 30.seconds

val routes =
  path("slowComputation") {
    complete {
      val responseTimeoutDeadline: Deadline = responseTimeout.fromNow
      computeSlowResult(responseTimeoutDeadline)
    }
  }

// in the processing code:
def computeSlowResult(responseDeadline: Deadline): Future[HttpResponse] = Future {
  val gatherInputs: List[Int] = ??? // whatever inputs the computation needs
  val total = gatherInputs.foldLeft(0) { (acc, next) =>
    // check on every step whether the response has already timed out
    if (responseDeadline.isOverdue())
      throw new TimeoutException()
    acc + next // proceed with the calculation a little
  }
  HttpResponse(entity = total.toString)
}
(Checking if a Promise has been completed will be a lot cheaper than checking whether a Deadline has expired. I've put the code for the latter above, as it's easier to write.)
The spray-blocking-test uses libraries that I don't think exist in Akka HTTP. I had a similar problem and I solved it as follows:
application.conf
blocking-io-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 16
  }
  throughput = 1
}
Route
complete {
  Try(new URL(url)) match {
    case scala.util.Success(u) => {
      val src = Source.fromIterator(() => parseMovies(u).iterator)
      src
        .via(findMovieByTitleAndYear)
        .via(persistMovies)
        .completionTimeout(5.seconds)
        .toMat(Sink.fold(Future(0))((acc, elem) => Applicative[Future].map2(acc, elem)(_ + _)))(Keep.right)
        // run the whole graph on a separate dispatcher
        .withAttributes(ActorAttributes.dispatcher("blocking-io-dispatcher"))
        .run.flatten
        .onComplete {
          case scala.util.Success(n) => logger.info(s"Created $n movies")
          case Failure(t)            => logger.error(t, "Failed to process movies")
        }
      Accepted
    }
    case Failure(t) => logger.error(t, "Bad URL"); BadRequest -> "Bad URL"
  }
}
The response returns immediately while the processing keeps happening in the background.
Additional reading:
http://doc.akka.io/docs/akka/current/scala/dispatchers.html
http://blog.akka.io/streams/2016/07/06/threading-and-concurrency-in-akka-streams-explained

Which response belongs to which task in a node.js threadPool?

Imagine you have a lot of long, processor-intensive tasks that translate strings into something else. You want a pool of actual threads to keep the main Node thread free and to make use of your cores.
The main way to do this is to use either Threads-a-gogo or Webworker-Threads and start a pool of 16 threads (e.g. on an Intel CPU with 8 cores you usually have 16 hardware threads).
Sending a request to a thread means posting an event or a message, and getting a response means catching an event or receiving a message. But how does this work with a threadPool?
If you skip the Webworker API, TAGG and Webworkers for Node have the same underlying API. You can load your translation function into all workers using threadPool.load and queue a task to one of them using threadPool.any.
But imagine I now have 50 tasks (strings to translate) to be queued. The threadPool will eventually emit 50 events (responses with a translated string) without telling me which task each response belongs to?
I think I am fundamentally misunderstanding one thing about the threadPool.
Is there a way I can just add a task to the threadPool queue and receive a callback when that particular task is done?
Why emit events from the thread pool when you can just return the translated string? The value returned by the code is received by the callback you passed to threadPool.any.eval(). Example:
threadPool.any.eval('return "hello world"', function (err, data) {
  // data === 'hello world'
});
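Threads-a-gogo and webworker-threads are older third-party modules; current Node ships worker_threads built in, and the same "which response belongs to which task" problem is usually solved by tagging each task with an id. A rough sketch of that idea (not the TAGG API):
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // main thread: keep a map of task id -> callback so every response
  // can be matched to the task that produced it
  const worker = new Worker(__filename);
  const pending = new Map();
  let nextId = 0;

  worker.on('message', ({ id, result }) => {
    const callback = pending.get(id);
    pending.delete(id);
    callback(null, result);
  });

  function translate(text, callback) {
    const id = nextId++;
    pending.set(id, callback);
    worker.postMessage({ id, text });
  }

  translate('hello', (err, out) => console.log('task 1 ->', out));
  translate('world', (err, out) => console.log('task 2 ->', out));
} else {
  // worker thread: echo the id back along with the result
  parentPort.on('message', ({ id, text }) => {
    const result = text.toUpperCase(); // stand-in for the real translation work
    parentPort.postMessage({ id, result });
  });
}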

node.js asynchronous logic behavior

I am building a CPU-intensive web app, where I'll write the CPU-intensive stuff in C++ and the web server in Node.js. Node.js would be connected to the C++ code via addons. I am confused about one thing:
Say the CPU-intensive operation takes 5 seconds per request (maybe it involves inverting a huge matrix). When such a request comes through, the Node.js binding would hand it over to the C++ code.
Now does this mean that Node.js is not tied up for the next 5 seconds and can continue serving other requests?
I am confused because I have heard that even though Node offers asynchronous features, it is still single-threaded.
Obviously I would not want Node.js to be stuck for 5s, as that is a huge price to pay. Imagine hundreds of simultaneous requests for this intensive operation...
Trying to understand JS callbacks and asynchronicity, I came across many different versions of the following description:
a callback function, which is passed to another function as a parameter, runs after the time-taking work of the function it is passed to.
The dilemma originates with the phrase "time taking". Is it
time taking because the CPU is idle and waiting for a response, or
time taking because the CPU is busy crunching numbers like hell?
This is not clear in the description and it confused me, so I tried the following two snippets.
getData('http://fakedomain1234.com/userlist', writeData);
document.getElementById('output').innerHTML += "show this before data ...";

function getData(dataURI, callback) {
  // Normally you would actually connect to a server here.
  // We're just going to simulate a 3-second delay.
  var timer = setTimeout(function () {
    var dataArray = [123, 456, 789, 12, 345, 678];
    callback(dataArray);
  }, 3000);
}

function writeData(myData) {
  document.getElementById('output').innerHTML += myData;
}

<body>
  <p id="output"></p>
</body>
and
getData('http://fakedomain1234.com/userlist', writeData);
document.getElementById('output').innerHTML += "show this before data ...";

function getData(dataURI, callback) {
  var dataArray = [123, 456, 789, 12, 345, 678];
  // busy loop: the CPU is kept busy instead of waiting on I/O
  for (var i = 0; i < 1000000000; i++);
  callback(dataArray);
}

function writeData(myData) {
  document.getElementById('output').innerHTML += myData;
}

<body>
  <p id="output"></p>
</body>
So in both snippets there is a time-taking activity inside the getData function. In the first one the CPU is idle, and in the second the CPU is busy. Clearly, when the CPU is busy, the JS runtime is not asynchronous.
The main thread of Node is the JS event loop, so all logic interacting with JS is single-threaded. This also includes any C++ logic triggered directly via JS.
Generally any long-running tasks should be split off into worker processes. For instance, in your case, you could have a worker process that queues up calculations and emits events back to the JS thread when they have completed.
So really, it's a question of how you go about your "connected to C++ via addons" code.
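Here is a sketch of that worker-process idea, assuming a hypothetical matrix-worker.js script that receives a message, runs the heavy (native/C++) computation, and posts a { id, result } message back:
const { fork } = require('child_process');
const http = require('http');

// one long-lived worker process for the heavy computation;
// ./matrix-worker.js is hypothetical and must reply with { id, result }
const worker = fork('./matrix-worker.js');
const pending = new Map();
let nextId = 0;

worker.on('message', ({ id, result }) => {
  pending.get(id)(result);
  pending.delete(id);
});

function invertMatrix(matrix, callback) {
  const id = nextId++;
  pending.set(id, callback);
  worker.send({ id, matrix });
}

// the handler returns to the event loop immediately; the response is
// written only when the worker posts its result back
http.createServer((req, res) => {
  invertMatrix([[1, 2], [3, 4]], (result) => {
    res.end(JSON.stringify(result));
  });
}).listen(3000);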
I'm not going to refer to the specifics of Node.js as I'm not that familiar with the internal architecture and the possibilities it allows (but I understand it supports multiple worker threads, each representing a different event loop)
In general, if you need to process 100 requests/s that each take 5 seconds of solid CPU time, then there's nothing you can do except ensure that you have 500 processors available.
If 100 requests/s is the peak, while the average is much lower, then the solution is queueing, and you use the queue to absorb the blow.
Now things start to get interesting when it is not 5 seconds of solid CPU time, but 0.1 s of CPU and 4.9 s of waiting, or anything in between. This is the case where asynchronous processing should be used to put all that waiting time to work.
Asynchronous in this case means that:
All your execution happens in an event loop.
You don't wait: no sleep, no blocking I/O; you just execute or return to the event loop.
You split your task into non-blocking subtasks, interspersed with (async) events (e.g. carrying a response) that continue the execution.
You split your system into a number of event processing services, exchanging requests and responses through asynchronous events and collaborating to provide the overall functionality.
What to do if you have a subsystem you cannot turn into an asynchronous service under the principles above?
The answer is to wrap it with queues (to absorb the requests) plus multiple threads (so some threads can execute while others are waiting), providing the asynchronous request/response event interface the rest of the subsystems expect.
In all cases it is best to keep a bounded number of threads (instead of a per-request thread model) and always keep the total number of active/hot threads in the system below the number of processing resources.
Node.js is nice in that its input/output is inherently asynchronous and all its infrastructure is geared towards implementing the kind of things described above.
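A small sketch of the "queue + bounded concurrency" idea in Node (names such as submit, drain, and MAX_ACTIVE are made up for illustration): at most MAX_ACTIVE jobs run at once, and the queue absorbs bursts.
// bounded-concurrency queue: bursts wait in `queue` instead of
// overwhelming the backend or the process
const MAX_ACTIVE = 8;
const queue = [];
let active = 0;

function submit(job) { // job: () => Promise<any>
  return new Promise((resolve, reject) => {
    queue.push({ job, resolve, reject });
    drain();
  });
}

function drain() {
  while (active < MAX_ACTIVE && queue.length > 0) {
    const { job, resolve, reject } = queue.shift();
    active++;
    job()
      .then(resolve, reject)
      .finally(() => { active--; drain(); });
  }
}

// usage: submit(() => fetch('http://backend.example.com/slow'))
//          .then((res) => console.log(res.status));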

Erlang Node to Node Messaging Throughput, Timeouts and guarantees

Now, suppose we are designing an application consisting of 2 Erlang nodes. On Node A there will be very many processes, on the order of thousands. These processes access resources on Node B by sending a message to a registered process on Node B. On Node B, let's say you have a process started by executing the following function:
start_server() ->
    register(zeemq_server, spawn(?MODULE, server, [])),
    ok.

server() ->
    receive
        {{CallerPid, Ref}, {Module, Func, Args}} ->
            Result = (catch erlang:apply(Module, Func, Args)),
            CallerPid ! {Ref, Result},
            server();
        _ -> server()
    end.
On Node A, any process that wants to execute any function in a given module on Node B, uses the following piece of code:
call(Node, Module, Func, Args) ->
    Ref = make_ref(),
    Me = self(),
    {zeemq_server, Node} ! {{Me, Ref}, {Module, Func, Args}},
    receive
        {Ref, Result} -> Result
    after timer:minutes(3) ->
        error_logger:error_report(["Call to server took so long"]),
        {error, remote_call_failed}
    end.
So, assuming that the process zeemq_server on Node B will never go down, and that the network connection between Node A and B is always up, please answer the following questions:
Qn 1: Since there is only one receiving process on Node B, its mailbox is most likely to be full all the time. This is because there are many processes on Node A, and in a given interval, say 2 seconds, every process makes at least one call to the Node B server. In which ways can the reception be made redundant on Node B (e.g. process groups etc.)? Explain (conceptually) how this would replace the server-side code above, and show what changes would happen on the client side.
Qn 2: In a situation where there is only one receiver on Node B, is there a maximum number of messages allowed in the process mailbox? How would Erlang respond if a single process mailbox is flooded with too many messages?
Qn 3: In what ways, using the very concept shown above, can I guarantee that every process which sends a request gets back an answer as soon as possible, before the timeout occurs? Could converting the reception part on Node B into a parallel operation help, like this:
start_server() ->
    register(zeemq_server, spawn(?MODULE, server, [])),
    ok.

server() ->
    receive
        {{CallerPid, Ref}, {Module, Func, Args}} ->
            spawn(?MODULE, child, [Ref, CallerPid, {Module, Func, Args}]),
            server();
        _ -> server()
    end.

child(Ref, CallerPid, {Module, Func, Args}) ->
    Result = (catch erlang:apply(Module, Func, Args)),
    CallerPid ! {Ref, Result},
    ok.
The method shown above may increase the instantaneous number of processes running on Node B, and this may hurt the service due to memory usage. However, it looks good and makes the server() loop return immediately to handle the next request. What is your take on this modification?
Lastly: illustrate how you would implement a pool of receiver processes on Node B that still appears under one name to Node A, such that incoming messages are multiplexed amongst the receivers and the load is shared within this group of processes. Keep the meaning of the problem the same.
The maximum number of messages in a process mailbox is unbounded, except by the amount of memory.
Also, if you need to inspect the mailbox size, use
erlang:process_info(self(),[message_queue_len,messages]).
This will return something like:
[{message_queue_len,0},{messages,[]}]
What I suggest is that you first convert your server above into a gen_server. This is your worker.
Next, I suggest using poolboy ( https://github.com/devinus/poolboy ) to create a pool of instances of your server as poolboy workers (there are examples in their GitHub README.md). Lastly, I suggest creating a module for callers with a helper method that creates a poolboy transaction and applies a Worker arg from the pool to a function. Example below, cribbed from their GitHub:
squery(PoolName, Sql) ->
    poolboy:transaction(PoolName, fun(Worker) ->
        gen_server:call(Worker, {squery, Sql})
    end).
That said, would Erlang RPC suit your needs better? Details on Erlang RPC at http://www.erlang.org/doc/man/rpc.html. A good treatment of Erlang RPC is found at http://learnyousomeerlang.com/distribunomicon#rpc.
IMO spawning a new process to handle each request may be overkill, but it's hard to say without knowing what has to be done with each request.
You can have a pool of processes to handle the messages, using a round-robin method to distribute the requests, or, based on the type of request, either handle it, send it to a child process, or spawn a process. You can also monitor the load of the pooled processes by looking at their message queues and start new children if they are overloaded, using a supervisor: just use send_after in init to check the load every few seconds and act accordingly. Use OTP if you can; there's overhead, but it is worth it.
I wouldn't use HTTP for dedicated point-to-point communication; I believe it's too much overhead. You can control the load using a pool of processes to handle it.

What happens when a single request takes a long time with these non-blocking I/O servers?

With Node.js, eventlet, or any other non-blocking server, what happens when a given request takes a long time? Does it then block all other requests?
For example, a request comes in and takes 200 ms to compute; this will block other requests, since e.g. Node.js uses a single thread.
Meaning your 15K requests per second would go down substantially because of the actual time it takes to compute the response for a given request.
But this just seems wrong to me, so I'm asking what really happens as I can't imagine that is how things work.
Whether or not it "blocks" is dependent on your definition of "block". Typically block means that your CPU is essentially idle, but the current thread isn't able to do anything with it because it is waiting for I/O or the like. That sort of thing doesn't tend to happen in node.js unless you use the non-recommended synchronous I/O functions. Instead, functions return quickly, and when the I/O task they started complete, your callback gets called and you take it from there. In the interim, other requests can be processed.
If you are doing something computation-heavy in node, nothing else is going to be able to use the CPU until it is done, but for a very different reason: the CPU is actually busy. Typically this is not what people mean when they say "blocking", instead, it's just a long computation.
200ms is a long time for something to take if it doesn't involve I/O and is purely doing computation. That's probably not the sort of thing you should be doing in node, to be honest. A solution more in the spirit of node would be to have that sort of number crunching happen in another (non-javascript) program that is called by node, and that calls your callback when complete. Assuming you have a multi-core machine (or the other program is running on a different machine), node can continue to respond to requests while the other program crunches away.
There are cases where a cluster (as others have mentioned) might help, but I doubt yours is really one of those. Clusters really are made for when you have lots and lots of little requests that together are more than a single core of the CPU can handle, not for the case where you have single requests that take hundreds of milliseconds each.
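A sketch of that "hand the number crunching to another program" approach, assuming a hypothetical CPU-heavy executable ./crunch that prints its result to stdout:
const { execFile } = require('child_process');
const http = require('http');

// run the external program and call back when it exits; the event loop
// stays free to serve other requests while ./crunch is working
function crunch(input, callback) {
  execFile('./crunch', [String(input)], (err, stdout) => {
    if (err) return callback(err);
    callback(null, stdout.trim());
  });
}

http.createServer((req, res) => {
  crunch(42, (err, result) => {
    if (err) {
      res.statusCode = 500;
      return res.end('computation failed');
    }
    res.end(result);
  });
}).listen(8080);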
Everything in node.js runs in parallel internally. However, your own code runs strictly serially. If you sleep for a second in node.js, the server sleeps for a second. It's not suitable for requests that require a lot of computation. I/O is parallel, and your code does I/O through callbacks (so your code is not running while waiting for the I/O).
On most modern platforms, Node.js does use threads for some I/O. It uses libuv, which uses a thread pool where that works best on the platform (e.g. for file system work) and non-blocking OS primitives for network sockets.
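You can see that thread pool (and its default size of 4) directly with a call that runs on it, such as crypto.pbkdf2; raising the UV_THREADPOOL_SIZE environment variable changes the behaviour. A quick sketch:
const crypto = require('crypto');

// pbkdf2 runs on the libuv thread pool (default size 4, configurable via
// UV_THREADPOOL_SIZE). With the default, roughly the first 4 callbacks
// fire together and the rest wait for a free thread.
const start = Date.now();
for (let i = 0; i < 8; i++) {
  crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', () => {
    console.log('hash ' + i + ' done after ' + (Date.now() - start) + ' ms');
  });
}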
You are exactly correct. Node.js developers must be aware of this, or their applications will be completely non-performant if long-running code is not asynchronous.
Everything that is going to take a 'long time' needs to be done asynchronously.
This is basically true, at least if you don't use the new cluster feature that balances incoming connections between multiple, automatically spawned workers. However, if you do use it, most other requests will still complete quickly.
Edit: Workers are processes.
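A minimal sketch of that cluster setup (Node 16+, where the flag is cluster.isPrimary; older versions use cluster.isMaster):
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {
  // fork one worker process per CPU; incoming connections are balanced
  // across them, so a slow request only ties up one worker
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}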
You can think of the event loop as 10 people waiting in line to pay their bills. If somebody is taking too much time to pay his bill (thus blocking the event loop), the other people will just have to hang around waiting for their turn to come.. and waiting...
In other words:
Since the event loop is running on a single thread, it is very important that we do not block its execution by doing heavy computations in callback functions or synchronous I/O. Going over a large collection of values/objects or performing time-consuming computations in a callback function prevents the event loop from further processing other events in the queue.
Here is some code to actually see the blocking / non-blocking in action:
With this example (a long CPU-bound task, no I/O):
var net = require('net');

var handler = function (socket) {
  console.log('hello');
  // long CPU-bound loop: nothing else runs on the event loop until it finishes
  var a;
  for (var i = 0; i < 10000000000; i++) { a = i + 5; }
};

net.createServer(handler).listen(80);
If you make 2 requests from the browser, only a single hello will be displayed in the server console, meaning that the second request cannot be processed because the first one blocks the Node.js thread.
If we do an I/O task instead (writing 2 GB of data to disk, which took a few seconds during my test, even on an SSD):
var http = require('http');
var fs = require('fs');

var buffer = Buffer.alloc(2 * 1000 * 1000 * 1000);
var first = true;
var done = false;

var write = function () {
  fs.writeFile('big.bin', buffer, function () { done = true; });
};

var handler = function (req, res) {
  if (first) {
    first = false;
    res.end('Starting write..');
    write();
    return;
  }
  if (done) {
    res.end('write done.');
  } else {
    res.end('writing ongoing.');
  }
};

http.createServer(handler).listen(80);
Here we can see that the several-second-long I/O write is non-blocking: if you make other requests in the meantime, you will see writing ongoing. This confirms the well-known non-blocking I/O behaviour of Node.js.
