Putting a Load on Node

We have a C# Web API server and a Node Express server. We make hundreds of requests from the C# server to a route on the Node server. The route on the Node server does intensive work and often doesn't return for 6-8 seconds.
Making hundreds of these requests simultaneously seems to cause the Node server to fail. Errors in the Node server output include either socket hang up or ECONNRESET. The error from the C# side says
No connection could be made because the target machine actively refused it.
This error occurs after processing an unpredictable number of the requests, which leads me to think it is simply overloading the server. Using a Thread.Sleep(500) on the C# side allows us to get through more requests, and fiddling with the wait there leads to more or less success, but thread sleeping is rarely if ever the right answer, and I think this case is no exception.
Are we simply putting too much stress on the Node server? Can this only be solved with load balancing or some form of clustering? If there is another alternative, what might it look like?
One path I'm starting to explore is the node-toobusy module. If I return a 503 though, what should be the process in the following code? Should I Thread.Sleep and then re-submit the request?

It sounds like your node.js server is getting overloaded.
The route on the Node server does intensive work and often doesn't return for 6-8 seconds.
This is a bad smell: if your Node process is doing intense computation, it blocks the event loop until that computation completes, and it can't handle any other requests in the meantime. You should probably do that computation in a worker process, which will run on another CPU core if one is available. cluster is the built-in Node module that lets you do that, so I'll point you there.
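To make the idea concrete, here is a minimal sketch of pushing CPU-heavy work into a separate process so the event loop of the web process stays free. It uses child_process.execFile with `node -e` rather than cluster, purely so the example fits in one file. The job itself (summing 1..N), the HEAVY_JOB constant, the JOB_N environment variable and the runHeavyJob name are all made up for illustration.

```javascript
const { execFile } = require('child_process');

// Hypothetical CPU-heavy job: sum the integers 1..N. It is kept as source text
// so that it can run in a separate Node process via `node -e`.
const HEAVY_JOB = `
  const n = Number(process.env.JOB_N);
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  process.stdout.write(String(total));
`;

// Runs the job in a child process and reports the result through a callback.
// The parent's event loop keeps serving other requests while the child works.
function runHeavyJob(n, callback) {
  execFile(
    process.execPath,                               // the current node binary
    ['-e', HEAVY_JOB],
    { env: { ...process.env, JOB_N: String(n) } },  // pass the input to the child
    (err, stdout) => {
      if (err) return callback(err);
      callback(null, Number(stdout));
    }
  );
}
```

In a real application the job would more likely live in its own file and be started with child_process.fork or managed through cluster, with a cap on how many children run at once.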
One path I'm starting to explore is the node-toobusy module. If I return a 503 though, what should be the process in the following code? Should I Thread.Sleep and then re-submit the request?
That depends on your application and your expected load. You may want to retry once or twice if it's likely that things will cool down enough during that time, but for your API you probably just want to return a 503 in C# too. It is better to let the client know the server is too busy and let them make their own decision than to keep retrying on its behalf.
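A bounded retry could look roughly like the sketch below. It is written in JavaScript to match the other examples here, but the same shape carries over to the C# caller. sendRequest is a hypothetical async function resolving to an object with a status field, and the delays (500 ms base, 8 s cap) and the limit of two retries are assumptions to tune, not recommendations.

```javascript
// Delay before retry number `attempt` (0-based): 500 ms, 1 s, 2 s, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `sendRequest` is a hypothetical async function that resolves to { status, ... }.
// Retries only on 503, at most `maxRetries` times, then returns whatever came back.
async function requestWithRetry(sendRequest, maxRetries = 2, delayFn = backoffDelayMs) {
  for (let attempt = 0; ; attempt++) {
    const response = await sendRequest();
    if (response.status !== 503 || attempt >= maxRetries) {
      return response; // success, a different error, or still busy: the caller decides
    }
    await sleep(delayFn(attempt));
  }
}
```

If the last attempt still comes back as 503, the function returns that response unchanged, which lets the API pass the 503 on to its own client as suggested above.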

Related

Building Websites only on NodeJs and Express blocking requests over http

I have a question regarding the examples out there when using Nodejs, Express and Jade for templates.
All the examples show how to build some sort of a user administrative interface where you can add user profiles, delete them and manage them.
Those are considered beginner's guides to NodeJs. My question is this: if I have 10 users concurrently accessing the same interface and doing the same operations, surely NodeJs will block the requests for the other users, as they are all served on the same port.
So let's say I am pulling out a list of users which may be something like 10000. Yes I can do paging, but that is not the point. While I am getting the list from the server another 4 users want to access the application. They have to wait for my process to end. That is my question - how can one avoid that using NodeJS & Express?
I am on this issue for a couple of months! I currently have something in place that does the following:
Run the main processing of stuff on a port
Run a Socket.io process on a different port
Use a sticky session
The idea is that I do a request (like getting a list of items), and immediately respond with some request reference but without the requested items, thus releasing the port.
In the background, "asynchronously", I then do the work of getting the items. When that completes, I make an http request from the main node process to the socket.io node process on its port, sending the items through.
When that is done I then perform a socket.io emit WITH the data and the initial request reference so that the correct user gets the message.
On the client side I have an event listening for the socket which then completes the ajax request by populating the list.
I have SOME success in doing this! It actually works to a degree! I have an issue online which complicates matters due to ip addresses, and socket.io playing funny.
I also have multiple workers using clustering. I use it in the following manner:
I create a master worker
I spawn workers
I take any connection request and pass it to the relevant worker.
I do that for the main node request as well as for the socket requests. Like I said I use 2 ports!
As you can see I have had a lot of work done on this and I am not getting a proper solution!
My question is this - have I gone all around the world 10 times only to have missed something simple? This sounds way too complicated for achieving a non-blocking nodejs-only website.
I asked myself - surely all these tutorials would have not missed on something as important as this! But they did!
I have researched, read, and tested a lot of code - this is the very first time I have asked anything on stackoverflow!
Thank you for any assistance.
P.S. One example of the same approach is this: I request a report using jasper, I pass parameters, and with the "delayed ajax response" approach as described above I simply release the port, and in the background a very intensive report is being generated (and this can be very intensive process as a lot of calculations are being performed)..! I really don't see a better approach - any help will be super appreciated!
Thank you for taking the time to read!

I'm sorry to say it, but yes, you have been going around the world 10 times only to have been missing something simple.
It's obvious that your previous knowledge of and experience with web servers come from a blocking point of view, and if Node worked that way, your concerns would have been valid.
Node.js is a runtime built around using a single thread to execute your code, which means that if it performs a blocking operation, no one else can get anything done in the meantime.
Some operations in Node do block, such as the synchronous file-system calls (fs.readFileSync and friends). Most Node operations, however, are asynchronous.
I believe you are familiar with the term, so I won't go into details. What asynchronous operations allow Node to do is keep this single thread idle as much as possible. By idle I mean open for other work. If your code is fully asynchronous, then handling 4 concurrent users (or even 400) shouldn't be a problem, even for a single thread.
Now, regarding your initial concern about ports: once a request is received on a given port, Node.js executes whatever code you have written for it until it encounters an asynchronous operation. As soon as that happens, it is free to pick up more requests on the same port.
The second problem you ask about is the database operation. In this case, Node.js sends the query to the database (which takes almost no time) and the database does the actual execution of the query. In the meantime, Node is free to do whatever it wants until the database is finished and lets Node know there is a result to fetch.
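A small sketch of that behaviour, with a timer standing in for the database (fakeQuery, handle and demo are made-up names, and the 50 ms and 5 ms delays are arbitrary): the slow request starts first, yet the fast one finishes while the slow one is still waiting, all on one thread.

```javascript
// Simulated database call: resolves after delayMs without blocking the event loop.
function fakeQuery(label, delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(label), delayMs));
}

// A request handler: does a little synchronous work, then waits on the "database".
async function handle(name, delayMs, log) {
  log.push(`${name}: start`);
  await fakeQuery(name, delayMs); // control returns to the event loop here
  log.push(`${name}: done`);
}

// Two requests arrive back to back; the slow one does not hold up the fast one.
async function demo() {
  const log = [];
  await Promise.all([handle('slow', 50, log), handle('fast', 5, log)]);
  return log;
}

demo().then((log) => console.log(log.join('\n')));
```

The log shows both requests starting before either finishes, and the fast one completing first.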
You can recognize async operations by their structure: my_function(..., ..., callback). A function that takes a callback is, in most cases, asynchronous.
So, bottom line: don't worry about blocking I/O, as you will hardly encounter any in Node. Use a single port if you want (by creating multiple child processes, you can even have multiple Node instances listening on the same port).
Hope this explains it well enough. If you have any further questions, let me know :)

node js on heroku - request timeout issue

I am using Sails js (node js framework) and running it on Heroku and locally.
The API function reads from an external file and performs long computations that might take hours on the queries it read.
My concern is that after a few minutes it returns with timeout.
I have 2 questions:
How to control the HTTP request / response timeout (what do I really need to control here?)
Is HTTP request considered best practice for this target? or should I use Socket IO? (well, I have no experience on Socket IO and not sure if I am not talking bullshit).

You should use the worker pattern to accomplish any work that would take more than a second or so:
"Web servers should focus on serving users as quickly as possible. Any non-trivial work that could slow down your user’s experience should be done asynchronously outside of the web process."
"The Flow
Web and worker processes connect to the same message queue.
A process adds a job to the queue and gets a url.
A worker process receives and starts the job from the queue.
The client can poll the provided url for updates.
On completion, the worker stores results in a database."
https://devcenter.heroku.com/articles/asynchronous-web-worker-model-using-rabbitmq-in-node
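The flow quoted above can be sketched in memory as follows. This is only an illustration of the shape: a plain array stands in for the message queue, a Map stands in for the database, and submitJob, workOnce, pollJob and the /jobs/ URL scheme are hypothetical names. In the Heroku setup the web and worker parts would be separate processes talking through RabbitMQ.

```javascript
const queue = [];           // stands in for the message queue
const results = new Map();  // stands in for the database
let nextId = 1;

// Web process: add a job to the queue and hand back a URL the client can poll.
function submitJob(payload) {
  const id = String(nextId++);
  results.set(id, { status: 'queued' });
  queue.push({ id, payload });
  return `/jobs/${id}`;
}

// Worker process: take one job from the queue, run it, store the result.
function workOnce(compute) {
  const job = queue.shift();
  if (!job) return false; // nothing to do
  results.set(job.id, { status: 'done', result: compute(job.payload) });
  return true;
}

// Web process: answer a poll on the job URL with the current state.
function pollJob(url) {
  const id = url.split('/').pop();
  return results.get(id) || { status: 'unknown' };
}
```

The HTTP request that submits the job returns immediately with the URL, so no request has to stay open for the hours the computation may take.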

Node.js: Handling outgoing HTTP request from node

I've been wrestling with this problem for a while but could not find a good solution for this, so came here for help.
I have a node.js server; on receiving a request from a client, the server will contact 3rd party backend to grab some data, and return it back to the client.
The server to 3rd party backend communication involved multiple calls back and forth, and it typically takes ~3 seconds to finish this process.
If I fire off, say, 50 concurrent requests from test tools like JMeter, the performance degradation becomes severe very quickly, even causing timeouts for some of the later served calls.
Initially I started looking into asyncblock, but since it runs on fibers I wasn't seeing a big improvement in performance, so I started looking into threads.
The only mature module I could find was thread-a-gogo, but I also recently found out that you cannot use required external modules (like crypto, for example) within threads spawned by TAGG.
Given that there are proxy products built with node.js I believe there is an efficient way to do this but I can't really think of other approaches if threads cannot use external modules.
Any advice would be appreciated.
I can't reveal the full detail due to NDA but here's the basic concept of what I'm doing. I'd like to send below logic to a separate thread.
asyncblock(function (flow) {
    // isSuccess() is a placeholder for "result contains success message"
    var result1 = flow.sync(externalRequest1(flow.callback()));
    if (isSuccess(result1)) {
        var result2 = flow.sync(externalRequest2(flow.callback()));
        if (isSuccess(result2)) {
            // process result2 and return to client
        }
    }
});

First thing to check and be certain of: are you using the node.js core HTTP Agent? If so, in Node versions before 0.12 you are subject to the default maxSockets limit of 5 connections to the same server (later versions default to no limit). Read the hyperquest README rant for details.
Secondly, be aware that the remote side may impose abuse limits as well, so check whether there are issues there, or whether, for example, overall performance would be better if you used a single pool of 10 connections instead of opening an unlimited number of simultaneous connections to the upstream server.
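For reference, a shared pool like that can be set up by passing your own http.Agent to the request. This is a sketch, not a tuned configuration: the pool size of 10 is just the figure mentioned above, and upstreamAgent and getFromUpstream are made-up names.

```javascript
const http = require('http');

// One shared pool: at most 10 concurrent sockets per upstream host. Requests
// beyond that wait in the agent's queue instead of opening new connections.
const upstreamAgent = new http.Agent({ keepAlive: true, maxSockets: 10 });

// Every outgoing call goes through the same agent, so the limit is global
// to this process rather than per incoming request.
function getFromUpstream(url, onResponse) {
  return http.get(url, { agent: upstreamAgent }, onResponse);
}
```

For https upstreams the equivalent is https.Agent with the same options.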

Being both event-driven servers, why node.js needs async code where Nginx doesn't?

The question is in the title. In other words, if Nginx uses the same event-driven async IO model as node.js, why doesn't it require writing async-style code? I know Nginx is NOT actually executing any code, but rather proxying requests to something that can. Then why doesn't node do so? Are we missing anything in the current Nginx way? Or gaining anything more from node (apart from the pain of writing async code)?
Ps.
To be more specific, how different is Nginx+php-fpm or Nginx+wsgi+python/ruby from node alone regarding performance or utilizing computing resource that node claims? Couldn't node just use existing FastCGI models, be a sync style JavaScript interpreter and let webserver do its async job?

Cross-posted from NodeJS google groups:
Okay, I'll try my best to answer your question:
Nginx is a web server that only proxies requests. If you take the example of Nginx+php-fpm or Nginx+wsgi+ruby, you have an asynchronous, evented web server sitting in front of a web server that executes synchronously. Nginx will accept() as many connections as possible and queue all of them, and the requests from Nginx to your synchronous backend server are asynchronous. But the synchronous backend, which also calls accept(), does not queue any connections. It can serve only one request at a time if it is single-threaded, or several at a time if it is prefork/fork (slow) or multithreaded, which has its own drawbacks: thread creation time (avoidable with thread pools, but a PITA to implement), context switches, thread deadlocks, the number of accept()ed connections never exceeding the number of threads, and so on.
Imagine you have 2 routes to your backend server that Nginx is hitting:
/404, /login.
If the /login route is doing a lot of I/O and if another request is made to /404, the rendering of the /404 page will depend on the completion of /login's request (because the process is blocked). So basically the response to any request will depend on the request that takes the longest time to do I/O. So even though Nginx is async and evented its response time for any request will depend entirely on that one request that takes the longest time to finish (culprit: the synchronous backend server).
Now if you take the example of NodeJS, everything is asynchronous and evented, be it file I/O, network I/O and so on, so nothing blocks the process. Taking the previous example, even if the /login route is doing a lot of I/O, it's all asynchronous and the /404 page is rendered immediately.
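To illustrate the synchronous case from the /login and /404 example, here is a rough simulation of a single-threaded blocking backend. blockFor and serveSynchronously are made-up names, and the busy-wait only stands in for blocking I/O. The /404 request needs no I/O at all, yet its response time includes the full wait for /login.

```javascript
// Busy-wait for `ms` milliseconds: a stand-in for a blocking I/O call.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* nothing else can run during this loop */ }
}

// A single-threaded synchronous backend: requests are served strictly one
// after another, so each response waits for every request ahead of it.
function serveSynchronously(requests) {
  const started = Date.now();
  return requests.map(({ route, ioMs }) => {
    blockFor(ioMs);
    return { route, finishedAfterMs: Date.now() - started };
  });
}

const timings = serveSynchronously([
  { route: '/login', ioMs: 40 }, // slow, I/O-heavy route
  { route: '/404', ioMs: 0 },    // trivial route, stuck behind /login
]);
console.log(timings);
```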
My explanation is quite rudimentary. But I think it should give you more clarity.

nginx is a simple static HTTP and proxy server. Node.js is a full-featured application platform.
Why would you not expect the more specialised application to have abstracted away all the internal workings that you don't need to control directly?
Edit:
Your PS is pretty similar to this question, and is concerned specifically with using Node.JS as an HTTP server. Bear in mind that v0.4.12 had just been released when that question was closed - v0.8.5 is the latest stable release at the moment. The key point anyway is it depends what you're trying to achieve.
This blog post describes a Node.JS-based set-up achieving 250k concurrent connections on a single server. A quick google search shows people attempting similar with nginx+php struggling to reach 100k with far more hardware resources available.

NodeJS - Child node process?

I'm using NodeJS to run a socket server (using socket.io). When a client connects, I open and run a module which does a bunch of stuff. Even though I am careful to catch as much as possible, when this module throws an error, it obviously takes down the entire socket server with it.
Is there a way I can separate the two so if the connected clients module script fails, it doesn't necessarily take down the entire server?
I'm assuming this is what child process is for, but the documentation doesn't mention starting other node instances.
I'd obviously need to kill the process if the client disconnected too.

I'm assuming these modules you're talking about are JS code. If so, you might want to try the vm module. This lets you run code in a separate context, and also gives you the ability to do a try / catch around execution of the specific code.
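A minimal sketch of the vm approach, assuming the client module is available as a string of source code (runClientModule and the 1000 ms timeout are illustrative choices):

```javascript
const vm = require('vm');

// Runs `source` in a fresh context whose globals are the properties of `sandbox`.
// A synchronous throw inside the module is caught here instead of crashing the server.
function runClientModule(source, sandbox) {
  try {
    const value = vm.runInNewContext(source, sandbox, { timeout: 1000 });
    return { ok: true, value };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

console.log(runClientModule('x * 2', { x: 21 }));
console.log(runClientModule('throw new Error("boom")', {}));
```

Note that the try / catch only covers errors thrown while the script itself is running. Errors raised later from callbacks the module registered are not caught this way.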
You can run node as a separate process and watch the data go by using spawn, then watch the stderr/stdout/exit events to track any progress. Then kill can be used to kill the process if the client disconnects. You're going to have to map clients and spawned processes though so their disconnect event will trigger the process close properly.
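The client-to-process mapping could be sketched like this. childrenByClient, startForClient and stopForClient are hypothetical names, and the script is passed as a string to `node -e` only to keep the example in one file. In a real server you would spawn your module's file and call stopForClient from the socket's disconnect handler.

```javascript
const { spawn } = require('child_process');

// clientId -> child process running that client's module
const childrenByClient = new Map();

function startForClient(clientId, script) {
  const child = spawn(process.execPath, ['-e', script]);
  child.stderr.on('data', (chunk) => console.error(`[${clientId}] ${chunk}`));
  child.stdout.resume(); // discard stdout so the pipe never fills up
  child.on('error', (err) => console.error(`[${clientId}] spawn failed: ${err.message}`));
  child.on('exit', () => {
    // Only forget this child if it is still the one registered for the client.
    if (childrenByClient.get(clientId) === child) childrenByClient.delete(clientId);
  });
  childrenByClient.set(clientId, child);
  return child;
}

// Call this when the client disconnects. Returns false if nothing was running.
function stopForClient(clientId) {
  const child = childrenByClient.get(clientId);
  if (!child) return false;
  childrenByClient.delete(clientId);
  return child.kill(); // sends SIGTERM by default
}
```

A crash inside the child only ends that child. The server sees it as an exit event and keeps running.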
Finally the uncaughtException event can be used as a "catch-all" for any missed exceptions, making it so that the server doesn't get completely killed (signals are a bit of an exception of course).

As the other poster noted, you could leverage the 'vm' module, but as you might be able to tell from the rest of the response, doing so adds significant complexity.
Also, from the 'vm' doc:
Note that running untrusted code is a tricky business requiring great care.
To prevent accidental global variable leakage, vm.runInNewContext is quite
useful, but safely running untrusted code requires a separate process.
While I'm sure you could run a new nodejs instance in a child process, the best practice here is to understand where your application can and will fail, and then program defensively to handle all possible error conditions.
If some part of your code "take(s) down the entire ... server", then you really need to understand why this occurred and solve that problem, rather than rely on another process to shield you from the work required to design and build a production-quality service.
