Steps to improve throughput of a Node.js server application

I have a very simple Node.js application that accepts JSON data (approx. 1KB) via a POST request body. The response is sent back to the client immediately, and the JSON is posted asynchronously to an Apache Kafka queue. The number of simultaneous requests can go as high as 10000 per second, which we are simulating using Apache JMeter running on three different machines. The target is an average response time of under one second with no failed requests.
On a 4-core machine, the app handles up to 4015 requests per second without any failures. However, since the target is 10000 requests per second, we deployed the node app in a clustered environment.
Both clustering on the same machine and clustering across two different machines (as described here) were implemented. Nginx was used as a load balancer to round-robin the incoming requests between the two node instances. We expected a significant improvement in throughput (as documented here), but the results were quite the opposite.
The number of successful requests dropped to around 3100 requests per second.
My questions are:
What could have gone wrong in the clustered approach?
Is this even the right way to increase the throughput of Node application?
We also did a similar exercise with a Java web application in a Tomcat container, and it performed as expected: 4000 requests per second with a single instance and around 5000 successful requests per second in a cluster with two instances. This contradicts our belief that Node.js performs better than Tomcat. Is Tomcat generally better because of its thread-per-request model?
Thanks a lot in advance.

Per your request, I'll put my comments into an answer:
Clustering is generally the right approach, but whether or not it helps depends upon where your bottleneck is. You will need to do some measuring and some experiments to determine that. If you are CPU-bound and running on a multi-core computer, then clustering should help significantly. I wonder if your bottleneck is something besides CPU such as networking or other shared I/O or even Nginx? If that's the case, then you need to fix that before you would see the benefits of clustering.
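For reference, the basic shape of a clustered Node.js server looks roughly like this (a minimal sketch; the port and response body are placeholders):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) {                     // cluster.isMaster on older Node versions
      // Fork one worker per CPU core; the primary only distributes connections.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    } else {
      // Each worker runs its own event loop but shares the same listening port.
      http.createServer((req, res) => {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ ok: true }));
      }).listen(8080);
    }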
Is Tomcat generally better because of its thread-per-request model?
No. That's not a good generalization. If you are CPU-bound, then threading can help (and so can clustering with nodejs). But, if you are I/O bound, then threads are often more expensive than async I/O like nodejs because of the resource overhead of the threads themselves and the overhead of context switching between threads. Many apps are I/O bound which is one of the reasons node.js can be a very good choice for server design.
I forgot to mention that for HTTP we are using Express instead of the native http module provided by Node. I hope it does not add any overhead to the request handling?
Express is very efficient and should not be the source of any of your issues.
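To illustrate the pattern described in the question (acknowledge the client immediately, publish to Kafka off the request path), here is a rough Express sketch; the route, port, and sendToKafka helper are made up and stand in for whatever Kafka client is actually used:

    const express = require('express');
    // Hypothetical helper wrapping whatever Kafka producer client is in use;
    // not a real library API.
    const { sendToKafka } = require('./kafka-producer');

    const app = express();
    app.use(express.json({ limit: '2kb' }));     // request bodies are ~1KB of JSON
                                                 // (use body-parser on older Express versions)
    app.post('/events', (req, res) => {
      // Respond to the client right away...
      res.status(202).json({ accepted: true });
      // ...and hand the payload to Kafka asynchronously, off the request path.
      sendToKafka(req.body).catch((err) => console.error('Kafka publish failed', err));
    });

    app.listen(8080);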

As jfriend said, you need to find the bottlenecks.
One thing you can try is to reduce the per-request bandwidth and overhead by passing the JSON over sockets, especially with this library: https://github.com/uNetworking/uWebSockets.
The main reason is that an HTTP request is significantly heavier than a socket connection.
Good example: https://webcheerz.com/one-million-requests-per-second-node-js/
Lastly, you can also compress the JSON via HTTP gzip or a third-party module.
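For the gzip suggestion, a minimal sketch using Node's built-in zlib (the payload here is made up); note that for ~1KB bodies the CPU cost of compressing may outweigh the bandwidth saved, so it is worth measuring:

    const zlib = require('zlib');

    const payload = JSON.stringify({ device: 'sensor-42', readings: [1, 2, 3] });

    // Compress before sending; the receiver must decompress, or an HTTP peer can
    // handle it transparently if 'Content-Encoding: gzip' is set.
    zlib.gzip(payload, (err, compressed) => {
      if (err) throw err;
      console.log(`raw: ${Buffer.byteLength(payload)} bytes, gzipped: ${compressed.length} bytes`);
    });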
work on the weight ^^
Hope it helps!

Related

Any benefits when clustering node.js application server in single core computer

This question is about single-core machines only, so multiple cores are out of scope.
I am running a Node.js application as an HTTP server on a single-core computer using Express.js. Assuming that my server is able to handle 1000 concurrent requests, would clustering bring about any improvement in response speed?
Does process context switching have much impact on performance in this case?
I wouldn't expect improvements in speed, but you might get some other benefits.
For example, if the process crashes, another Node.js instance could still keep working.
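A small sketch of that resilience benefit using the cluster module: even on a single core, the primary can replace a crashed worker (the port and handler are placeholders):

    const cluster = require('cluster');
    const http = require('http');

    if (cluster.isPrimary) {                     // cluster.isMaster on older Node versions
      cluster.fork();                            // a single worker on a single-core box
      cluster.on('exit', (worker, code) => {
        console.log(`worker ${worker.process.pid} died (code ${code}), starting a new one`);
        cluster.fork();                          // replace the crashed worker
      });
    } else {
      http.createServer((req, res) => res.end('ok')).listen(3000);
    }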
Would clustering bring about any improvement in response speed?
Be aware that in clustered mode your application will be running behind a load balancer, which will in turn take some CPU and memory to manage and forward the network traffic. What's left of the resources will then be used to handle the distributed network load.
Apart from a few rare and easily avoidable cases, such as the one @Cristyan mentions (in which case your load balancer can be an orchestrator managing most of the work, like Kubernetes), running a Node.js app in a cluster on a single CPU core does not make sense to me. If the process has to wait for an item, it has to wait for it! Asynchronously you can make it work on other requests, but even in this case the other processes would want their share of the CPU too.

Can you get more performance using multiple websockets on one Node.js server?

OK guys, is it useful to create more than one WebSocket server instance on a Node.js server? I mean, possibly you also create subworkers...
I know it depends on your hardware and the network card's maximum. But can you reach this maximum with only one process, or can you get more performance with several parallel processes?
It depends on the CPU count in your server. Parallel processes (the cluster module) aren't a silver bullet for performance - on a 1-CPU system they might even decrease it just due to overhead. But if you have more than one CPU, a single Node.js process CANNOT use that extra horsepower. The cluster module solves this. Used properly, you can use all the available CPU power on the system.
Note that this has nothing to do with WebSockets themselves. This is a core NodeJS architectural constraint. But WebSockets are a great use-case where clustering is advantageous...
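As a sketch of what that looks like in practice, assuming the ws package: each worker attaches its own WebSocket server to an HTTP server on the shared port, and incoming connections are spread across the workers by the cluster module (the port and echo handler are placeholders):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');
    const { WebSocketServer } = require('ws');   // npm install ws

    if (cluster.isPrimary) {                     // cluster.isMaster on older Node versions
      // One worker per core; TCP connections (and thus WebSocket upgrades)
      // are distributed across the workers.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    } else {
      const server = http.createServer();
      const wss = new WebSocketServer({ server });
      wss.on('connection', (socket) => {
        socket.on('message', (data) => socket.send(data));   // simple per-worker echo
      });
      server.listen(8080);
    }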

Hapijs: Performance tuning for lots of concurrent requests

Are there any special tuning tips for strengthening an API built on top of the hapijs framework?
Especially if you have lots of concurrent requests (10000+/sec) that are accessing the DB?
I'm using PM2 to start my process in "cluster mode" to be able to load-balance to different cores on the server
I don't need to serve static content, so there's no apache/nginx proxy
update 17:11
Running tests with 1000 requests/sec (with loader.io) results in this curve. OK so far, but I'm wondering if there is still room for improvement.
(Hardware: 64GB / 20-core DigitalOcean droplet)
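For reference, the PM2 cluster-mode setup mentioned above typically boils down to a config along these lines (the app name and entry script are placeholders):

    // ecosystem.config.js -- started with `pm2 start ecosystem.config.js`
    module.exports = {
      apps: [{
        name: 'api',
        script: './server.js',        // assumed entry point
        exec_mode: 'cluster',         // run workers via Node's cluster module
        instances: 'max'              // one instance per available core
      }]
    };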
In the end I just used a combination of node's http and the body-parser module to achieve what I needed.
But I think this was only viable because my application had just two endpoints (one GET, one POST).
If your application logic is rather complicated and you want to stick with hapi, think about using a load balancer and dividing the load across multiple VMs.
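A minimal sketch of that kind of setup, with manual body collection standing in for body-parser (the method check, port, and response are made up):

    const http = require('http');

    http.createServer((req, res) => {
      if (req.method !== 'POST') {
        res.writeHead(404);
        res.end();
        return;
      }
      let body = '';
      req.on('data', (chunk) => { body += chunk; });
      req.on('end', () => {
        let data;
        try {
          data = JSON.parse(body);
        } catch (e) {
          res.writeHead(400);
          res.end();
          return;
        }
        // ... application logic using `data` goes here ...
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ ok: true }));
      });
    }).listen(3000);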
Loadtest results (new setup on an even smaller DO droplet):

Controlling the flow of requests without dropping them - NodeJS

I have a simple nodejs webserver running, it:
Accepts requests
Spawns a separate thread to perform background processing
Background thread returns results
App responds to client
Using Apache Bench ("ab -r -n 100 -c 10"), I'm performing 100 requests, 10 at a time.
Average response time of 5.6 seconds.
My logic for using Node.js is that it is typically quite resource-efficient, especially when the bulk of the work is being done by another process. It seems like the most lightweight webserver option for this scenario.
The Problem
With 10 concurrent requests my CPU was maxed out, which is no surprise since there is CPU-intensive work going on in the background.
Scaling horizontally is an easy thing to do, although I want to make the most out of each server for obvious reasons.
So with Node.js, either raw or with some framework, how can one keep this under control so as not to overload the CPU?
Potential Approach?
Could I accept the request, store it in a DB or some other persistent storage, and have a separate process that uses an async library to process x jobs at a time?
In your potential approach, you're basically describing a queue. You can store incoming messages (jobs) there and have each process take one job at a time, only getting the next one when it has finished processing the previous job. You could spawn a number of processes working in parallel, for example one per core in your system. Spawning more won't help performance, because multiple processes sharing a core will just run slower. Keeping one core free may be preferable, to keep the system responsive for administrative tasks.
Many different queues exist. A node-based one using redis for persistence that seems to be well supported is Kue (I have no personal experience using it). I found a tutorial for building an implementation with Kue here. Depending on the software your environment is running in though, another choice might make more sense.
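A rough sketch of how Kue is typically wired up, assuming a local Redis instance; the job type, payload, and concurrency value are made up:

    const kue = require('kue');
    const queue = kue.createQueue();             // connects to a local Redis by default

    // Producer side: the web process stores each incoming request as a job.
    queue.create('process-request', { payload: { value: 42 } }).save();

    // Consumer side: process up to 4 jobs concurrently, one job per slot at a time.
    queue.process('process-request', 4, (job, done) => {
      // ... heavy work on job.data.payload ...
      done();
    });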
Good luck and have fun!

Number of threads in a middleware application

I am writing an application server (again, not related to a question I already posted here) and I am wondering what strategies to use when creating worker threads that work on the database. Some preliminary details: the server receives XML and sends back XML, all the requests query a database, and each request could take from a few milliseconds to a few seconds.
Say, for example, that your server services a small to medium number of clients which in turn send a small number of requests per connection. Is it safe to have one worker thread per connection, or should it be per request? Also, should a thread pool be used to limit the resources used by the server, or should a worker be added each time a new connection/request is made?
Should the server limit the number of threads it creates to an upper limit?
Hope I am not too vague ... I can hardly keep my eyes open.
If you don't have extensive experience, writing application servers is a daunting task. It can be eased by using frameworks like ACE, which allow you to build different configurations of your app-serving infrastructure (thread per connection, thread pools, leader/follower) and then load the appropriate configuration with an extensible service framework.
I would recommend reading these books on ACE to get an idea of what the framework can do for you:
C++ Network Programming: Mastering Complexity Using ACE and Patterns
C++ Network Programming: Systematic Reuse with ACE and Frameworks
The way I write apps like this is to make the number of threads configurable via the command line and/or a configuration file. I then do some load testing with different numbers of threads - there is always an optimal number beyond which performance begins to degrade.
If you follow the model adopted by Java EE app server developers, there's a queue for incoming requests and a pool of worker threads to service them. It's one thread per request. When a worker thread fulfills a request it goes back into the pool. If incoming requests show up faster than the worker thread pool can service them, the queue allows them to stack up until a worker thread is released. Both the queue size and the thread pool size can be tuned to match your situation.
I'd wonder why anyone would feel the need to write their own server from scratch, especially when the scenario you describe is solved so well by others. If your wish is education, good luck. If you think you're going to improve on what's been done in the past, I'd re-examine that assumption.
