Kubernetes NodeJS consuming more than 1 CPU? - node.js

We are running a single NodeJS instance in a Pod with a request of 1 CPU, and no limit. Upon load testing, we observed the following:
NAME                                 CPU(cores)   MEMORY(bytes)
backend-deployment-5d6d4c978-5qvsh   3346m        103Mi
backend-deployment-5d6d4c978-94d2z   3206m        99Mi
If NodeJS is only running a single thread, how could it be consuming more than 1000m CPU, when running directly on a node it would only utilize a single core? Is Kubernetes somehow letting it borrow time across cores?

Although Node.js runs the main application code in a single thread, the Node.js runtime is multi-threaded. Node.js has an internal worker pool that is used to run background tasks, including I/O and certain CPU-intensive processing like crypto functions. In addition, if you use the worker_threads facility (not to be confused with the worker pool), then you would be directly accessing additional threads in Node.js.
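To see the worker pool at work, here is a minimal sketch that starts several crypto.pbkdf2 calls at once and compares CPU time with wall-clock time. The function names, the iteration count, and the salt are assumptions made for illustration. On a multi-core machine the ratio may come out above 1, which would be consistent with the readings above, though the exact number depends on the host.

```javascript
const crypto = require('crypto');

// Derive one key using the asynchronous API, which runs on the libuv worker pool.
function hashAsync(password) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2(password, 'salt', 50000, 32, 'sha256', (err, key) =>
      err ? reject(err) : resolve(key));
  });
}

// Run `count` hashes concurrently and report CPU time vs. wall-clock time.
async function measure(count) {
  const cpuStart = process.cpuUsage();
  const wallStart = process.hrtime.bigint();
  const keys = await Promise.all(
    Array.from({ length: count }, (_, i) => hashAsync(`password-${i}`)));
  const cpu = process.cpuUsage(cpuStart); // microseconds used since cpuStart
  const wallMs = Number(process.hrtime.bigint() - wallStart) / 1e6;
  const cpuMs = (cpu.user + cpu.system) / 1000;
  // coresUsed is the average number of cores kept busy during the run.
  return { keys: keys.length, cpuMs, wallMs, coresUsed: cpuMs / wallMs };
}

measure(4).then((report) => console.log(report));
```

All of the JavaScript here still runs on one thread; only the hashing itself is handed to pool threads.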

Related

Node Worker Threads vs Heroku Workers

I'm trying to understand difference between Node Worker Threads vs Heroku Workers.
We have a single Dyno for our main API running Express.
Would it make sense to have a separate worker Dyno for our intensive tasks such as processing a large file.
worker: npm run worker
Some files we process are up to 20mb and some processes take longer than the 30s limit to run so kills the connection before it comes back.
Then could I add Node Worker Threads in the worker app to create child processes to handle the requests or is the Heroku worker enough on its own?
After digging much deeper into this and successfully implementing workers to solve the original issue, here is a summary for anyone who comes across the same scenario.
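Node worker threads and Heroku workers are similar in that both run code away from the main thread of your web process so that it is not blocked. Worker threads are extra threads inside the same Node process, while Heroku workers are separate processes on separate dynos. How you use and implement them differs and depends on the use case.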
Node worker threads
These are the newer way to run JavaScript in parallel in Node, on multiple threads inside a single process. You can follow the Node docs to create workers or use something like microjob to make it much easier to set up and run separate Node threads for specific tasks.
https://github.com/wilk/microjob
This works well for CPU-heavy tasks: they run on separate worker threads, so they do not block the main event loop.
However, using worker threads in a Heroku web process did not solve my problem, as the web process still times out once a request hits 30s.
Important difference: Heroku workers do not time out!
Heroku Workers
These are separate virtual Dyno containers on Heroku within a single app. They are separate processes that run without the overhead the web process carries, such as handling HTTP.
Workers do not listen for HTTP requests. If you are using Express with Node, you need a web process to handle incoming HTTP requests and then a worker to handle the jobs.
The challenge was working out how to communicate between the web and worker processes. This is done using Redis together with Bull, a Redis-backed job queue, to store data and send messages between the processes.
Finally, Throng makes it easier to create a clustered environment using a Procfile, so it is ideal for use with Heroku!
Here is a perfect example that implements all of the above in a starter project that Heroku has made available.
https://devcenter.heroku.com/articles/node-redis-workers
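The web/worker split described above can be sketched as an enqueue-and-poll flow. This is only an illustration: the JobQueue class below is a hypothetical in-memory stand-in for the Redis-backed queue, so it runs in one process, whereas on Heroku the web and worker dynos are separate processes and need a shared queue such as Bull on Redis.

```javascript
// Hypothetical in-memory stand-in for a Redis-backed job queue.
class JobQueue {
  constructor() {
    this.jobs = new Map(); // id -> { status, payload, result }
    this.pending = [];     // ids waiting for a worker
    this.nextId = 1;
  }

  // Web process: store the job and return at once, well inside the 30s limit.
  enqueue(payload) {
    const id = String(this.nextId++);
    this.jobs.set(id, { status: 'queued', payload, result: null });
    this.pending.push(id);
    return id;
  }

  // Worker process: handle one queued job, however long it takes.
  async processNext(handler) {
    const id = this.pending.shift();
    if (id === undefined) return false; // nothing to do
    const job = this.jobs.get(id);
    job.status = 'active';
    job.result = await handler(job.payload);
    job.status = 'completed';
    return true;
  }

  // Web process: answer a client that polls for the outcome.
  status(id) {
    const job = this.jobs.get(id);
    return job ? { status: job.status, result: job.result } : null;
  }
}

const queue = new JobQueue();
const jobId = queue.enqueue({ file: 'report.csv' });
console.log('right after enqueue:', queue.status(jobId));
queue.processNext(async (payload) => `processed ${payload.file}`)
  .then(() => console.log('after the worker ran:', queue.status(jobId)));
```

The point of the design is that the HTTP request only ever does the cheap enqueue step, so the long-running processing can no longer hit the web timeout.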
It may make more sense for you to keep a single dyno and scale it up, which means multiple instances will be running in parallel.
See https://devcenter.heroku.com/articles/scaling

When is better using clustering or worker_threads?

I have been reading about multi-processing on NodeJS to get the best understanding and try to get a good performance in heavy environments with my code.
Although I understand the basic purpose and concept for the different ways to take profit of the resources to handle the load, some questions arise as I go deeper and it seems I can't find the particular answers in the documentation.
NodeJS in a single thread:
NodeJS runs our JavaScript on a single thread that we call the event loop, while in the background the OS and libuv handle the default worker pool for asynchronous I/O tasks.
We are supposed to use a single core for the event loop, although the pool workers might be using different cores. I guess they are placed in the end by the OS scheduler.
NodeJS as multi-threaded:
When using the "worker_threads" library, different instances of V8/libuv run for each thread within the same single process. Thus, they share the same context and communicate among threads with "message port" and the rest of the API.
Each worker thread runs its own event loop. Threads are supposed to be wisely balanced among CPU cores, improving the performance. I guess they are placed in the end by the OS scheduler.
Question 1: When a worker uses I/O default worker pool, are the very same
threads as other workers' pool being shared somehow? or each worker has its
own default worker pool?
NodeJS in multi-processing:
When using "cluster" library, we are splitting the work among different processes. Each process is set on a different core to balance the load... well, the main event loop is what in the end is set in a different core, so it doesn't share core with another heavy event loop. Sounds smart to do it that way.
Here I would communicate with some IPC tactic.
Question 2: And the default worker pool for this NodeJS process? where
are they? balanced among the rest of cores as expected in the first
case? Then they might be on the same cores as the other worker pools
of the cluster I guess. Shouldn't it be better to say that we are balancing main threads (event loops) rather than "the process"?
Being all this said, the main question:
Question 3: Which is better, clustering or worker_threads? If both are used in the same code, how do the two libraries cooperate to get the best performance? Or can they simply get in conflict? Or is it in the end the OS that takes control?
Each worker thread has its own main loop (libuv etc). So does each cloned Node.js process when you use clustering.
Clustering is a way to load-balance incoming requests to your Node.js server over several copies of that server.
Worker threads are a way for a single Node.js process to offload long-running functions to a separate thread, to avoid blocking its own main loop.
Which is better? It depends on the problem you're solving. Worker threads are for long-running functions. Clustering makes a server able to handle more requests, by handling them in parallel. You can use both if you need to: have each Node.js cluster process use a worker thread for long-running functions.
As a first approximation for your decision-making: only use worker threads when you know you have long-running functions.
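As a rough sketch of offloading a long-running function to a worker thread: the naive Fibonacci function, the inlined worker source, and the name fibInWorker are all made up for illustration. The worker code is passed as a string with eval: true only to keep the example in one file; a real project would more likely point the Worker at a separate file.

```javascript
const { Worker } = require('worker_threads');

// Worker source, inlined so the sketch fits in a single file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Deliberately slow, CPU-bound function (naive Fibonacci).
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Compute fib(n) on a separate thread; the main event loop stays free meanwhile.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

fibInWorker(25).then((value) => console.log('fib(25) =', value)); // fib(25) = 75025
```

While the worker is busy, timers and incoming requests on the main thread keep being served, which is the whole reason for moving the function off the main loop.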
The Node.js instances (whether cluster processes or worker threads) don't get tied to specific cores (or Intel processor threads) on the host machine; the host's OS scheduler assigns cores as needed and tries to minimize context-switch overhead when assigning cores to runnable threads. If you have too many active JavaScript instances (cluster instances + worker threads), the host OS will give them timeslices according to its scheduling algorithms. Other than avoiding too many JavaScript instances, there's very little point in trying to second-guess the OS scheduler.
Edit: A main Node.js process shares a single libuv thread pool with all its worker threads. If your Node.js program uses many worker threads, you may, or may not, need to set the UV_THREADPOOL_SIZE environment variable to a value greater than the default of 4.
Node.js's cluster functionality uses the underlying OS's fork/exec scheme to create a new OS process for each cluster instance. So, each cluster instance has its own libuv pool.
If you're running stuff at scale, let's say with more than ten host machines running your Node.js server, then it can be worth spending time optimizing the number of JavaScript instances.
Don't forget nginx if you use it as a reverse proxy to handle your HTTPS work. It needs some processor time too, but its event-driven worker processes are efficient, so you won't have to worry about it unless you have huge traffic.

Does NodeJS require a multi cores VPS

I want to develop a website with Nuxt.js or Next.js in 1 core CPU 2.4Ghz, 1GB RAM.
Can my website run fast as a start?
How many requests per seconds will be available maybe?
Whether a Node application benefits from multiple cores is application dependent.
Generally, if the child_process, cluster, or worker_threads modules are not involved,
then there is little need for multiple cores on your system, because Node.js runs your JavaScript on one core: the request handler always runs on the same event loop, which runs on a single thread.
How to achieve process concurrency and high throughput:
Because JavaScript execution in Node.js is single-threaded, a good rule of thumb for keeping your Node server speedy is to avoid blocking the event loop. You can read about this in the official documentation in the references below.
Simple Illustration:
Consider a case where each request to a web server takes 50ms to complete and 45ms of that 50ms is database I/O that can be done asynchronously.
Choosing non-blocking asynchronous operations frees up that 45ms per request to handle other requests.
This is a significant difference in your application capacity and processing speed just by choosing to use non-blocking methods instead of blocking methods.
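The arithmetic behind the illustration can be sketched as follows. It is an idealized upper bound for a single event loop, assuming the database keeps up and nothing else competes for the thread; the function name is made up for illustration.

```javascript
// Requests per second one event loop can sustain if each request keeps
// the thread busy for `busyMsPerRequest` milliseconds.
function maxRequestsPerSecond(busyMsPerRequest) {
  return 1000 / busyMsPerRequest;
}

const totalMs = 50;   // time to complete one request (from the illustration)
const asyncIoMs = 45; // portion spent waiting on database I/O

// Blocking I/O: the thread is occupied for the full 50 ms.
const blocking = maxRequestsPerSecond(totalMs);
// Non-blocking I/O: the thread is occupied only for the remaining 5 ms.
const nonBlocking = maxRequestsPerSecond(totalMs - asyncIoMs);

console.log({ blocking, nonBlocking }); // { blocking: 20, nonBlocking: 200 }
```

Under these assumptions the same single core goes from about 20 to about 200 requests per second, a tenfold difference.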
Reference:
https://nodejs.org/en/docs/guides/dont-block-the-event-loop/
https://nodejs.org/en/docs/guides/blocking-vs-non-blocking/
I hope this helps.

How do 'cluster' and 'worker_threads' work in Node.js?

Did I understand correctly: If I use cluster package, does it mean that
a new node instance is created for each created worker?
What is the difference between cluster and worker_threads packages?
Effectively, the difference is process-based vs. thread-based parallelism. Threads can share memory (e.g. SharedArrayBuffer) whereas processes can't. Otherwise they serve essentially the same purpose.
cluster
Typically one process is launched per CPU; the processes can communicate via IPC.
Each process has its own memory with its own Node (V8) instance. Creating tons of them may create memory issues.
Great for spawning many HTTP servers that share the same port, because the primary process distributes incoming connections among the child processes.
worker threads
One process total
Creates multiple threads with each thread having one Node instance (one event loop, one JS engine). Most Node APIs are available to each thread except a few. So essentially Node is embedding itself and creating a new thread.
Shares memory with other threads (e.g. SharedArrayBuffer)
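Great for CPU-intensive tasks like processing data. Because NodeJS runs JavaScript on a single thread, long synchronous tasks can be moved into workers so they don't block the main thread. Workers help much less with I/O such as file system access, which Node's asynchronous APIs already handle efficiently.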
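To illustrate the shared-memory point, here is a small sketch in which several worker threads increment one counter stored in a SharedArrayBuffer. The function name, the inlined worker source, and the loop count of 1000 are assumptions for illustration; Atomics is used so that concurrent increments are not lost.

```javascript
const { Worker } = require('worker_threads');

// Worker source, inlined so the sketch fits in a single file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const counter = new Int32Array(workerData); // view on the SAME memory
  for (let i = 0; i < 1000; i++) Atomics.add(counter, 0, 1);
  parentPort.postMessage('done');
`;

// Start `workerCount` threads that all increment one shared counter.
function countInWorkers(workerCount) {
  const shared = new SharedArrayBuffer(4); // room for one 32-bit integer
  const counter = new Int32Array(shared);
  const runs = Array.from({ length: workerCount }, () =>
    new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, { eval: true, workerData: shared });
      worker.once('message', resolve);
      worker.once('error', reject);
    }));
  return Promise.all(runs).then(() => Atomics.load(counter, 0));
}

countInWorkers(4).then((total) => console.log('total =', total)); // total = 4000
```

With cluster processes the same buffer could not be shared this way; each process would need to send its partial result back over IPC.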

NodeJS in MultiCore System

"Node.js is limited to a single thread". how the nodeJS will react when we are deploying in Multi-Core systems? will it boost the performance?
The JavaScript running in the Node.js V8 engine is single-threaded, but the underlying libuv multi-platform support library is multi-threaded, and those threads will be distributed across the CPU cores by the operating system according to its scheduling algorithm. So with your JavaScript application running asynchronously (and single-threaded) at the top level, you still benefit from multi-core under the covers.
As others have mentioned, the Node.js Cluster module is an excellent way to exploit multi-core for concurrency at the application (JavaScript V8) level, and since the cluster module lets workers share a listening port, you can have multiple Express worker processes executing concurrent server logic, without needing a unique listening port for each process. Impressive.
As others have mentioned, you will need Redis or equivalent to share data among the cluster worker processes. You will also want a logging facility that is cluster aware, so the cluster master and all worker processes can log to a single shared log file. The Node log4node module is a good choice here, and it works with logrotate.
Typical web examples show using the runtime detected number of cores as the number of cluster worker processes to fork, but I prefer to make that a configuration option in a config.yaml file so I can tune the number of worker processes running the main JavaScript application as needed.
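A minimal sketch of that idea, with the worker count taken from a setting and falling back to the detected core count. To stay dependency-free it reads the setting from an environment variable rather than a YAML file; the variable name WORKER_COUNT and both function names are hypothetical.

```javascript
const cluster = require('cluster');
const os = require('os');

// An explicit positive setting wins; otherwise use one worker per core.
function resolveWorkerCount(configured, cpuCount = os.cpus().length) {
  const n = Number.parseInt(configured, 10);
  return Number.isInteger(n) && n > 0 ? n : cpuCount;
}

// Fork `count` workers from the primary; run `startServer` in each worker.
function startCluster(count, startServer) {
  // isPrimary exists on newer Node versions, isMaster on older ones.
  if (cluster.isPrimary ?? cluster.isMaster) {
    for (let i = 0; i < count; i++) cluster.fork();
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
  } else {
    startServer();
  }
}

// Example wiring (not executed here; WORKER_COUNT is a hypothetical setting):
// startCluster(resolveWorkerCount(process.env.WORKER_COUNT), () => {
//   require('http').createServer((req, res) => res.end('ok')).listen(3000);
// });
```

Keeping the count in configuration makes it easy to run fewer workers than cores when other processes on the host need CPU time too.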
Node.js runs your JavaScript in one thread, but you can start multiple Node.js processes.
If you are, for example, building a web server, you can route every request to one of the Node.js processes.
Edit: As hereandnow78 and vkurchatkin suggested, maybe the best way to use the power of a multi-core system would be to use the Node.js cluster module.
The cluster module is the solution.
But you need to know that the Node.js cluster module works by spawning child processes, which means the processes cannot share data directly.
To share data, you need to use Redis or another IMDG to share the data across the cluster processes.
