Fork NodeJS clusters as working load changes - node.js

I am trying to fork worker clusters to a maximun of 10, and only if the working load increases. Can it be done?
I have tried with strong-cluster-control's setSize, but I can't find an easy way of forking automatically (if many requests are being done then fork, for example), or closing/"suiciding" forks (maybe with a timeOut if nothing is being done, like in this answer)
This is my repo's main file at GitHub
Thank you in advance!!

I assume that you already have some idea as to how you would like to spread your load so I will not include details about that and instead focus on the interprocess communication required for this.
Notifying the master
To send arbitrary data to the master, you can use process.send() from a worker. The way I would go about this is probably something along these steps:
The application is started
Minimum amount of workers are spawned
Each worker will send the master a request message every time it receives a new request, via process.send()
The master keeps track of all the request events from all workers
If the amount of request events increases above a predefined threshold (i.e. > 100 requests/s) it spawns a new worker
If the amount of request events decreases below a predefined threshold it asks one of the workers to stop processing new requests and close itself gracefully (note that it should not simply kill the process to avoid interrupting ongoing requests)
Main point is: Do not focus on time - focus on rate. In an application that is supposed to handle tens to thousands of requests per second, your setTimout() (the task of which might be to kill the worker if it has been idle for too long) will never fire because Node.js evenly distributes your load across your workers - you could start with one worker, but once you reach your maximum you will never drop to one worker again under continuous load even if there is only one request per second.
It should be noted that it is counterproductive to spawn more workers than the amount of CPU cores you have at your disposal. It might, however, be beneficial to start with a single worker and incrementally increase the amount to all cores as load increases.

Related

Ktor, Netty and increasing the number of threads per endpoint

Using Ktor and Kotlin 1.5 to implement a REST service backed by Netty. A couple of things about this service:
"Work" takes non-trivial amount of time to complete.
A unique client endpoint sends multiple requests in parallel to this service.
There are only a handful of unique client endpoints.
The service is not scaling as expected. We ran a load test with parallel requests coming from a single client and we noticed that we only have two threads on the server actually processing the requests. It's not a resource starvation problem - there is plenty of network, memory, CPU, etc. and it doesn't matter how many requests we fire up in parallel - it's always two threads keeping busy, while the others are sitting idle.
Is there a parameter we can configure to increase the number of threads available to process requests for specific endpoints?
Netty use what is called Non-blocking IO model (http://tutorials.jenkov.com/java-concurrency/single-threaded-concurrency.html).
In this case you have only a single thread and it can handle a lot of sub-processes in parallel, as long as you follow best practices (not blocking the main thread event loop).
You might need to check the following configuration options for Netty https://ktor.io/docs/engines.html#configure-engine
connectionGroupSize = x
workerGroupSize = y
callGroupSize = z
Default values usually are set rather low and tweaking them could be useful for the time-consuming 'work'. The exact values might vary depending on the available resources.

How to configure Node.js PM2 to distribute work evenly among workers?

I have a situation where I create a Node.js cluster using PM2. A single request fired at a worker would take considerable time (2+ mins) as it's doing intensive computations (in a pipeline of steps) with a couple of I/O operations at different stages (step 1 is 'download over HTTP', an intermediate and last step are 'write to disk'). The client that sends requests to the cluster throttle the requests it sends by two factors:
Frequency (how many requests per second), we use a slow pace (1 per second)
How many open requests it can make, we make this less than or equal to the number nodes we have in the cluster
For example, if the cluster is 10 nodes, then the client will only send 10 requests to the cluster at a speed of 1 per second, and won't send any more requests until one or more requests returns with either success or failure, which means that a worker or more should be free now to do more work, then the client will send more work to the cluster.
While watching the load on the server, it seems that the load balancer does not distribute work evenly as one would expect from a classic round-robin distribution schema. What happens is that a single worker (usually the 1st one) will receive a lot of requests while there're free workers in the cluster. This eventually will cash the worker to malfunction.
We implemented a mechanism to prevent a worker from proceeding with new requests if it's still working on a previous one. This prevented malfunctioning, but still, a lot of requests are denied service although the cluster has vacant workers!
Can you think of the reason why this behavior is happening, or how can improve the way PM2 does work?

How to manage Managed Executor Service

I'm using Managed Executor Service to implement a process manager which will process tasks in the background upon receiving an JMS message event. Normally, there will be a small number of tasks running (maybe 10 max) but what if something happens and my application starts getting hundred of JMS message events. How do I handle such event?
My thought is to limit the number of threads if possible and save all the other messages to database and will be run when thread available. Thanks in advance.
My thought is to limit the number of threads if possible and save all the other messages to database and will be run when thread available.
The detailed answer to this question depends on which Java EE app server you choose to run on, since they all have slightly different configuration.
Any Java EE app server will allow you to configure the thread pool size of your Managed Executor Service (MES), this is the number of worker threads for your thread pool.
Say you have a 10 worker threads, and you get flooded with 100 requests all at once, the MES will keep a queue of requests that are backlogged, and the worker threads will take work off the queue whenever they finish work until the queue is empty.
Now, it's fine if work goes to the queue sometimes but if overall your work queue increases more quickly than your worker threads can take work off the queue, you will run into problems. The solution to this is to increase your thread pool size otherwise the backlog will get overrun and your server will run out of memory.
what if something happens and my application starts getting hundred of JMS message events. How do I handle such event?
If the load on your server will be so sporadic that tasks need to be saved to a database, it seems that the best approach would be to either:
increase thread pool size
have the server immediately reject incoming tasks when the task backlog queue is full
have clients do a blocking wait for the server task queue to be not full (I would only advise this option if client task submission is in no way connected to user experience)

Tomcat thread control

I have two tomcat servers running at the same time. I have reports which are requested from server 1 sent to server 2 for processing. So how would I go about managing the threads on server 2? For example, if I wanted to queue up the threads how would I go about doing that?
Use a message queue (like RabbitMQ) in the middle to queue up the tasks that need to be done.
Then, your report generating server can pull jobs from the queue and work on them. If you need to slow down or speed up, then you can increase the number of "workers" running.

Is it acceptable to use ThreadPool.GetAvailableThreads to throttle the amount of work a service performs?

I have a service which polls a queue very quickly to check for more 'work' which needs to be done. There is always more more work in the queue than a single worker can handle. I want to make sure a single worker doesn't grab too much work when the service is already at max capacity.
Let say my worker grabs 10 messages from the queue every N(ms) and uses the Parallel Library to process each message in parallel on different threads. The work itself is very IO heavy. Many SQL Server queries and even Azure Table storage (http requests) are made for a single unit of work.
Is using the TheadPool.GetAvailableThreads() the proper way to throttle how much work the service is allowed to grab?
I see that I have access to available WorkerThreads and CompletionPortThreads. For an IO heavy process, is it more appropriate to look at how many CompletionPortThreads are available? I believe 1000 is the number made available per process regardless of cpu count.
Update - Might be important to know that the queue I'm working with is an Azure Queue. So, each request to check for messages is made as an async http request which returns with the next 10 messages. (and costs money)
I don't think using IO completion ports is a good way to work out how much to grab.
I assume that the ideal situation is where you run out of work just as the next set arrives, so you've never got more backlog than you can reasonably handle.
Why not keep track of how long it takes to process a job and how long it takes to fetch jobs, and adjust the amount of work fetched each time based on that, with suitable minimum/maximum values to stop things going crazy if you have a few really cheap or really expensive jobs?
You'll also want to work out a reasonable optimum degree of parallelization - it's not clear to me whether it's really IO-heavy, or whether it's just "asynchronous request heavy", i.e. you spend a lot of time just waiting for the responses to complicated queries which in themselves are cheap for the resources of your service.
I've been working virtually the same problem in the same environment. I ended up giving each WorkerRole an internal work queue, implemented as a BlockingCollection<>. There's a single thread that monitors that queue - when the number of items gets low it requests more items from the Azure queue. It always requests the maximum number of items, 32, to cut down costs. It also has automatic backoff in the event that the queue is empty.
Then I have a set of worker threads that I started myself. They sit in a loop, pulling items off the internal work queue. The number of worker threads is my main way to optimize the load, so I've got that set up as an option in the .cscfg file. I'm currently running 35 threads/worker, but that number will depend on your situation.
I tried using TPL to manage the work, but I found it more difficult to manage the load. Sometimes TPL would under-parallelize and the machine would be bored, other times it would over-parallelize and the Azure queue message visibility would expire while the item was still being worked.
This may not be the optimal solution, but it seems to be working OK for me.
I decided to keep an internal counter of how many message are currently being processed. I used Interlocked.Increment/Decrement to manage the counter in a thread-safe manner.
I would have used the Semaphore class since each message is tied to its own Thread but wasn't able to due to the async nature of the queue poller and the code which spawned the threads.

Resources