Is it possible to find out how many concurrent threads we can start on a worker role, considering the environment we are currently running on? It seems that we can't really rely on the number of cores, as resource sharing is not directly tied to one physical core.
Taking a Small instance as an example, how many concurrent background workers can I have running simultaneously (CPU-bound, of course!)?
Is there any way to determine that dynamically as well (if we choose to scale up to another instance type)?
Thanks
Related
I have been reading about multi-processing on NodeJS to get the best understanding and try to get a good performance in heavy environments with my code.
Although I understand the basic purpose and concept for the different ways to take profit of the resources to handle the load, some questions arise as I go deeper and it seems I can't find the particular answers in the documentation.
NodeJS in a single thread:
NodeJS runs a single thread that we call the event loop, although in the background the OS and libuv handle the default worker pool for asynchronous I/O tasks.
The event loop is supposed to use a single core, although the pool workers might be using different cores. I guess they are ultimately placed by the OS scheduler.
NodeJS as multi-threaded:
When using the worker_threads library, separate instances of V8 and libuv run for each thread inside the same single process. The threads therefore share the same process (though not the same V8 context) and communicate with each other through MessagePort and the rest of that API.
Each worker thread runs its own event loop. The threads are supposed to be sensibly balanced across CPU cores, improving performance. I guess they are ultimately placed by the OS scheduler.
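For instance, this is the kind of minimal worker_threads pattern I have in mind (the fib function is just a stand-in for CPU-bound work, and it assumes the compiled CommonJS JavaScript file is what actually runs):

```typescript
import { Worker, isMainThread, parentPort, workerData } from "worker_threads";

// Deliberately CPU-bound work that would block the main event loop if run there.
function fib(n: number): number {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

if (isMainThread) {
  // The worker gets its own V8 isolate and event loop inside this same process.
  const worker = new Worker(__filename, { workerData: { n: 40 } });
  worker.on("message", (result) => console.log("fib(40) =", result));
  worker.on("error", (err) => console.error("worker failed:", err));
} else {
  // Runs on the worker thread; the result travels back over a MessagePort.
  parentPort?.postMessage(fib(workerData.n));
}
```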
Question 1: When a worker thread uses the default I/O worker pool, is it somehow sharing the very same threads as the other workers' pools, or does each worker thread have its own default worker pool?
NodeJS in multi-processing:
When using the cluster library, we split the work among different processes. Each process is placed on a different core to balance the load... well, it is really each process's main event loop that ends up on a different core, so that it doesn't share a core with another heavy event loop. It sounds smart to do it that way.
Here I would communicate between processes with some IPC mechanism.
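Something like this minimal cluster sketch is what I picture (the two forks and the port number are arbitrary choices of mine):

```typescript
import cluster from "cluster";
import * as http from "http";

if (cluster.isPrimary) {                  // cluster.isMaster on older Node versions
  // Each fork is a full OS process with its own event loop and its own libuv pool.
  for (let i = 0; i < 2; i++) {
    const worker = cluster.fork();
    worker.send({ greeting: `hello worker ${worker.id}` });   // primary -> worker IPC
    worker.on("message", (msg) => console.log("from worker:", msg));
  }
} else {
  process.on("message", (msg) => console.log("from primary:", msg));
  process.send?.({ pid: process.pid });                        // worker -> primary IPC

  // Connections to the shared port are distributed across the forked processes.
  http.createServer((req, res) => res.end(`handled by ${process.pid}\n`))
      .listen(3000);
}
```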
Question 2: And what about the default worker pool for each of these NodeJS processes? Where does it live? Is it balanced across the remaining cores, as expected in the first case? Then its threads might end up on the same cores as the other worker pools of the cluster, I guess. Wouldn't it be more accurate to say that we are balancing the main threads (event loops) rather than "the processes"?
All this being said, the main question:
Question 3: Which is better, clustering or worker_threads? If both are used in the same code, how can the two libraries cooperate for the best performance? Or can they simply end up in conflict? Or, in the end, is it the OS that takes control?
Each worker thread has its own main loop (libuv etc.). So does each cloned Node.js process when you use clustering.
Clustering is a way to load-balance incoming requests to your Node.js server over several copies of that server.
Worker threads are a way for a single Node.js process to offload long-running functions to a separate thread, to avoid blocking its own main loop.
Which is better? It depends on the problem you're solving. Worker threads are for long-running functions. Clustering makes a server able to handle more requests, by handling them in parallel. You can use both if you need to: have each Node.js cluster process use a worker thread for long-running functions.
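As a rough sketch of that combination (the slowWork function here is just a stand-in for whatever your long-running function is, the fork count and port are arbitrary, and it assumes the compiled CommonJS JavaScript file is what actually runs):

```typescript
import cluster from "cluster";
import * as http from "http";
import { Worker, isMainThread, parentPort } from "worker_threads";

// Stand-in for a long-running, CPU-bound function.
function slowWork(n: number): number {
  let acc = 0;
  for (let i = 0; i < n; i++) acc += Math.sqrt(i);
  return acc;
}

if (cluster.isPrimary) {
  for (let i = 0; i < 2; i++) cluster.fork();   // several copies of the server
} else if (isMainThread) {
  http.createServer((req, res) => {
    // Offload the heavy call so this process's main loop stays responsive.
    // (A real server would reuse a pool of workers rather than spawn one per request.)
    const worker = new Worker(__filename);
    worker.once("message", (result) => res.end(`result: ${result}\n`));
  }).listen(3000);
} else {
  parentPort?.postMessage(slowWork(1e8));
}
```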
As a first approximation for your decision-making: only use worker threads when you know you have long-running functions.
The Node.js instances (whether cluster processes or worker threads) don't get tied to specific cores (or hyper-threaded logical cores) on the host machine; the host OS's scheduler assigns cores as needed. The host OS scheduler minimizes context-switch overhead when assigning cores to runnable processes. If you have too many active JavaScript instances (cluster instances + worker threads), the host OS will give them timeslices according to its scheduling algorithms. Other than avoiding too many JavaScript instances, there's very little point in trying to second-guess the OS scheduler.
Edit: Each Node.js process, together with any worker threads it spawns, uses a single libuv thread pool; the main thread and all its worker threads share that one pool. If your Node.js program uses many worker threads, you may, or may not, need to set the UV_THREADPOOL_SIZE environment variable to a value greater than the default of 4.
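For instance (the value 16 here is purely illustrative):

```typescript
// The pool is created lazily, the first time something queues work on it
// (fs, dns.lookup, crypto.pbkdf2, zlib, ...), and its size is read from
// UV_THREADPOOL_SIZE at that moment. The least fragile way to set it is
// from the shell that launches Node:
//
//   UV_THREADPOOL_SIZE=16 node dist/server.js
//
// Setting it from code also works, as long as it runs before anything uses the pool:
process.env.UV_THREADPOOL_SIZE = "16";

import { pbkdf2 } from "crypto";

// This call, and pool-backed work from any worker thread in this process,
// now draws on the same 16-thread pool.
pbkdf2("secret", "salt", 100_000, 64, "sha512", () => console.log("done"));
```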
Node.js's cluster functionality uses the underlying OS's fork/exec scheme to create a new OS process for each cluster instance. So, each cluster instance has its own libuv pool.
If you're running at scale, let's say with more than ten host machines running your Node.js server, then you can justify spending time optimizing the number of JavaScript instances.
Don't forget nginx if you use it as a reverse proxy to handle your HTTPS work. It needs some processor time too, but it uses fine-grained multithreading, so you won't have to worry about it unless you have huge traffic.
I was just playing with threads to see how much CPU they consume. I checked two scenarios.
In the first scenario I created four threads and started each with an infinite loop. Those threads soon consumed all 4 of my CPU cores; checking the performance monitor in Task Manager, I found CPU consumption was 100%.
In the second scenario I tried it in a web application, running an infinite loop in a REST controller (on Tomcat 8.5). If I request the URL 4 times from a browser (in different tabs, obviously), CPU consumption should be 100%, but I couldn't see 100% CPU consumption.
Why is there a difference?
My second question is: how would I tune the server thread pool? I have to use more than 4 threads, because it is possible that a few of them are waiting on I/O operations. I am using Hibernate as the ORM, which maintains connection pooling. So how many threads should I use in the thread pool, and in the connection pool? How would I decide?
We can't answer the first part of your question without seeing your code. But I suspect the problem is in the way that you have implemented the threads in the webapp case. (Because what you report shouldn't happen ...)
The answer to the second part is "trial and error". More specifically:
* Make the pool sizes tunable parameters.
* Develop a benchmark that is representative of your expected system load.
* Run the benchmark with different settings, measure performance, and graph the results.
* Based on the graph (and other criteria), pick the settings that are the best compromise between performance and resource (e.g. memory) utilization.
Thread pools and connection pools are different, and have different resource implications. The first is (largely) about memory; i.e. thread stacks and temporary objects used by the threads while they are active. The second is (largely) about resources associated with connections (active or idle).
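If it helps, a rough, stack-agnostic driver for that loop might look like the sketch below; the run-benchmark command and the SERVER_THREAD_POOL_SIZE / DB_CONNECTION_POOL_SIZE variables are hypothetical placeholders for whatever load generator (JMeter, Gatling, wrk, ...) and configuration knobs your server actually uses:

```typescript
import { execSync } from "child_process";

// Hypothetical knobs: the server under test is assumed to read these at startup.
const threadPoolSizes = [8, 16, 32, 64];
const connectionPoolSizes = [5, 10, 20];

for (const threads of threadPoolSizes) {
  for (const connections of connectionPoolSizes) {
    const started = Date.now();
    // "run-benchmark" is a placeholder script that restarts the server with these
    // settings and then drives the load generator against it.
    execSync("run-benchmark", {
      env: {
        ...process.env,
        SERVER_THREAD_POOL_SIZE: String(threads),
        DB_CONNECTION_POOL_SIZE: String(connections),
      },
      stdio: "inherit",
    });
    console.log(`threads=${threads}, connections=${connections}: ${Date.now() - started} ms`);
  }
}
```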
Will a Node.js app on Bluemix automatically be scaled to run on multiple processors, or do I need to implement that myself using Node's clustering API? And if I do use clustering, will there be more than one CPU available?
Short answer: You need to use the Node cluster module to take full advantage of all the cores in each instance. Or, you can also just increase the number of instances.
Long answer: Each instance of your application that you push to Bluemix runs in a warden container. Resource control is managed by Linux cgroups. The number of cores per instance is not something you can control. Running a quick test on Bluemix, os.cpus() showed 4 cores. If you want to take advantage of all 4 cores in your one Bluemix instance (warden container) of your Node.js application, then you should use Node's cluster module.
Keep in mind, you can also just increase the number of instances (horizontal scaling), which could achieve near-linear results depending on how much of your bottleneck lies in calls to external services. So if you have 3 instances, each of those instances has 4 cores, and the built-in load balancer distributes traffic among the 3 instances.
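A minimal sketch of that per-instance fan-out (the fallback port and the trivial handler are illustrative):

```typescript
import cluster from "cluster";
import * as os from "os";
import * as http from "http";

if (cluster.isPrimary) {
  // os.cpus().length reported 4 in the quick test above; fork one server per core.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
} else {
  // Cloud Foundry / Bluemix tells the app which port to listen on via PORT.
  http.createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
      .listen(Number(process.env.PORT) || 8080);
}
```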
The hybrid model that Ram suggested makes sense. You might want to do some benchmarking to determine how many processes you want to run in one application container. You can use "cf app <app-name>" to monitor the CPU utilization of each app instance under load, and if it's not fully consuming the CPU then it may make sense to spawn more processes.
However, please note:
* CPU might not be the bottleneck, in which case spawning more processes in the app container or scaling out to more app container instances won't help;
* The more processes you spawn in one container, the more memory it consumes, so make sure you do not spawn too many and exceed the allocated memory limit (otherwise the app container will be killed).
Can someone please summarize the advantages of creating an Azure WorkerRole vs. simply starting a new thread?
By starting a new worker role instance you have all of the memory and CPU of that instance size available, whereas when creating threads you'd be sharing the resources of a single role instance of that size.
I would say that it also depends on what you're processing. Also, I think that threading or any parallel processing only makes sense when you're using a Medium instance and up where you have 2 or more cores.
The primary advantages, IMHO, are that you create a separation of concerns as well as the ability to independently scale the capacity of the background process and the front end.
I assume you mean starting a new thread from an IIS-hosted service/app in a WebRole. My main concern would be recycling of IIS app pools and memory consumption.
Depending on the type of application, the load on your application, and the IIS settings, you don't have a lot of control over the lifecycle and resources of the process your thread will be living in.
We use Kentico CMS and I've exchanged emails with them about a web garden deployment.
We have a single site running on a server with 8 CPU cores. In line with Kentico's advice, we have not altered the application pool's web garden setting from the default, i.e. it is set to a maximum of 1 worker process.
Our experience is that the site only uses one of the CPU cores; the others are idling. When I emailed them about this, their response was that the OS/IIS would handle this and use other cores as necessary, even though the application pool only has a single worker process.
Now, I've a lot of respect for the guys at Kentico, but this doesn't seem right to me?
Surely, if we want to use all the cores, we need to permit eight worker processes (and implement session state storage in SQL Server)?
Many thanks
Tony
I would suggest running perfmon for 24 hours and seeing if you can determine what resources are being used. Indeed, they might already be running on all cores... Also, if their web app is a heavily threaded system, then it will take full advantage of multiple cores (at least ours does). Threads, not worker processes, are what actually count for processor utilization.
Not sure if you got an answer on ServerFault; at any rate, ASP.NET is multi-threaded, and in a single worker process there are several threads, each serving a single request.