Azure Batch task to thread ratio - multithreading

I am developing a .NET Core application with the Azure Batch SDK. When it comes to creating the tasks, I am wondering how many threads each task should use.
At first, I determined the number of CPU cores on the underlying node and spawned that many threads.
Later, I realized it might be better to let the Azure Batch scheduler do that work for me: tweak the requiredSlots (per task) and taskSlotsPerNode (per pool) settings instead, and create only single-threaded worker tasks.
Unfortunately, I couldn't find any advice in the documentation on the Azure Batch task to thread (TPL task) ratio.
Any advice on this?

There's no general answer to this question because it depends on your program and scenario. You should determine how best to run your workload and what important properties you want to leverage independent of Batch first: multi-threaded as a single program or multiple single-threaded programs?
Once you determine your approach, then apply that to Batch using the knobs the Batch API provides you.
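To make the trade-off between the two knobs from the question concrete, here is a rough sketch of the slot arithmetic. It assumes Batch places a task on a node only when that node has at least requiredSlots free slots. The function name is made up for illustration and is not part of the Batch SDK.

```python
def max_concurrent_tasks(task_slots_per_node: int, required_slots: int) -> int:
    """Estimate how many identical tasks can run on one node at the same time.

    Assumption: a task only starts on a node that has at least
    `required_slots` free slots out of `task_slots_per_node`.
    """
    if task_slots_per_node < 1 or required_slots < 1:
        raise ValueError("slot counts must be positive")
    # Integer division: leftover slots that cannot fit a whole task stay idle.
    return task_slots_per_node // required_slots


# Option A: one multi-threaded task that claims the whole 8-slot node.
multi_threaded = max_concurrent_tasks(task_slots_per_node=8, required_slots=8)

# Option B: single-threaded tasks, one slot each, packed by the scheduler.
single_threaded = max_concurrent_tasks(task_slots_per_node=8, required_slots=1)

print(multi_threaded, single_threaded)
```

With 8 slots per node, option A runs one task per node and leaves the threading to your program, while option B lets the scheduler pack eight single-threaded tasks onto the node.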

Related

Is it possible to run different tasks on different schedules with Prefect?

I'm taking my first steps with Prefect, and I'm trying to see what its degrees of freedom are. To this end, I'm investigating whether Prefect supports running different tasks on different schedules in the same Python process. For example, Task A might have to run every 5 minutes, while Task B might run twice a day with a Cron scheduler.
It seems to me that schedules are associated with a Flow, not with a task, so to do the above, one would have to create two distinct one-task Flows, each with its own schedule. But even so, given that running a flow is a blocking operation, I can't see how to "start" both flows concurrently (or pseudo-concurrently; I'm perfectly aware the flows won't execute on separate threads).
Is there a built-in way of getting the tasks to run on their own independent schedules? I'm under the impression that there is a way to achieve this, but given my limited experience with Prefect, I'm completely missing it.
Many thanks in advance for any pointers.
You are right that schedules are associated with Flows and not Tasks, so the only place to add a schedule is a Flow. Running a Flow is a blocking operation only if you are using the open source Prefect Core on its own. For production use cases, it's recommended to run your Flows against Prefect Cloud or Prefect Server. Cloud is the managed offering, and Server is what you host yourself. Note that Cloud has a very generous free tier.
When using a backend, you will use an agent that kicks off each flow run in a new process, so the run will not block.
To get started with a backend, you can check the docs here.
This Prefect Discourse topic discusses a very similar problem and shows how you could solve it using a flow-of-flows orchestrator pattern.
One way to approach it is to leverage Caching to avoid recomputation of certain tasks that require lower-frequency scheduling than the main flow.
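The caching idea can be sketched in plain Python. This is not Prefect's API, only an illustration of the concept: a single loop ticks every 5 minutes, and the low-frequency step reuses its cached result until that result is older than 12 hours. The names `CachedStep` and `simulate` are hypothetical.

```python
from datetime import datetime, timedelta


class CachedStep:
    """Recompute a function only when its cached result is older than max_age."""

    def __init__(self, func, max_age):
        self.func = func
        self.max_age = max_age
        self._value = None
        self._computed_at = None

    def run(self, now):
        expired = (
            self._computed_at is None
            or now - self._computed_at >= self.max_age
        )
        if expired:
            self._value = self.func()
            self._computed_at = now
        return self._value


def simulate(hours=24):
    """Tick every 5 minutes; return how often Task A and Task B really ran."""
    runs = {"a": 0, "b": 0}

    def task_a():
        runs["a"] += 1

    def task_b():
        runs["b"] += 1

    step_b = CachedStep(task_b, max_age=timedelta(hours=12))
    start = datetime(2024, 1, 1)
    for tick in range(hours * 60 // 5):
        now = start + timedelta(minutes=5 * tick)
        task_a()         # runs on every tick of the main schedule
        step_b.run(now)  # served from cache unless 12 hours have passed
    return runs["a"], runs["b"]


print(simulate(24))
```

Over a simulated day, Task A runs on every 5-minute tick while Task B only recomputes twice, which is the "twice a day" behaviour from the question without needing a second schedule.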

What does it mean that Spark's internal Fair Scheduler lets long-lived applications define queues?

I am trying to understand Spark's job scheduling and came across this passage in Learning Spark:
"Spark provides a mechanism through configurable intra-application
scheduling policies. Spark’s internal Fair Scheduler lets long-lived
applications define queues for prioritizing scheduling of tasks"
Could you please give me a bit more detail on this?
As described in the Fair Scheduler documentation, you can maintain multiple pools, each with its own scheduling policy, minimum share (minShare), and relative share (weight) of resources. The last one is described as follows:
If you give a specific pool a weight of 2, for example, it will get 2x more resources as other active pools. Setting a high weight such as 1000 also makes it possible to implement priority between pools—in essence, the weight-1000 pool will always get to launch tasks first whenever it has jobs active.
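To get a feel for what the weights mean, here is a deliberately simplified model. It assumes resources are split among active pools purely in proportion to weight and ignores minShare and the scheduling mode inside each pool. The function and pool names are made up, and this is not how Spark is implemented internally.

```python
def fair_shares(weights, total_slots):
    """Split total_slots among active pools in proportion to their weights.

    Simplified model: minShare and per-pool scheduling mode are ignored.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {}
    return {
        pool: total_slots * weight / total_weight
        for pool, weight in weights.items()
    }


# A pool with weight 2 gets twice the share of a pool with weight 1.
print(fair_shares({"default": 1, "reports": 2}, total_slots=30))

# A very high weight effectively acts as a priority.
print(fair_shares({"default": 1, "urgent": 1000}, total_slots=1001))
```

In the second call the weight-1000 pool ends up with almost everything, which is the "priority between pools" effect the quoted documentation describes.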

Monitor multiple thread performance

I have created a Windows service with multiple threads (approximately 4-5). In this service, threads are created at specific intervals and then aborted. Once a thread is created, it performs some I/O and database operations.
I have a GUI for this service that provides the configuration the service requires. In this GUI I want to add one more feature that shows the performance of the Windows service across all of its threads. I want to show CPU utilization (for every processor, if a multicore processor is available) together with memory utilization.
Windows Task Manager shows CPU (on a per-core basis) and memory utilization. I want to build the same thing, but only for the threads run by my Windows service.
Can anybody help me figure out how to get CPU % and memory utilization per thread?
I don't think you can get the CPU and memory utilization of individual threads. Instead, you can get them for your service as a whole.
My question is, why do you need to build your own functionality when Sysinternals Process Explorer already gives you more detail? Do you have any specific needs?
If you need to monitor thread activity, you would be better off logging some information using log4net or another logging tool. This will give you an idea of what the threads are doing.
To be more specific, you could publish the logs using TelnetAppender, so that they can be received by your application. This will let you look into the process in real time.

Using .NET 4 Tasks instead of ThreadPool.QueueUserWorkItem

I've been reading a bunch of articles about the new TPL in .NET 4. Most of them recommend using Tasks as a replacement for ThreadPool.QueueUserWorkItem. But from what I understand, tasks are not threads. So what happens in the following scenario, where I want to use a producer/consumer queue built on the new BlockingCollection class in .NET 4:
The queue is initialized with a parameter (say 100) to indicate the number of worker tasks. Task.Factory.StartNew() is called to create a bunch of tasks.
Then a new work item is added to the queue, and a consumer takes this item and executes it.
Based on the above, there seems to be a limit on how many tasks you can execute at the same time, whereas with ThreadPool.QueueUserWorkItem the CLR uses the thread pool with its default pool size.
Basically, what I'm trying to figure out is whether using Tasks with BlockingCollection is appropriate in a scenario where I want to create a Windows service that polls a database for jobs that are ready to be run. If a job is ready to be executed, the timer in the Windows service (my only producer) will add a new work item to the queue, where the work will then be picked up and executed by a worker task.
Does it make sense to use a producer/consumer queue in this case? And what about the number of worker tasks?
I am not sure whether a producer/consumer queue is the best pattern to use here, but I can speak to the threads issue.
As I understand it, .NET 4 Tasks still run on threads; however, you do not have to worry about scheduling those threads, because .NET 4 provides a nice interface for it.
The main advantages of using tasks are:
You can queue as many of them as you want without paying the roughly 1 MB of stack memory that a dedicated thread per work item would cost.
It also manages which threads and processors your tasks run on to improve data flow and caching.
You can build a hierarchy of dependencies between your tasks.
It will automatically use as many of the cores available on your machine as possible.
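For the polling service in the question, the shape of the pattern is roughly as follows. This sketch is written in Python as a language-neutral illustration rather than .NET code: `queue.Queue` stands in for BlockingCollection, and plain worker threads stand in for long-running consumer tasks. Note that the queue capacity and the number of workers are two separate settings; in BlockingCollection the integer constructor argument is the bounded capacity, not a worker count. The name `run_jobs` is hypothetical, and error handling is left out.

```python
import queue
import threading


def run_jobs(jobs, worker_count=4, capacity=100):
    """Run callables through a bounded queue with a fixed number of consumers."""
    work = queue.Queue(maxsize=capacity)  # producer blocks when the queue is full
    results = []
    results_lock = threading.Lock()
    stop = object()  # sentinel telling a consumer to exit

    def consumer():
        while True:
            job = work.get()  # blocks until an item is available
            if job is stop:
                break
            outcome = job()
            with results_lock:
                results.append(outcome)

    workers = [threading.Thread(target=consumer) for _ in range(worker_count)]
    for worker in workers:
        worker.start()

    # Single producer, like the timer that polls the database for ready jobs.
    for job in jobs:
        work.put(job)

    # One sentinel per consumer so every worker shuts down cleanly.
    for _ in workers:
        work.put(stop)
    for worker in workers:
        worker.join()
    return results


squares = run_jobs([lambda n=n: n * n for n in range(10)], worker_count=3)
print(sorted(squares))
```

The number of consumers caps how many jobs run at once, while the capacity only limits how far the producer can get ahead of the workers.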

MSMQ for Managing Threads?

I am building an application that receives input from printers over the network (on specific ports) and from other files that are created in a folder, either locally or over the network. The user can create different threads to monitor different folders at the same time, as well as threads to handle the input from the printers over the network. The application is supposed to process the input data according to its type and output it. On the other end of the application, there would be 4 threads waiting for data from the input threads (there could be 10 or 20 of those) in order to process it and apply 4 different tasks.
As we will have many threads running at the same time, I thought I would use MSMQ to manage these threads, in terms of scheduling, prioritizing, and so on. Does MSMQ fit this scenario, or should I use another technique?
(P.S.: I was thinking of building my own ThreadEngine class to take care of all of these things until I heard about MSMQ, which I'm still not sure is the right thing to use.)
MSMQ would be useful for managing your input/output data, not for managing your threads. .NET already has the ThreadPool, the CCR, and the TPL to assist you with concurrency and multithreading, so I would suggest reading up on those technologies and choosing the most appropriate one.
MSMQ is a system message queue, not a thread pool manager.
This could be interesting in a case where you don't really mind poor performance and are really going for a system where tasks are persistent and transactional to guarantee execution.
If you are looking for performance, then I agree with the other folks and strongly discourage you from doing this, even with non-durable (RAM) queues.
