Azure Batch microtask parallel processing (Modify Task Queue)

I am trying to parallelise micro-tasks that are fired off from one of the VMs and should be distributed across all the VMs. How can I modify the Azure Batch queue? Is there any way to add a task to the queue through the API?

Is there any way to add a task to the queue through the API?
If you use the Azure Batch .NET library, you can add a task to a job with the following code.
private static async Task<List<CloudTask>> AddTasksAsync(
    BatchClient batchClient,
    string jobId,
    string taskId,
    List<ResourceFile> inputFiles,
    string taskCommand)
{
    // Create a collection to hold the tasks that we'll be adding to the job
    List<CloudTask> tasks = new List<CloudTask>();
    CloudTask task = new CloudTask(taskId, taskCommand);
    task.ResourceFiles = inputFiles;
    tasks.Add(task);
    await batchClient.JobOperations.AddTaskAsync(jobId, tasks);
    return tasks;
}
If you want to use the REST API instead, see the "Add a task to a job" operation in the Batch REST API reference.
If you encounter any problem when using the APIs above, please feel free to let me know.

Related

Waiting for an azure function durable orchestration to complete

I am currently working on a project where I use a storage queue to pick up items for processing. A storage-queue-triggered function picks up each item from the queue and starts a durable orchestration. According to the documentation, the storage queue trigger picks up 16 messages in parallel by default (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue). Because starting an orchestration is a simple and quick operation, a lot of messages in the queue means I end up with a lot of orchestrations running at the same time. To avoid overloading my systems, I would like to start the orchestration and wait for it to complete before the next batch of messages is picked up. The solution I came up with, which seems to work, is:
public class QueueTrigger
{
    [FunctionName(nameof(QueueTrigger))]
    public async Task Run(
        [QueueTrigger("queue-processing-test", Connection = "AzureWebJobsStorage")] Activity activity,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        log.LogInformation($"C# Queue trigger function processed: {activity.ActivityId}");
        string instanceId = await starter.StartNewAsync<Activity>(nameof(ActivityProcessingOrchestrator), activity);
        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");
        var status = await starter.GetStatusAsync(instanceId);
        do
        {
            status = await starter.GetStatusAsync(instanceId);
        } while (status.RuntimeStatus == OrchestrationRuntimeStatus.Running || status.RuntimeStatus == OrchestrationRuntimeStatus.Pending);
    }
}
which basically picks up the message, starts the orchestration and then waits in a do/while loop while the status is Pending or Running.
Am I missing something here, or is there a better way of doing this? (I could not find much online.)
Thanks in advance for your comments or suggestions!

This might not work: you could hit timeouts that cause duplicate orchestration runs, or simply force your function app to scale out, defeating the purpose of your code altogether.
Instead, you could rely on the concurrency throttles that Durable Functions comes with. The queue trigger would still queue up orchestration runs, but only the configured maximum would run at any one time on a single instance of the function app.
This would still cause your function app to scale out, so you have to take that into account when setting this limit. You can also set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting to control how many instances your function app can scale out to.
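As a rough illustration of those throttles, the fragment below shows where they would sit in host.json, assuming the Durable Functions 2.x extension. The value 5 is an arbitrary placeholder, not a recommendation, and both limits apply per function app instance.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 5,
      "maxConcurrentActivityFunctions": 5
    }
  }
}
```

Because the limits are per instance, the effective ceiling is roughly the configured value multiplied by the number of instances the app scales out to.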

It could be that the function app's built-in throttling does not reduce load on downstream services, because it applies per app instance and will just cause the app to scale out more. What is needed then is a distributed maximum instance count that all app instances adhere to. I have built this functionality into my Durable Functions orchestration app with a scaleGroupId and its maximum instance count. It has an API call to save this information, and the scaleGroupId is a string that can be set to anything that describes the resource you want to protect from overloading. Here is my app that can do this:
Microflow

Run Web Job in parallel

We have a series of 4 Service Bus queues. Each queue has a web job that processes messages and passes them on to the next queue. Though we're running on a single core, each web job is async and allows the other jobs to continue while it queries a database or endpoint.
We have set MaxConcurrentCalls = 3 in the ServiceBusConfiguration.
However, now that all the messages are in the final queue, it's not spinning up multiple instances of the final web job to process them faster, and is instead executing synchronously. I'd like to know how to configure my web jobs to run the same web job in parallel.
I notice this article from 2014, which suggests we have to implement our own parallel processing, but more recent articles contradict this and say it is supported OOTB.

Scaling out to multiple instances is available only for continuous WebJobs.
For those, a setting determines whether the program or script runs on all instances or on just one instance.
The option to run on multiple instances doesn't apply to the Free or Shared pricing tiers.
In your webjob, you will find an instance of the JobHostConfiguration object. This object is used to configure the properties of your webjob.
Here is a configuration:
static void Main()
{
    var config = new JobHostConfiguration();
    config.UseTimers();
    config.Queues.MaxDequeueCount = 2;
    config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(4);
    config.Queues.BatchSize = 2;
    var host = new JobHost(config);
    host.RunAndBlock();
}
So let's break the items down:
config.UseTimers();
config.UseTimers(); allows us to use a timer trigger in our functions.
config.Queues.MaxDequeueCount = 2;
MaxDequeueCount is the number of times your function will try to process a message if it errors out.
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(4);
MaxPollingInterval is the maximum amount of time the WebJob waits between checks of the queue.
If this is not desirable, you can change the setting as I have above, so that the WebJob checks the queue at least every 4 seconds.
config.Queues.BatchSize = 2;
BatchSize is the number of items your WebJob will process at the same time. The items will be processed asynchronously.
So if there are 2 items in the queue, they will be processed in parallel. If you set this to 1, you create a synchronous flow, as only one item is taken out of the queue at a time.
For more detail, you could refer to this article on running WebJobs in parallel.
Update:
The BeginReceiveBatch/EndReceiveBatch methods allow you to retrieve multiple items from the queue asynchronously. You can then use AsParallel to convert the IEnumerable they return and process the messages on multiple threads.
var messages = await Task.Factory.FromAsync<IEnumerable<BrokeredMessage>>(Client.BeginReceiveBatch(3, null, null), Client.EndReceiveBatch);
messages.AsParallel().WithDegreeOfParallelism(3).ForAll(item =>
{
    ProcessMessage(item);
});
That code retrieves 3 messages from the queue and processes them in "3 threads". (Note: it is not guaranteed to use 3 threads. .NET will analyze the system resources and use up to 3 threads if necessary.)
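The degree-of-parallelism cap can be tried without a Service Bus connection. The sketch below is a minimal stand-in that uses only the .NET base class library: plain strings replace BrokeredMessage, the ProcessMessage body is a hypothetical placeholder, and Select is used instead of ForAll so the results can be collected. PLINQ does not preserve input order here unless AsOrdered is added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelBatchSketch
{
    // Hypothetical stand-in for the ProcessMessage(item) call in the answer.
    public static string ProcessMessage(string body)
    {
        return body.ToUpperInvariant();
    }

    // Processes one batch using at most maxThreads concurrent workers.
    // The cap is an upper bound; PLINQ may use fewer threads.
    public static List<string> ProcessBatch(IEnumerable<string> batch, int maxThreads)
    {
        return batch
            .AsParallel()
            .WithDegreeOfParallelism(maxThreads)
            .Select(ProcessMessage)
            .ToList();
    }
}
```

Calling ParallelBatchSketch.ProcessBatch(new[] { "a", "b", "c" }, 3) processes the three items with up to three workers and returns the processed items in an unspecified order.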
For more details, you could refer to this case.

By setting ServiceBusConfiguration.PrefetchCount and ServiceBusConfiguration.MessageOptions.MaxConcurrentCalls, I have been able to see that a single webjob will dequeue multiple messages and process them in parallel.
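A minimal sketch of that configuration, assuming the WebJobs SDK 2.x Service Bus extension (Microsoft.Azure.WebJobs.ServiceBus). The values 10 and 3 are illustrative only, and the connection string is assumed to come from the default AzureWebJobsServiceBus setting.

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.ServiceBus;

public class Program
{
    public static void Main()
    {
        var config = new JobHostConfiguration();

        var serviceBusConfig = new ServiceBusConfiguration
        {
            // Messages fetched ahead of processing (illustrative value).
            PrefetchCount = 10
        };
        // Messages a single WebJob instance processes concurrently (illustrative value).
        serviceBusConfig.MessageOptions.MaxConcurrentCalls = 3;

        config.UseServiceBus(serviceBusConfig);

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```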

How to make azure function to run continuously as it gets stopped after 5 to 6 mins?

public static void Run(string input, TraceWriter log)
{
    log.Info("SimpleProducer");
    KafkaOptions options = new KafkaOptions(new Uri("http://*******:9092"));
    BrokerRouter router = new BrokerRouter(options);
    Producer client = new Producer(router);
    while(true)
    {
        JObject obj = JObject.FromObject(new
        {
            ExchangeName = "BitFinex",
            CurrencyPair = "Dollar",
            MachineTime = DateTime.Now.ToString("dd-MM-yyyy_HH:mm:ss.ffffff"),
            OrderSide = "Buy",
            OrderId = "123",
            Price = "10",
            Quantity = "100"
        });
        log.Info(obj.ToString(Formatting.None));
        client.SendMessageAsync("tenant", new[] { new Message(obj.ToString(Formatting.None)) }).Wait();
        log.Info("Next Iteration");
    }
}
I used a while loop to process data continuously in an Azure function, but the function stops after 5 to 6 minutes and I have to rerun it. Are there any settings to run Azure Functions continuously? I have used the code above.

No, you should use WebJobs for that. On the Consumption plan, Azure Functions are capped at 5 minutes of runtime by default.
Here's the article to get you started on WebJobs:
https://learn.microsoft.com/en-us/azure/app-service-web/web-sites-create-web-jobs
Functions and WebJobs are basically the same thing (built on the same SDK), so porting the code should be trivial.

You may use Azure Functions to execute long running jobs by creating them under the App Service Plan with AlwaysOn enabled. This option gives you dedicated infrastructure that is always running and will not have the current 5-minute execution time limit.

As mentioned before:
- create the function in a dedicated App Service plan, not on the consumption (pay-as-you-go) plan.
Important related information:
- check Durable Functions, which let you run your function as a singleton at flexible intervals (the "monitor pattern")
- within Durable Functions, check eternal orchestrations
- you can also mark your function to be started/triggered manually
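The eternal-orchestration idea from the list above can be sketched as follows. This is only an outline, assuming the Durable Functions 2.x API (IDurableOrchestrationContext). The function names ProducerOrchestrator and SendBatch and the 30-second interval are hypothetical, and the activity body is left empty where the Kafka send would go.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ProducerLoop
{
    [FunctionName("ProducerOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Do one short unit of work instead of looping forever in one execution.
        await context.CallActivityAsync("SendBatch", null);

        // Wait on a durable timer (hypothetical 30-second interval).
        DateTime nextRun = context.CurrentUtcDateTime.AddSeconds(30);
        await context.CreateTimer(nextRun, CancellationToken.None);

        // Restart the orchestration with fresh history, so it can run indefinitely.
        context.ContinueAsNew(null);
    }

    [FunctionName("SendBatch")]
    public static void SendBatch([ActivityTrigger] IDurableActivityContext input)
    {
        // Hypothetical placeholder: send one batch of messages here.
    }
}
```

Each activity call stays well inside the execution time limit, and ContinueAsNew keeps the orchestration history from growing without bound.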

If you decide to take the Azure Functions approach, you had better use this custom binding/trigger for Kafka.

Setup webjob ServiceBusTriggers or queue names at runtime (without hard-coded attributes)?

Is there any way to configure triggers without attributes? I cannot know the queue names ahead of time.
Let me explain my scenario. I have one Service Bus queue and, for various reasons (complicated duplicate-suppression business logic), the queue messages have to be processed one at a time, so I have ServiceBusConfiguration.OnMessageOptions.MaxConcurrentCalls set to 1. Processing a message therefore holds up the whole queue until it is finished. Needless to say, this is suboptimal.
This 'one at a time' policy isn't so simple. The messages could be processed in parallel, they just have to be divided into groups (based on a field in message), say A and B. Group A can process its messages one at a time, and group B can process its own one at a time, etc. A and B are processed in parallel, all is good.
So I can create a queue for each group, A, B, C, ... etc. There are about 50 groups, so 50 queues.
I can create a queue for each, but how to make this work with the Azure Webjobs SDK? I don't want to copy-paste a method for each queue with a different ServiceBusTrigger for the SDK to discover, just to enforce one-at-a-time per queue/group, then update the code with another copy-paste whenever another group is needed. Fetching a list of queues at startup and tying to the function is preferable.
I have looked around and I don't see any way to do what I want. The ITypeLocator interface is pretty hard-set to look for attributes. I could probably abuse the INameResolver, but it seems like I'd still have to have a bunch of near-duplicate methods around. Could I somehow create what the SDK is looking for at startup/runtime?
(To be clear, I know how to use INameResolver to get a queue name, as in "How to set Azure WebJob queue name at runtime?", but although similar, this isn't my problem. I want to set up triggers for multiple queues at startup for the same function, to get one-at-a-time processing per queue, without repeating the trigger attribute 50 times. I figured I'd ask again since the SDK repo is fairly active and it's been a year.)
Or am I going about this all wrong? Being dumb? Missing something? Any advice on this dilemma would be welcome.

The Azure WebJobs host discovers and indexes the functions that have the ServiceBusTrigger attribute when it starts, so there is no way to set up the triggering queues at runtime.
The simpler solution for you is to create a long-running job and implement the processing manually:
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.ServiceBus.Messaging;

public class Program
{
    private static void Main()
    {
        var host = new JobHost();
        host.CallAsync(typeof(Program).GetMethod("Process"));
        host.RunAndBlock();
    }

    [NoAutomaticTrigger]
    public static async Task Process(TextWriter log, CancellationToken token)
    {
        var connectionString = "myconnectionstring";
        // You can also get the queue names from app settings or an Azure table
        var queueNames = new[] { "queueA", "queueB" };
        var messagingFactory = MessagingFactory.CreateFromConnectionString(connectionString);
        foreach (var queueName in queueNames)
        {
            // One receiver per queue, each handling one message at a time
            var receiver = messagingFactory.CreateMessageReceiver(queueName);
            receiver.OnMessage(message =>
            {
                try
                {
                    // do something ....
                    // Complete the message
                    message.Complete();
                }
                catch (Exception ex)
                {
                    // Log the error
                    log.WriteLine(ex.ToString());
                    // Abandon the message so that it can be retried.
                    message.Abandon();
                }
            // AutoComplete is off because the handler completes the message itself
            }, new OnMessageOptions { MaxConcurrentCalls = 1, AutoComplete = false });
        }
        // Wait until the job is stopped or restarted
        await Task.Delay(Timeout.InfiniteTimeSpan, token);
    }
}
Otherwise, if you don't want to deal with multiple queues, you can have a look at Azure Service Bus topics/subscriptions and create a SqlFilter to send each message to the right subscription.
Another option could be to create your own trigger. The Azure WebJobs SDK provides extensibility points for creating your own trigger binding:
Binding Extensions Overview
Good luck!

Based on my understanding, you seem to need a system that processes batches of messages in parallel. @Thomas's solution is good, but I think the Azure Batch service with Table storage may be better, and could replace the more complex solution of Service Bus queues plus WebJobs with a trigger.
Using Azure Batch with Table storage, you can control task creation, execute the tasks in parallel and at scale, and even monitor them. Please refer to the tutorial to learn how.

SharePoint TimerJobs and threading

I've written a SharePoint 2010 app that uses a TimerJob to trigger processing of some documents in a list. The timer is set to trigger every minute, but the processing may take more than a minute. I'm wondering whether the next trigger of the timer job will be started on a new thread, or whether the timer service will just wait until the first thread has completed. I have no idea how SharePoint manages threads for TimerJobs and I can't really find any relevant information.
This is possibly a problem given that my TimerJob definition has the following:
    public override void Execute(Guid contentDbId)
    {
        try
        {
            SPWebApplication webApplication = this.Parent as SPWebApplication;
            SPContentDatabase contentDb = webApplication.ContentDatabases[contentDbId];
            using (SPSite site = contentDb.Sites[0])
            {
                using (SPWeb web = site.RootWeb)
                {
                    PRManager.TriggerProcessing(web); // ?
                }
            }
        }
        catch (Exception)
        {
        }
    }
}
PRManager.TriggerProcessing() is a static method, obviously, and while it does contain mechanisms that allow only one thread at a time to enter the method body, I'm wondering whether SharePoint creates multiple threads in the event that those once-a-minute calls to Execute overlap.

Well, it is not so much a "thread" thing as much as it is a "job" thing.
SharePoint stores all jobs in a database table and it uses this table to track what is running and where it is running. It has a built in synchronization engine that is responsible for making sure the jobs execute as the job instructions say.
For example, take the deployment task, which is nothing more than a job. The deployment task only allows one job to run for a given solution at a time. It makes sure that all of the tasks finish on each server in the farm before the overall job is reported as done.
So the answer will depend on how your job configuration properties are set. There is a property on the job that tells SharePoint to allow only one instance of that job to run at a time. So if the job is currently executing, another instance of it will not be started.
