ScriptHost error occurred despite catching an exception - Azure

I have a simple function that takes a message from a queue and saves it to a storage table. In some cases a table entity with the same data may already exist, so I added exception handling to skip that case and mark the queue message as processed. Even though the exception is now handled, the ScriptHost still reports an error and the message stays in the queue.
I suspect this is because I'm using a table binding, which sits on the boundary between the host and my code. Am I right? Should I use a table client within my code instead of the binding? Is there a different approach?
Sample code that reproduces the situation:
[FunctionName("MyFunction")]
public static async Task Run(
    [QueueTrigger("myqueue", Connection = "Conn")] string msg,
    [Table("mytable", Connection = "Conn")] IAsyncCollector<DataEntity> dataEntity,
    TraceWriter log)
{
    try
    {
        await dataEntity.AddAsync(new DataEntity()
        {
            PartitionKey = "1",
            RowKey = "1",
            Data = msg
        });
        await dataEntity.FlushAsync();
    }
    catch (StorageException e)
    {
        // when it is an exception that informs "entity already exists" skip it
    }
}

When a queue trigger function fails, Azure Functions retries the function up to five times for a given queue message, including the first try.
If all five attempts fail, the functions runtime adds a message to a queue named <originalqueuename>-poison.
You can write a function to process messages from the poison queue by logging them or sending a notification that manual attention is needed.
The host.json file contains settings that control queue trigger behavior:
{
    "queues": {
        "maxPollingInterval": 2000,
        "visibilityTimeout": "00:00:30",
        "batchSize": 16,
        "maxDequeueCount": 1,
        "newBatchThreshold": 8
    }
}
Note: maxDequeueCount is the number of times to try processing a message before moving it to the poison queue; the default is 5. For your needs, you could set "maxDequeueCount": 1, so that a message goes to the poison queue after the first failed attempt.
These settings are host-wide and apply to all functions; currently you can't control them per function.
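To make the retry rule concrete, here is a minimal sketch of it in plain C#. It is a simplified model and not the Functions runtime: the class name PoisonQueueModel and its signature are made up for illustration, and "failure" is modeled as the handler throwing an exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the queue trigger retry rule.
public static class PoisonQueueModel
{
    // Invokes the handler until it completes without throwing, or until the
    // message has been tried maxDequeueCount times. After the last failed
    // attempt the message is added to poisonQueue.
    // Returns the number of attempts that were made.
    public static int Process(string message, Action<string> handler,
                              int maxDequeueCount, List<string> poisonQueue)
    {
        for (int attempt = 1; attempt <= maxDequeueCount; attempt++)
        {
            try
            {
                handler(message);
                return attempt; // success: the message is considered processed
            }
            catch (Exception)
            {
                // A failed attempt: in the real runtime the message would
                // become visible again and be retried later.
            }
        }
        poisonQueue.Add(message);
        return maxDequeueCount;
    }
}
```

In this model a handler that always throws is attempted five times with maxDequeueCount set to 5 and only once with it set to 1, and in both cases the message ends up in the poison list.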

Related

Queue triggered based function app not getting completed

We are using a queue-triggered function app on the Premium plan. Each message contains details such as an Azure subscription name, and for each subscription we make many API calls, mostly to Azure storage accounts (around 400 to 500). Since the storage account 'list' API is limited to 100 calls per 5 minutes, we get a 429 response on the 101st call. To mitigate this we added exponential retry logic (we tried both our own implementation and the Polly library), which retries after a delay. This works for some subscriptions but fails for many, where the retry logic does not retry after the first attempt (we configured 3 retries with a 60-second delay). While monitoring the function app through Live Metrics we also observed that the CPU usage of some function instances sometimes drops to zero (even though we log and loop during the delay to keep the function alive). That instance is then killed, the message is pushed back to the queue, and processing starts again on a fresh instance.
Note that since many subscriptions are processed in parallel, the function app scales out automatically as required. Also, since we are on the Premium plan, one VM is always on. Killing an instance (which makes around 400 to 500 storage API calls for a given subscription) therefore seems odd, since in our delay the thread sleeps for only 10 seconds per iteration, for around 6, 12, or 18 (Time_delay) iterations. The delay function below is used in our retry logic.
private void Delay(int Time_delay, string requestUri, int retryCount)
{
    for (int i = 0; i < Time_delay; i++)
    {
        _logger.LogWarning($"Sleep initiated for id: {requestUri.ToString()}, RetryCount: {retryCount} CurrentTimeDelay: {Time_delay}");
        Thread.Sleep(10000);
        _logger.LogWarning($"Sleep completed for id: {requestUri.ToString()}, RetryCount: {retryCount} CurrentTimeDelay: {Time_delay}");
    }
}
Note: the function app does not throw any exceptions other than the dependency failures caused by the 429 responses.
Would it be possible for you to requeue instead of using Thread.Sleep? You can use initial visibility delay when requeuing:
public class Function1
{
    [FunctionName(nameof(TryDoWork))]
    public static async Task TryDoWork(
        [QueueTrigger("some-queue")] SomeItem item,
        [Queue("some-queue")] CloudQueue queue)
    {
        var result = _SomeService.SomeWork(item);
        if (result == 429)
        {
            item.Retries++;
            var json = JsonConvert.SerializeObject(item);
            var message = new CloudQueueMessage(json);
            var delay = TimeSpan.FromSeconds(item.Retries);
            await queue.AddMessageAsync(message, null, delay, null, null);
        }
    }
}
It might be that the sleeping is causing some wonky function app behavior. I think I remember reading about some issues pertaining to the usage of Thread.Sleep, but I can't find them right now.
Also, you might want to add some sort of handling of messages that end up retrying more than 3 times (or however many you think is reasonable).
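One way to combine the requeue delay with a retry limit is a small policy function. The sketch below is an assumption, not part of the answer above: the name RequeuePolicy, the exponential schedule, and the parameters are all made up for illustration. It returns the visibility delay for the next attempt, or null once the message has used up its retries and should be parked somewhere for manual attention.

```csharp
using System;

// Hypothetical helper: decides whether and when to requeue a throttled message.
public static class RequeuePolicy
{
    // retriesSoFar: how many times the message has already been requeued.
    // Returns baseDelay * 2^retriesSoFar, or null when retriesSoFar has
    // reached maxRetries. Intended for small retry counts (single digits).
    public static TimeSpan? NextDelay(int retriesSoFar, int maxRetries, TimeSpan baseDelay)
    {
        if (retriesSoFar < 0) throw new ArgumentOutOfRangeException(nameof(retriesSoFar));
        if (retriesSoFar >= maxRetries)
        {
            return null; // give up: the caller should log or dead-letter the message
        }
        // Exponential backoff: baseDelay, 2 * baseDelay, 4 * baseDelay, ...
        return TimeSpan.FromTicks(baseDelay.Ticks * (1L << retriesSoFar));
    }
}
```

With a 60-second base delay and 3 retries, this yields delays of 60, 120, and 240 seconds and then null. The returned value could be passed as the initial visibility delay when re-adding the message, in place of TimeSpan.FromSeconds(item.Retries) in the code above.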

Akka.net Ask timeout when used in Azure WebJob

At work we have some code in an Azure WebJob where we use RabbitMQ.
The basic workflow is this:
A message arrives on a RabbitMQ queue
We have a message handler for the incoming message
Within the message handler we start a top-level (user) supervisor actor and "ask" it to handle the message
The supervisor actor hierarchy is like this
And the relevant top-level code is something like this (this is the WebJob code):
static void Main(string[] args)
{
    try
    {
        //Bootstrap akka IoC resolver well ahead of any actor usages
        new AutoFacDependencyResolver(ContainerOperations.Instance.Container, ContainerOperations.Instance.Container.Resolve<ActorSystem>());
        var system = ContainerOperations.Instance.Container.Resolve<ActorSystem>();
        var busQueueReader = ContainerOperations.Instance.Container.Resolve<IBusQueueReader>();
        var dateTime = ContainerOperations.Instance.Container.Resolve<IDateTime>();
        busQueueReader.AddHandler<ProgramCalculationMessage>("RabbitQueue", x =>
        {
            //This is code that gets called whenever we have a RabbitMQ message arrive
            try
            {
                //SupervisorActor is a singleton
                var supervisorActor = ContainerOperations.Instance.Container.ResolveNamed<IActorRef>("SupervisorActor");
                var actorMessage = new SomeActorMessage();
                var supervisorRunTask = supervisorActor.Ask(actorMessage, TimeSpan.FromMinutes(25));
                //we want to wait this guy out
                var supervisorRunResult = supervisorRunTask.GetAwaiter().GetResult();
                switch (supervisorRunResult)
                {
                    case CompletedEvent completed:
                    {
                        break;
                    }
                    case FailedEvent failed:
                    {
                        throw failed.Exception;
                    }
                }
            }
            catch (Exception ex)
            {
                _log.Error(ex, "Error found in Webjob");
                //throw it for the actual RabbitMqQueueReader Handler so message gets NACK
                throw;
            }
        });
        Thread.Sleep(Timeout.Infinite);
    }
    catch (Exception ex)
    {
        _log.Error(ex, "Error found");
        throw;
    }
}
And this is the relevant IoC code (we are using Autofac + Akka.NET DI for Autofac):
builder.RegisterType<SupervisorActor>();
_actorSystem = new Lazy<ActorSystem>(() =>
{
    var akkaconf = ActorUtil.LoadConfig(_akkaConfigPath).WithFallback(ConfigurationFactory.Default());
    return ActorSystem.Create("WebJobSystem", akkaconf);
});
builder.Register<ActorSystem>(cont => _actorSystem.Value);
builder.Register(cont =>
    {
        var system = cont.Resolve<ActorSystem>();
        return system.ActorOf(system.DI().Props<SupervisorActor>(), "SupervisorActor");
    })
    .SingleInstance()
    .Named<IActorRef>("SupervisorActor");
The problem
The code is working fine and doing what we want it to, apart from the Akka.NET "ask" timeout shown above in the WebJob code.
Annoyingly, the timeout works fine when I run the WebJob locally, where I can simulate an "ask" timeout by providing a new supervisorActor that simply never responds with a message back to the "Sender".
This works perfectly on my machine, but when we run this code in Azure, we do NOT see a timeout for the "ask", even though one of our workflow runs exceeded the "ask" timeout by a mile.
I just don't know what could be causing this behavior. Does anyone have any ideas?
Could there be some Azure-specific config value for the WebJob that I need to set?
The answer to this was to use the async RabbitMQ handlers, which apparently came out in v5.0 of the C# RabbitMQ client. The official docs still show the sync usage (sadly).
This article is quite good: https://gigi.nullneuron.net/gigilabs/asynchronous-rabbitmq-consumers-in-net/
Once we did this, all was good.

Durable Task Framework re-queue failed task

How do I use the "wait for external event" functionality of the Durable Task Framework in code? The following is sample code.
context.ScheduleWithRetry<LicenseActivityResponse>(
    typeof(LicensesCreatorActivity),
    _retryOptions,
    input);
I am using the ScheduleWithRetry<> method of the context to schedule my task on DTF, but when an exception occurs in the code, the method retries up to the number of times given in _retryOptions.
After the retries are exhausted, the orchestration status is marked as Failed.
I need a way to resume my orchestration on DTF after correcting the cause of the exception.
I have looked through the GitHub code for the method concerned, but without success.
I have come up with two possible solutions:
Call a framework method (if one exists) to re-queue the orchestration from the state where it failed.
Wrap the orchestration code in a try/catch and, in the catch block, call a method such as CreateOrchestrationInstanceWithRaisedEventAsync, which puts the orchestration on hold until an external event triggers it again. A user (through some front-end application) raises the external event to resume, which means the user has corrected whatever was causing the exception.
This is my understanding. If one of the above is possible, please point me to some technical suggestions; otherwise, please suggest the correct path for this task.
For the community's benefit, Salman resolved the issue by doing the following:
"I solved the problem by creating a sub-orchestration when an exception occurs while performing an activity. The sub-orchestration stays in a pending state on Azure and waits for an external event; raising that event lets the parent orchestration resume processing the activity. This helps when our orchestrations are about to fail on the Azure Durable Task Framework."
I have figured out the solution for my problem by using "Signal Orchestrations" taken from code from GitHub repository.
Following is the solution diagram for the problem.
In this diagram, before the solution was implemented, we only had "Process Activity", which actually executes the activity.
The Azure Storage table stores the multiplier value for an instanceId and ActivityName. Why we implemented this will become clear later.
The monitoring website is the platform from which a user can re-queue/retry the orchestration activity.
Now we have a pre-step and a post-step.
1. Get Retry Option (Pre-Step)
This method sets the values of the RetryOptions instance.
private RetryOptions ModifyMaxRetires(OrchestrationContext context, string activityName)
{
    var failedInstance =
        _azureStorageFailedOrchestrationTasks.GetSingleEntity(context.OrchestrationInstance.InstanceId,
            activityName);
    var configuration = Container.GetInstance<IConfigurationManager>();
    if (failedInstance.Result == null)
    {
        return new RetryOptions(TimeSpan.FromSeconds(configuration.OrderTaskFailureWaitInSeconds),
            configuration.OrderTaskMaxRetries);
    }
    var multiplier = ((FailedOrchestrationEntity)failedInstance.Result).Multiplier;
    return new RetryOptions(TimeSpan.FromSeconds(configuration.OrderTaskFailureWaitInSeconds),
        configuration.OrderTaskMaxRetries * multiplier);
}
If there is an entry in our Azure Storage table for the instanceId and ActivityName, we take the multiplier value from the table and use it to scale the retry count when creating the RetryOptions instance. Otherwise we use the default retry count from our config.
Then:
We process the activity with the scheduled retry count (in case the activity fails).
2. Handle Exceptions (Post-Step)
This method handles the exception when the activity fails to complete even after the retry count set in the RetryOptions instance.
private async Task HandleExceptionForSignal(OrchestrationContext context, Exception exception, string activityName)
{
    var failedInstance = _azureStorageFailedOrchestrationTasks.GetSingleEntity(context.OrchestrationInstance.InstanceId, activityName);
    if (failedInstance.Result != null)
    {
        _azureStorageFailedOrchestrationTasks.UpdateSingleEntity(context.OrchestrationInstance.InstanceId, activityName, ((FailedOrchestrationEntity)failedInstance.Result).Multiplier + 1);
    }
    else
    {
        //const multiplier when first time exception occurs.
        const int multiplier = 2;
        _azureStorageFailedOrchestrationTasks.InsertActivity(new FailedOrchestrationEntity(context.OrchestrationInstance.InstanceId, activityName)
        {
            Multiplier = multiplier
        });
    }
    var exceptionInput = new OrderExceptionContext
    {
        Exception = exception.ToString(),
        Message = exception.Message
    };
    await context.CreateSubOrchestrationInstance<string>(typeof(ProcessFailedOrderOrchestration), $"{context.OrchestrationInstance.InstanceId}_{Guid.NewGuid()}", exceptionInput);
}
The code above first looks for the instanceId and ActivityName in Azure Storage. If an entry exists, it increments the stored multiplier; if not, it adds a new row to the Azure Storage table for the InstanceId and ActivityName with the default multiplier value of 2.
It then creates an exception context instance to send the exception message and details to the sub-orchestration (which will be shown to a user on the monitoring website). The sub-orchestration waits for an external event raised by a user against the InstanceId of the sub-orchestration.
Whenever the event is raised from the monitoring website, the sub-orchestration ends and control returns to the parent orchestration, which starts again. This time, when the pre-step is called, it finds the entry in the Azure Storage table with a multiplier, so the retry options are updated by multiplying the default retry count by that multiplier.
In this way we can continue our orchestrations and prevent them from failing.
The following is the sub-orchestration class.
internal class ProcessFailedOrderOrchestration : TaskOrchestration<string, OrderExceptionContext>
{
    private TaskCompletionSource<string> _resumeHandle;

    public override async Task<string> RunTask(OrchestrationContext context, OrderExceptionContext input)
    {
        await WaitForSignal();
        return "Completed";
    }

    private async Task<string> WaitForSignal()
    {
        _resumeHandle = new TaskCompletionSource<string>();
        var data = await _resumeHandle.Task;
        _resumeHandle = null;
        return data;
    }

    public override void OnEvent(OrchestrationContext context, string name, string input)
    {
        _resumeHandle?.SetResult(input);
    }
}
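The waiting mechanism in the class above is an ordinary TaskCompletionSource used as a resume handle. The sketch below isolates that pattern in plain .NET, without any Durable Task Framework types; the class name ResumeSignal and its methods are made up for illustration.

```csharp
using System.Threading.Tasks;

// Hypothetical stand-alone version of the "wait until signaled" pattern.
public sealed class ResumeSignal
{
    private readonly TaskCompletionSource<string> _handle =
        new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

    // The returned task stays incomplete until Raise is called.
    public Task<string> WaitAsync() => _handle.Task;

    // Completes the waiting task with the given input.
    // Returns false if the signal had already been raised.
    public bool Raise(string input) => _handle.TrySetResult(input);
}
```

Code that awaits WaitAsync() stays suspended until some other code calls Raise, which plays the role that OnEvent plays in the sub-orchestration. TrySetResult is used here instead of SetResult so that a second signal is ignored rather than throwing.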

How to do Async in Azure WebJob function

I have an async method that gets api data from a server. When I run this code on my local machine, in a console app, it performs at high speed, pushing through a few hundred http calls in the async function per minute. When I put the same code to be triggered from an Azure WebJob queue message however, it seems to operate synchronously and my numbers crawl - I'm sure I am missing something simple in my approach - any assistance appreciated.
(1) .. WebJob function that listens for a message on queue and kicks off the api get process on message received:
public class Functions
{
    // This function will get triggered/executed when a new message is written
    // on an Azure Queue called queue.
    public static async Task ProcessQueueMessage([QueueTrigger("myqueue")] string message, TextWriter log)
    {
        var getAPIData = new GetData();
        getAPIData.DoIt(message).Wait();
        log.WriteLine("*** done: " + message);
    }
}
(2) the class that outside azure works in async mode at speed...
class GetData
{
    // wrapper that is called by the message function trigger
    public async Task DoIt(string MessageFile)
    {
        await CallAPI(MessageFile);
    }

    public async Task<string> CallAPI(string MessageFile)
    {
        /// create a list of sample APIs to call...
        var apiCallList = new List<string>();
        apiCallList.Add("localhost/?q=1");
        apiCallList.Add("localhost/?q=2");
        apiCallList.Add("localhost/?q=3");
        apiCallList.Add("localhost/?q=4");
        apiCallList.Add("localhost/?q=5");
        // setup httpclient
        HttpClient client =
            new HttpClient() { MaxResponseContentBufferSize = 10000000 };
        var timeout = new TimeSpan(0, 5, 0); // 5 min timeout
        client.Timeout = timeout;
        // create a list of http api get Task...
        IEnumerable<Task<string>> allResults = apiCallList.Select(str => ProcessURLPageAsync(str, client));
        // wait for them all to complete, then move on...
        await Task.WhenAll(allResults);
        return allResults.ToString();
    }

    async Task<string> ProcessURLPageAsync(string APIAddressString, HttpClient client)
    {
        string page = "";
        HttpResponseMessage resX;
        try
        {
            // set the address to call
            Uri URL = new Uri(APIAddressString);
            // execute the call
            resX = await client.GetAsync(URL);
            page = await resX.Content.ReadAsStringAsync();
            string rslt = page;
            // do something with the api response data
        }
        catch (Exception ex)
        {
            // log error
        }
        return page;
    }
}
First because your triggered function is async, you should use await rather than .Wait(). Wait will block the current thread.
public static async Task ProcessQueueMessage([QueueTrigger("myqueue")] string message, TextWriter log)
{
    var getAPIData = new GetData();
    await getAPIData.DoIt(message);
    log.WriteLine("*** done: " + message);
}
You'll also find useful information in the documentation:
Parallel execution
If you have multiple functions listening on different queues, the SDK will call them in parallel when messages are received simultaneously.
The same is true when multiple messages are received for a single queue. By default, the SDK gets a batch of 16 queue messages at a time and executes the function that processes them in parallel. The batch size is configurable. When the number being processed gets down to half of the batch size, the SDK gets another batch and starts processing those messages. Therefore the maximum number of concurrent messages being processed per function is one and a half times the batch size. This limit applies separately to each function that has a QueueTrigger attribute.
Here is sample code to configure the batch size:
var config = new JobHostConfiguration();
config.Queues.BatchSize = 32; // the SDK rejects values above 32
var host = new JobHost(config);
host.RunAndBlock();
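The "one and a half times the batch size" rule from the documentation quoted above can be written out as a small calculation. This is only a sketch of the arithmetic, assuming the default behavior where a new batch is fetched once the in-flight count drops to half the batch size; the class and method names are made up for illustration.

```csharp
// Hypothetical helper that mirrors the arithmetic in the documentation.
public static class QueueConcurrency
{
    // A new batch is fetched when in-flight messages drop to batchSize / 2,
    // so at the peak there are batchSize / 2 old messages plus one full new batch.
    public static int MaxConcurrentMessages(int batchSize)
    {
        int newBatchThreshold = batchSize / 2;
        return batchSize + newBatchThreshold;
    }
}
```

For the default batch size of 16 this gives 24 concurrent messages per function, and for a batch size of 32 it gives 48.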
However, running too many threads at the same time is not always a good option and can lead to poor performance.
Another option is to scale out your webjob:
Multiple instances
If your web app runs on multiple instances, a continuous WebJob runs on each machine, and each machine will wait for triggers and attempt to run functions. The WebJobs SDK queue trigger automatically prevents a function from processing a queue message multiple times; functions do not have to be written to be idempotent. However, if you want to ensure that only one instance of a function runs even when there are multiple instances of the host web app, you can use the Singleton attribute.
Have a read of the WebJobs SDK documentation. The behaviour you should expect is that your process will run and process one message at a time, but will scale up if more instances (of your App Service) are created. If you have multiple queues, they will trigger in parallel.
To improve performance, see the configuration settings section of that documentation, which covers the number of messages that can be triggered in a batch.
If you want to process multiple messages in parallel, though, and don't want to rely on instance scaling, then you need to use threading instead (async isn't about multi-threaded parallelism, but about making more efficient use of the thread you're using). Your queue trigger function should read the message from the queue, then create a thread, "fire and forget" that thread, and return from the trigger function. This marks the message as processed and allows the next message on the queue to be processed, even though in theory you're still processing the earlier one. Note that you will need to include your own logic for error handling and for ensuring that the data won't get lost if your thread throws an exception or can't process the message (e.g. put it on a poison queue).
The other option is to not use the [QueueTrigger] attribute, and to use the Azure Storage queues SDK API directly to connect and process the messages per your requirements.

How to parallelize an azure worker role?

I have a Worker Role running in Azure.
This worker processes a queue that contains a large number of integers. For each integer I have to do some fairly long processing (from 1 second to 10 minutes, depending on the integer).
As this is quite time-consuming, I would like to do the processing in parallel. Unfortunately, my parallelization does not seem to be efficient when I test with a queue of 400 integers.
Here is my implementation:
public class WorkerRole : RoleEntryPoint {
    private readonly CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
    private readonly ManualResetEvent runCompleteEvent = new ManualResetEvent(false);
    private readonly Manager _manager = Manager.Instance;
    private static readonly LogManager logger = LogManager.Instance;

    public override void Run() {
        logger.Info("Worker is running");
        try {
            this.RunAsync(this.cancellationTokenSource.Token).Wait();
        }
        catch (Exception e) {
            logger.Error(e, 0, "Error Run Worker: " + e);
        }
        finally {
            this.runCompleteEvent.Set();
        }
    }

    public override bool OnStart() {
        bool result = base.OnStart();
        logger.Info("Worker has been started");
        return result;
    }

    public override void OnStop() {
        logger.Info("Worker is stopping");
        this.cancellationTokenSource.Cancel();
        this.runCompleteEvent.WaitOne();
        base.OnStop();
        logger.Info("Worker has stopped");
    }

    private async Task RunAsync(CancellationToken cancellationToken) {
        while (!cancellationToken.IsCancellationRequested) {
            try {
                _manager.ProcessQueue();
            }
            catch (Exception e) {
                logger.Error(e, 0, "Error RunAsync Worker: " + e);
            }
            await Task.Delay(1000, cancellationToken);
        }
    }
}
And the implementation of the ProcessQueue:
public void ProcessQueue() {
    try {
        _queue.FetchAttributes();
        int? cachedMessageCount = _queue.ApproximateMessageCount;
        if (cachedMessageCount != null && cachedMessageCount > 0) {
            var listEntries = new List<CloudQueueMessage>();
            listEntries.AddRange(_queue.GetMessages(MAX_ENTRIES));
            Parallel.ForEach(listEntries, ProcessEntry);
        }
    }
    catch (Exception e) {
        logger.Error(e, 0, "Error ProcessQueue: " + e);
    }
}
And ProcessEntry
private void ProcessEntry(CloudQueueMessage entry) {
    try {
        int id = Convert.ToInt32(entry.AsString);
        Service.GetData(id);
        _queue.DeleteMessage(entry);
    }
    catch (Exception e) {
        _queueError.AddMessage(entry);
        _queue.DeleteMessage(entry);
        logger.Error(e, 0, "Error ProcessEntry: " + e);
    }
}
In the ProcessQueue function, I tried different values of MAX_ENTRIES: first 20 and then 2.
It seems to be slower with MAX_ENTRIES=20, but whatever the value of MAX_ENTRIES, it seems quite slow.
My VM is an A2 (Medium).
I really don't know whether I am doing the parallelization correctly; maybe the problem comes from the worker itself (perhaps it is hard to run this in parallel).
You haven't mentioned which Azure message queuing technology you are using. For tasks where I want to process multiple messages in parallel, I tend to use the Message Pump pattern on Service Bus queues and subscriptions, leveraging the OnMessage() method available on both the Service Bus queue and subscription clients:
QueueClient OnMessage() - https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.queueclient.onmessage.aspx
SubscriptionClient OnMessage() - https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.subscriptionclient.onmessage.aspx
An overview of how this stuff works :-) - http://fabriccontroller.net/blog/posts/introducing-the-event-driven-message-programming-model-for-the-windows-azure-service-bus/
From MSDN:
When calling OnMessage(), the client starts an internal message pump that constantly polls the queue or subscription. This message pump consists of an infinite loop that issues a Receive() call. If the call times out, it issues the next Receive() call.
This pattern allows you to use a delegate (or anonymous function in my preferred case) that handles the receipt of the Brokered Message instance on a separate thread on the WaWorkerHost process. In fact, to increase the level of throughput, you can specify the number of threads that the Message Pump should provide, thereby allowing you to receive and process 2, 4, 8 messages from the queue in parallel. You can additionally tell the Message Pump to automagically mark the message as complete when the delegate has successfully finished processing the message. Both the thread count and AutoComplete instructions are passed in the OnMessageOptions parameter on the overloaded method.
public override void Run()
{
    var onMessageOptions = new OnMessageOptions()
    {
        AutoComplete = true, // Message-Pump will call Complete on messages after the callback has completed processing.
        MaxConcurrentCalls = 2 // Max number of threads the Message-Pump can spawn to process messages.
    };
    sbQueueClient.OnMessage((brokeredMessage) =>
    {
        // Process the Brokered Message Instance here
    }, onMessageOptions);
    RunAsync(_cancellationTokenSource.Token).Wait();
}
You can still leverage the RunAsync() method to perform additional tasks on the main Worker Role thread if required.
Finally, I would also recommend that you look at scaling your Worker Role instances out to a minimum of 2 (for fault tolerance and redundancy) to increase your overall throughput. From what I have seen with multiple production deployments of this pattern, OnMessage() performs perfectly when multiple Worker Role Instances are running.
A few things to consider here:
Are your individual tasks CPU-intensive? If so, parallelism may not help. However, if they are mostly waiting on work being processed by other resources, parallelizing is a good idea.
If parallelizing is a good idea, consider not using Parallel.ForEach for queue processing. Parallel.ForEach has two issues that prevent it from being optimal here:
The code will wait until all kicked-off threads finish processing before moving on. So, if you have 5 threads that need 10 seconds each and 1 thread that needs 10 minutes, the overall processing time for Parallel.ForEach will be 10 minutes.
Even though you are assuming that all of the threads will start processing at the same time, Parallel.ForEach does not work this way. It looks at the number of cores on your server and other parameters and generally only kicks off the number of threads it thinks it can handle, without knowing much about what those threads do. So, if you have a lot of non-CPU-bound threads that /can/ be kicked off at the same time without causing CPU over-utilization, the default behaviour is unlikely to run them optimally.
How to do this optimally:
I am sure there are a ton of solutions out there, but for reference, the way we've architected it in CloudMonix (which must kick off hundreds of independent threads and complete them as fast as possible) is by using ThreadPool.QueueUserWorkItem and manually keeping track of the number of threads that are running.
Basically, we use a thread-safe collection to keep track of running threads that are started by ThreadPool.QueueUserWorkItem. Once threads complete, they are removed from that collection. The queue-monitoring loop is independent of the executing logic in that collection. The queue-monitoring logic gets messages from the queue only while the processing collection is below the limit that you find optimal. If there is space in the collection, it tries to pick up more messages from the queue, adds them to the collection, and kick-starts them via ThreadPool.QueueUserWorkItem. When processing completes, it kicks off a delegate that removes the thread from the collection.
Hope this helps and makes sense.
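A rough sketch of the same idea, a fixed number of slots that are refilled as soon as one frees up, is shown below. Note that it swaps in a different technique from the one described above: it uses SemaphoreSlim and Task.Run instead of ThreadPool.QueueUserWorkItem with a hand-maintained collection. The name BoundedRunner and its signature are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs work items with a cap on how many run at once.
public static class BoundedRunner
{
    // Starts each work item as soon as a slot is free, never running more
    // than maxConcurrency at a time. Returns the peak concurrency observed.
    public static async Task<int> RunAllAsync(
        IEnumerable<Func<Task>> workItems, int maxConcurrency)
    {
        var running = new List<Task>();
        var gate = new object();
        int current = 0, peak = 0;

        using (var slots = new SemaphoreSlim(maxConcurrency))
        {
            foreach (var work in workItems)
            {
                await slots.WaitAsync(); // wait until a slot is free
                running.Add(Task.Run(async () =>
                {
                    lock (gate) { current++; peak = Math.Max(peak, current); }
                    try
                    {
                        await work();
                    }
                    finally
                    {
                        lock (gate) { current--; }
                        slots.Release(); // free the slot for the next item
                    }
                }));
            }
            await Task.WhenAll(running);
        }
        return peak;
    }
}
```

Unlike Parallel.ForEach over a fetched batch, a slow item here occupies only its own slot, so the remaining slots keep picking up new work while it runs. In a real worker the loop would pull messages from the queue instead of iterating a fixed list, and error handling per message would still be needed.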

Resources