ServiceBusTrigger Azure function throwing TaskCanceledException while waiting API call response - azure

We have a ServiceBusTrigger Azure Function. During the execution of an event, it will make an API call to different service. However, before the API call returns with result, if it takes longer time, the Azure Function would throw an exception with message like the following:
Exception while executing function: FunctionName One or more errors occurred. (A task was canceled.) A task was canceled.
From the multiple occurrences, this happened after the event is triggered by 100 seconds.
My question is: Is this due to timeout? If yes, why it's timing out/cancelling the task only after 100 seconds? Shouldn't the Azure Function default timeout be 5-minute?
Thanks for any answers in advance.

After more tests, it was determined the timeout/cancellation was due to the API call as a default timeout of 100 seconds. After changing to a longer timeout, the issue was resolved.

Related

AWS SQS FIFO message doesn't seem to be retrying

I've set up a function to hit an API endpoint for a newly created entity that isn't immediately available. If the endpoint returns a status of "pending", the function throws an error. If the endpoint returns a status of "active", the function then deletes the SQS message and triggers several other microservices to do their things using SNS. The SQS queue that triggers the function has a visibility timeout of 2 minutes, and the function itself has a 1 minute timeout.
What I'm expecting to happen is that if the endpoint returns a "pending" status, and the function throws an error, then after the 2 minute visibility timeout, the message would trigger the function again. This should happen every 2 minutes until the api call returns an "active" status and the message is deleted, or until the message retention period is surpassed (currently 1 hour). This seemed like a nice serverless way to poll my newly created entity to check if it was ready for other post-processing.
What's actually happening after adding a message to the SQS queue is that the CloudWatch logs are showing that the function is throwing an error like I'd expect, but the function is only being triggered one time. I can't tell if the message is just not not visible for some reason, or if it somehow was deleted. I don't know. I'm am new to using SQS for a Lambda trigger, am I thinking about this wrong?
A few possible causes here:
your Lambda function handler did not actually throw an exception to the Lambda runtime environment, so Lambda thought the function had successfully processed the message and the Lambda service then deleted the message from the queue (so that it would not get processed again)
your SQS queue has a configured DLQ with maximum receives set to 1, so the message is delivered once, the Lambda function fails, and the message is subsequently moved to the DLQ
the SQS message was re-delivered to the Lambda function and was logged but the logs were made to an earlier log stream (because this invocation was warm) and so it wasn't obvious that the Lambda function had actually been invoked multiple times with the same failed message
To verify this all works normally, I set up a simple test with both FIFO and non-FIFO queues and configured the queues to trigger a Lambda function that simply logged the SQS message and then threw an exception. As expected, I saw the same SQS message delivered to the Lambda function every 2 minutes (which is the queue's message visibility timeout). That continued until it hit the max receive count on the SQS redrive policy (defaults to 10 attempts) at which point the failed message was correctly moved to the associated DLQ.

Name or Service not known - intermittent error in Azure

I have a TimerTrigger which calls my own Azure Functions at a relatively high rate - a few times per second. It is not being stress tested. Every call takes just a 100ms and the purpose of the test is not a stress test.
This call to my own endpoint works about 9999 times out of 10000 but just once in a while I get the following error:
System.Net.Http.HttpRequestException: Name or service not known (app.mycustomdomain.com:443)
---> System.Net.Sockets.SocketException (0xFFFDFFFF): Name or service not known
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
I replaced my actual domain with "app.mycustomdomain.com" in the error message above. It is a custom domain set up to point to the Azure Function App using CNAME.
The Function App does not detect any downtime in the Azure Portal and I have Application Insights enabled and do not see any errors. So I assume the issue is somehow on the callers side and the call never actually happens.
What does this error indicate? And how can I alleviate the problem?
For your second question - alleviating the problem, one option would certainly be to build in retry using a library like Polly. High level you create a policy, e.g. for a simple retry:
var myPolicy = Policy
.Handle<SomeExceptionType>()
.Retry(3);
This would retry 3 times, to use the policy you can call a sync or async version of Execute:
await myPolicy.ExecuteAsync(async () =>
{
//do stuff that might fail up to three times
});
More complete samples are available
This library has lots of support for other approaches, e.g. with delays, exponential delays, etc.

Azure Function calls itself after 3 minutes

i have the following code in my azure function with 5 minutes manual timeout.
when i run the above function in azure, i see the function creates a new instance after 3 minutes.(check the below image)
both the instances completes successfully ,but returns Status: 504 Gateway Timeout which in turn fails my function execution.
i have hosted the function in App Service Plan, and also increased the timeout in host.json file to 10 minutes
"functionTimeout": "00:10:00"
Several questions in here:
Timeouts - The function timeout in host.json applies to the underlying function runtime; not the http pipeline. You should not have an http function running longer than a minute. The http calls will timeout independently of the runtime (as you see with the 504). However, you could use that timeout for a long-running (ie, 60 minute on appservice plan) queue trigger. If you need a long-running function, the http call could queue a message for a queue trigger, or you could use the Durable Function support.
Why is it invoking again? The simplest interpretation here is that your function is just receiving a second http request message. Do you have evidence that's not the case? You could bind to the HttpRequestMessage and log additional http request properties to track this down.

Azure Function on Always-On App Service Plan Times Out with No functionTimeout Set

Like the title describes - I have an Azure Function on the App Service Plan, configured for Always On and no functionTimeout set in my host.json, and it appears to timeout / not finish anytime after 30 minutes to 1 hour.(...but I feel this may be a false positive...)
The HTTP Triggered function can sometimes take over 1-2 hours to complete. I understand that this probably isn't the best design and according to the Azure Function Best Practices I should break this out into smaller / more manageable pieces - I get that. However, I expect the Function on the App Service plan to work as advertised - no hard limit on execution time. Perhaps this is the same question as Unexpected azure-function timeouts on app-service-plan, but that has no answer and I am using an HTTP Trigger instead.
Currently, the HTTP Triggered method does not return until the work is complete. (Is this a problem - the HTTP trigger needs to return quicker?)
According to the Kudu Function Invocation Logs, this case reports "Never Finished", and when I click on the Toggle Output button to view the logs, they never come in.
When I viewed this function's run in the Logs section of that trigger, it seems like the function just stopped, and the log stream just reports no new trace:
2017-07-26T16:36:43.116 [INFO] [Class1] Update operation started processing 790 sales records ...
2017-07-26T16:36:43.116 [DBUG] [Class2] Matching and updating ids from the map...
2017-07-26T16:38:07 No new trace in the past 1 min(s).
2017-07-26T16:39:07 No new trace in the past 2 min(s).
2017-07-26T16:40:07 No new trace in the past 3 min(s).
2017-07-26T16:41:07 No new trace in the past 4 min(s).
So not sure why this function just seemed to stop - or perhaps it stopped collecting log statements (there are many), and for some reason, the function never completed.
Any ideas?
Approx time: 2017-07-26T16:00:00 UTC
InvocationID: d856c107-f1ee-455a-892b-ed970dcad128 (I think?)
If it is indeed being timed out, is there any way for us to know, (Exception? App Insights? etc.)
Based on my test, I found azure function will not stop your function if you don't set the timeout.
Here is my test, I create a ManualTrigger function which will log the message every 10 minutes.
The codes like below:
public static void Run(string input, TraceWriter log)
{
for (int i = 0; i < 100; i++)
{
log.Info( "Worked " + i*10 + " minutes ");
Thread.Sleep(600000);
}
}
The log details:
In the log, you could find my function executed 70 minutes.It still works well.
The no trace means there are no new requests send to the azure function.
Currently, the HTTP Triggered method does not return until the work is complete. (Is this a problem - the HTTP trigger needs to return quicker?)
As Jesse Carter says, you couldn't execute long time function when you used HTTP Triggered method.
Since your client-side(send request) will have a timeout value. It will wait for the function's response.
Normally, if we want to execute long time function, I suggest you could use http trigger to get the request. In the http trigger function you could add a queue message to the azure storage queue.
Then you could write a queue trigger function which will execute the long time work.
If your HTTP method takes more than a minute, you should be offloading it to a Queue. Period. (I know the other answers have said this, but it's worth repeating).
Http connections are a limited resource.
While Azure Functions as an execution engine can handle long running
operations (as demonstrated by queue / service bus support), the
http pipeline may cut off / timeout long running requests.
Queue triggers can easily run for 30+ minutes. If your job is longer than that, you really should split it into multiple queue messages.
Also check out Durable Function support: https://github.com/Azure/azure-functions-durable-extension/
Regardless of the function app timeout setting, 230 seconds is the maximum amount of time that an HTTP triggered function can take to respond to a request. This is because of the default idle timeout of Azure Load Balancer. For longer processing times, consider using the Durable Functions async pattern or defer the actual work and return an immediate response.
Function app timeout duration: Check Notes

How to stop an Azure WebJobs queue message from being deleted from an Azure Queue?

I'm using Azure WebJobs to poll a queue and then process the message.
Part of the message processing includes a hit to 3rd party HTTP endpoint. (e.g. a Weather api or some Stock market api).
Now, if the hit to the api fails (network error, 500 error, whatever) I try/catch this in my code, log whatever and then ... what??
If I continue .. then I assume the message will be deleted by the WebJobs SDK.
How can I:
1) Say to the SDK - please don't delete this message (so it will be retried automatically at the next queue poll and when the message is visible again).
2) Set the invisibility time value, when the SDK pops a message off the queue for processing.
Thanks!
Now, if the hit to the api fails (network error, 500 error, whatever) I try/catch this in my code, log whatever and then ... what??
The Webjobs SDK behaves like this: If your method throws an uncaught exception, the message is returned to the Queue with its dequeueCount property +1. Else, if all is well, the message is considered successfully processed and is deleted from the Queue - i.e. queue.DeleteMessage(retrievedMessage);
So don't gracefully catch the HTTP 500, throw an exception so the SDK gets the hint.
If I continue .. then I assume the message will be deleted by the WebJobs SDK.
From https://github.com/Azure/azure-content/blob/master/articles/app-service-web/websites-dotnet-webjobs-sdk-get-started.md#contosoadswebjob---functionscs---generatethumbnail-method:
If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed. This sequence won't be repeated indefinitely if a message always causes an exception. After 5 unsuccessful attempts to process a message, the message is moved to a queue named {queuename}-poison. The maximum number of attempts is configurable.
If you really dislike the hardcoded 10-minute visibility timeout (the time the message stays hidden from consumers), you can change it. See this answer by #mathewc:
From https://stackoverflow.com/a/34093943/4148708:
In the latest v1.1.0 release, you can now control the visibility timeout by registering your own custom QueueProcessor instances via JobHostConfiguration.Queues.QueueProcessorFactory. This allows you to control advanced message processing behavior globally or per queue/function.
https://github.com/Azure/azure-webjobs-sdk-samples/blob/master/BasicSamples/MiscOperations/CustomQueueProcessorFactory.cs#L63
protected override async Task ReleaseMessageAsync(CloudQueueMessage message, FunctionResult result, TimeSpan visibilityTimeout, CancellationToken cancellationToken)
{
// demonstrates how visibility timeout for failed messages can be customized
// the logic here could implement exponential backoff, etc.
visibilityTimeout = TimeSpan.FromSeconds(message.DequeueCount);
await base.ReleaseMessageAsync(message, result, visibilityTimeout, cancellationToken);
}

Resources