How to handle exceptions from WebJobs in Application Insights? - azure

When an exception is thrown from a WebJob, the process exits without logging it to Application Insights. Flushing the logs to Application Insights appears to take a few minutes, so the exceptions are lost. How can I handle this?
Also, is there a way to move the message that hit the exception to the poison queue automatically, without manually inserting it into the poison queue?
I am using the latest stable 3.x versions of these two NuGet packages:
Microsoft.Azure.WebJobs and Microsoft.Azure.WebJobs.Extensions
I created a host that implements IHost as below:
var builder = new HostBuilder()
    .UseEnvironment("Development")
    .ConfigureWebJobs(b =>
    {
        ...
    })
    .ConfigureLogging((context, b) =>
    {
        string appInsightsKey = context.Configuration["APPINSIGHTS_INSTRUMENTATIONKEY"];
        if (!string.IsNullOrEmpty(appInsightsKey))
        {
            b.AddApplicationInsights(o => o.InstrumentationKey = appInsightsKey);
            appInsights.TrackEvent("Application Insights is starting!!");
        }
    })
    .ConfigureServices(services =>
    {
        ….
    })
    .UseConsoleLifetime();
var host = builder.Build();
using (host)
{
    host.RunAsync().Wait();
}
and Function.cs:
public static async void ProcessQueueMessageAsync([QueueTrigger("queue")] Message message, int dequeueCount, IBinder binder, ILogger logger)
{
    switch (message.Name)
    {
        case blah:
            ...break;
        default:
            logger.LogError("Invalid Message object in the queue.", message);
            logger.LogWarning("Current dequeue count: " + dequeueCount);
            throw new InvalidOperationException("Simulated Failure");
    }
}
My questions are:
1) When the default case is hit, the WebJob terminates immediately and the logs are not flushed to Application Insights, even after waiting and starting the WebJob again. Because it takes a few minutes for entries to appear in Application Insights and the WebJob stops before then, I am losing the error logs. How can I handle this?
2) The sample WebJobs at https://github.com/Azure/azure-webjobs-sdk-samples/blob/master/BasicSamples/QueueOperations/Functions.cs use JobHost host = new JobHost(); and if the 'FailAlways' function fails, the host automatically retries 5 times and then pushes the message to the poison queue. This is not happening in my code. Is it because the hosts are different, or do I have to add more configuration?

Try changing your function to return Task instead of void:
public static async Task ProcessQueueMessageAsync([QueueTrigger("queue")] Message message, int dequeueCount, IBinder binder, ILogger logger)
This worked for me in a case where, even though I was logging the error and throwing the exception, Application Insights showed either a successful invocation or no invocation at all.
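As a rough illustration of why the return type matters, here is a minimal sketch that uses only the base class library. The names AsyncReturnTypeDemo, FailAsync and InvokeAsync are made up for this example, and InvokeAsync is only a stand-in for whatever invokes the function; it is not WebJobs SDK code. With a Task return type the caller has something to await, so the exception reaches it. With async void there is no Task to await, so the caller cannot observe the failure.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncReturnTypeDemo
{
    // Returns a Task, so the exception is stored on the Task
    // and surfaces when the caller awaits it.
    public static async Task FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("Simulated Failure");
    }

    // Hypothetical stand-in for a host that invokes a function and
    // records whether the invocation succeeded or failed.
    public static async Task<string> InvokeAsync(Func<Task> function)
    {
        try
        {
            await function();
            return "succeeded";
        }
        catch (Exception ex)
        {
            return "failed: " + ex.Message;
        }
    }
}
```

Calling AsyncReturnTypeDemo.InvokeAsync(AsyncReturnTypeDemo.FailAsync) yields "failed: Simulated Failure". An async void method could not be passed to InvokeAsync at all, because it returns nothing to await.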

After inspecting the source code of the Application Insights SDK, it became apparent that to get an Exception in Application Insights you must pass an exception object into the LogError call.
logger.LogError(ex, "my error message") - will result in an Application Insights Exception.
logger.LogError("my error message") - will result in an Application Insights Trace.
Is there a way to move the message which hit the exception to the poison queue automatically without manually inserting that message into the poison queue?
You could set config.Queues.MaxDequeueCount = 1; in the WebJob. This is the number of times to try processing a message before moving it to the poison queue.
And where in the code should the MaxDequeueCount configuration be added?
You could set the property on JobHostConfiguration in Program.cs.
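Note that JobHostConfiguration belongs to the 2.x SDK, while the question uses the 3.x packages with HostBuilder. As a configuration sketch only, and assuming the Microsoft.Azure.WebJobs.Extensions.Storage 3.x package is referenced (the body of ConfigureWebJobs is elided in the question, so what is registered there is unknown), the same setting would go through the queue options passed to AddAzureStorage:

```csharp
using Microsoft.Extensions.Hosting;

var builder = new HostBuilder()
    .ConfigureWebJobs(b =>
    {
        b.AddAzureStorageCoreServices();
        // Assumption: Storage extension 3.x, where the first callback
        // of AddAzureStorage configures the queue options.
        b.AddAzureStorage(queues =>
        {
            // Number of processing attempts before the message is
            // moved to the "<queue name>-poison" queue.
            queues.MaxDequeueCount = 1;
        });
    });
```

The default for MaxDequeueCount is 5, which matches the five retries seen in the sample. The function also has to fail in a way the host can observe before any of this applies, so the async Task return type suggested above is still needed.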

Related

Masstransit not creating Error queue for Azure Function event subscriber

We followed this example (http://masstransit-project.com/MassTransit/usage/azure-functions.html) to try to set up Azure Functions as Azure Service Bus event (topic) subscribers using MassTransit (for .NET Core 2.1, Azure Functions 2.0).
With Azure WebJobs this is as simple as with RabbitMQ: configure the publisher, let the subscriber configure and set up its queue, and have MassTransit automatically create one topic per event, forward it to the queue, and move messages to "queue_error" after all retries have failed. You do not have to set up anything manually.
But with Azure Functions we seem to have to add the subscribers to the topic manually (through Service Bus Explorer or ARM templates); the topic itself is created by the publisher on the first event it publishes. The same goes for the queues, though these do not even seem to be necessary, since the events are handled directly by the consuming Azure Function topic subscribers.
Maybe we are doing something wrong. I cannot see anything in the docs saying that MT will not set up the subscriber and create the queues, as it normally does, when using Azure Functions. It does work, except when the consumer throws an exception and all configured retries have been executed. We simply do not get the event in the dead-letter queue, and the error queue that MT normally generates is not created at all.
So how do we get MT to create the error queues, and MOVE the failed events there?
Our code:
[FunctionName("OrderShippedConsumer")]
public static Task OrderShippedConsumer(
    [ServiceBusTrigger("xyz.events.order/iordershipped", "ordershippedconsumer-queue", Connection = "AzureServiceBus")] Message message,
    IBinder binder,
    ILogger logger,
    CancellationToken cancellationToken,
    ExecutionContext context)
{
    var config = CreateConfig(context);
    var handler = Bus.Factory.CreateBrokeredMessageReceiver(binder, cfg =>
    {
        var serviceBusEndpoint = Parse.ConnectionString(config["AzureServiceBus"])["Endpoint"];
        cfg.CancellationToken = cancellationToken;
        cfg.SetLog(logger);
        cfg.InputAddress = new Uri($"{serviceBusEndpoint}{QueueName}");
        cfg.UseRetry(x => x.Intervals(TimeSpan.FromSeconds(5)));
        cfg.Consumer(() => new OrderShippedConsumer(cfg.Log, config));
    });
    return handler.Handle(message);
}
And the Consumer code:
public class OrderShippedConsumer : IConsumer<IOrderShipped>
{
    public OrderShippedConsumer(ILog log, IConfigurationRoot config)
    {
        this.config = config;
        this.log = log;
    }
    public async Task Consume(ConsumeContext<IOrderShipped> context)
    {
        // Handle the event
    }
}

TaskCanceledException on azure function (Service bus trigger)

I have a Service Bus-triggered Azure Function that runs every time a topic receives a message.
Messages arrive at regular intervals, for example every 30 minutes. Between batches there is no activity.
The function does nothing special: it posts the message asynchronously via HttpClient. The function regularly stops with a TaskCanceledException.
The HttpClient is static:
public static class SampleEventTrigger
{
    private static DefaultHttpWebHook webHook = new DefaultHttpWebHook(new Uri("https://nonexistent.invalid/sampleWebHook"), "/event/sampleEvent");
    [FunctionName("SampleEventTrigger")]
    public static async Task Run(
        [ServiceBusTrigger("sampleevent", "SampleEvent.Subs", AccessRights.Manage, Connection = GlobalConfiguration.ServiceBusConnection)]BrokeredMessage message,
        TraceWriter log)
    {
        log.Info("launch sample event subscription");
        try
        {
            var resp = await webHook.Post(message, log);
            log.Info($"{resp.StatusCode}, {resp.ReasonPhrase}");
        }
        catch (Exception ex)
        {
            log.Error($"exception in webhook: {ex.Message}", ex);
            throw;
        }
    }
}
If I trigger it again right afterwards, it succeeds.
Where does this exception come from, and how can we avoid it?
Is it related to a timeout, or to the function being too slow to start?
My function runs on the Consumption plan.
Chances are that your HTTP call is timing out. Awaited HTTP calls that time out throw TaskCanceledException. I'm not sure what your DefaultHttpWebHook class does under the covers, but it should be using PostAsync in the Post method (which itself should have the Async suffix).
To verify, you could catch TaskCanceledException and examine the inner exception. If you are still struggling, convert your code to non-async during local development to get a better handle on what's happening: it will give you back a true exception rather than bubbling it up as a TCE.
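One way to check the timeout theory is sketched below, using only the base class library. TimeoutProbe, SlowHandler and PostOnceAsync are hypothetical names for this example. SlowHandler replaces the network, so the URL from the question is never actually contacted; it simply answers too slowly for the 200 ms HttpClient.Timeout. The exception filter separates a timeout from a cancellation requested by the caller: if the caller's token was not cancelled, the TaskCanceledException came from the client's own timeout.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutProbe
{
    // Hypothetical handler standing in for a webhook that responds too slowly.
    private sealed class SlowHandler : HttpMessageHandler
    {
        protected override async Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request, CancellationToken cancellationToken)
        {
            await Task.Delay(TimeSpan.FromSeconds(30), cancellationToken);
            return new HttpResponseMessage(HttpStatusCode.OK);
        }
    }

    public static async Task<string> PostOnceAsync(CancellationToken callerToken)
    {
        using (var client = new HttpClient(new SlowHandler()))
        {
            client.Timeout = TimeSpan.FromMilliseconds(200);
            try
            {
                await client.PostAsync(
                    "https://nonexistent.invalid/sampleWebHook",
                    new StringContent("{}"),
                    callerToken);
                return "ok";
            }
            catch (TaskCanceledException) when (!callerToken.IsCancellationRequested)
            {
                // The caller did not cancel, so HttpClient.Timeout elapsed.
                return "timeout";
            }
            catch (OperationCanceledException)
            {
                // The caller's token was cancelled (for example, host shutdown).
                return "cancelled";
            }
        }
    }
}
```

If the real function reports the timeout branch, the options are to raise HttpClient.Timeout or to find out why the receiving endpoint is slow to answer.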

Azure web job failing to execute after timeout

Some functions in my continuously running WebJob (seemingly at random) show the message: Timeout value of 00:30:00 exceeded by function '<myfunction>' (Id: '<id>'). Initiating cancellation.
After this message, the function does not execute again unless I manually stop and start the Azure WebJob.
Thanks in advance.
Some functions in my continuously running WebJob (seemingly at random) show the message: Timeout value of 00:30:00 exceeded by function '<myfunction>' (Id: '<id>'). Initiating cancellation.
Based on your error, I found the related code from Microsoft.Azure.WebJobs.Host under FunctionExecutor.cs as follows:
internal static void OnFunctionTimeout(System.Timers.Timer timer, FunctionDescriptor method, Guid instanceId, TimeSpan timeout, bool timeoutWhileDebugging,
    TraceWriter trace, ILogger logger, CancellationTokenSource cancellationTokenSource, Func<bool> isDebuggerAttached)
{
    timer.Stop();
    bool shouldTimeout = timeoutWhileDebugging || !isDebuggerAttached();
    string message = string.Format(CultureInfo.InvariantCulture,
        "Timeout value of {0} exceeded by function '{1}' (Id: '{2}'). {3}",
        timeout.ToString(), method.ShortName, instanceId,
        shouldTimeout ? "Initiating cancellation." : "Function will not be cancelled while debugging.");
    trace.Error(message, null, TraceSource.Execution);
    logger?.LogError(message);
    trace.Flush();
    // Only cancel the token if not debugging
    if (shouldTimeout)
    {
        // only cancel the token AFTER we've logged our error, since
        // the Dashboard function output is also tied to this cancellation
        // token and we don't want to dispose the logger prematurely.
        cancellationTokenSource.Cancel();
    }
}
I assume that you specified the TimeoutAttribute on your function.
I would recommend adding a CancellationToken parameter to your function. The token is cancelled whenever a timeout occurs or the host shuts down, so your function can check it and exit gracefully.
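The cooperative-cancellation idea can be sketched with the base class library alone. GracefulWorker and ProcessAsync are hypothetical names for this example, and Task.Delay stands in for the real work. In a WebJob the token would be supplied by the host once the function declares a CancellationToken parameter; here the caller creates it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class GracefulWorker
{
    // Processes up to itemCount items and returns how many completed.
    // Stops early, without throwing, once the token is cancelled.
    public static async Task<int> ProcessAsync(int itemCount, CancellationToken token)
    {
        int completed = 0;
        for (int i = 0; i < itemCount; i++)
        {
            if (token.IsCancellationRequested)
            {
                break; // cancellation requested between items
            }
            try
            {
                // Stand-in for real work; pass the token to every
                // awaitable call that accepts one.
                await Task.Delay(50, token);
            }
            catch (OperationCanceledException)
            {
                break; // cancellation requested in the middle of an item
            }
            completed++;
        }
        return completed;
    }
}
```

For example, passing the token of a CancellationTokenSource on which CancelAfter(120) was called makes a 100-item run return early with only a few items completed, instead of running for the full five seconds. A function that never looks at its token keeps running after the timeout message, because cancelling the token only signals the function and does not stop it.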

Queue messages that are moved to Poison Queue still show as queue count, but stay hidden

I am testing the Poison message handling of the Webjob that I am building.
Everything seems to be working as expected except, one strange thing:
When a message is moved to the “-poison” queue, its ghost seems to remain hidden (invisible) in the main job queue. That means if I have 6 poison messages moved to the “-poison” queue, storage explorer shows “Showing 0 of 6 messages in queue”. I can not see the 6 hidden messages in the Storage Explorer.
I tried deleting the job queue and recreating it, but the strange issue still happens after I run my tests. Storage Explorer shows “Showing 0 of 6 messages in queue”.
What is happening behind the scene?
Update 1
I did some investigation and I think WebJob SDK does not delete the poison message.
I went through WebJob SDK source code and I think this line of code is not being executed for some reason:
https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs#L119
Here is my Function that can help reproducing the issue:
public class Functions
{
    public static void ProcessQueueMessage([QueueTrigger("%QueueName%")] string message, TextWriter log)
    {
        if (message.Contains("Break"))
        {
            throw new Exception($"Error while processing message {message}");
        }
        log.WriteLine($"Processed message {message}");
    }
}
Update 2
Here is the WebJob SDK I am using:
As far as I know, Azure Storage SDK 8.x and later does not work well with Azure WebJobs SDK 2.0 (related issue).
If you use Storage SDK 8.x or later, the poison messages stay undeleted but invisible.
The workaround is to use the older Azure Storage SDK 7.2.1.
It works well with that version.
This issue will be fixed in a future SDK version.
I have the same problem.
The problem is that the message is passed by reference when it is copied to the poison queue, without a visibility time (https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs#L145), and when the SDK then tries to delete the message from the original queue, the service returns 404 Not Found. This is a problem in azure-webjobs-sdk, and the solution is to make this change
await AddMessageAndCreateIfNotExistsAsync(poisonQueue, new CloudQueueMessage(message.AsString), cancellationToken);
in https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs#L145
We are waiting for a new version with this fix.
Custom solution
To work around this, create your own custom QueueProcessor and, in its CopyMessageToPoisonQueueAsync override, create a new CloudQueueMessage from the original one before passing it to the poison queue. See the example below.
var config = new JobHostConfiguration();
config.Queues.QueueProcessorFactory = new CustomQueueProcessorFactory();

public class CustomQueueProcessorFactory : IQueueProcessorFactory
{
    public QueueProcessor Create(QueueProcessorFactoryContext context)
    {
        // demonstrates how the Queue.ServiceClient options can be configured
        context.Queue.ServiceClient.DefaultRequestOptions.ServerTimeout = TimeSpan.FromSeconds(30);
        // demonstrates how queue options can be customized
        context.Queue.EncodeMessage = true;
        // return the custom queue processor
        return new CustomQueueProcessor(context);
    }

    /// <summary>
    /// Custom QueueProcessor demonstrating some of the virtuals that can be overridden
    /// to customize queue processing.
    /// </summary>
    private class CustomQueueProcessor : QueueProcessor
    {
        private QueueProcessorFactoryContext _context;
        public CustomQueueProcessor(QueueProcessorFactoryContext context)
            : base(context)
        {
            _context = context;
        }
        public override async Task CompleteProcessingMessageAsync(CloudQueueMessage message, FunctionResult result, CancellationToken cancellationToken)
        {
            await base.CompleteProcessingMessageAsync(message, result, cancellationToken);
        }
        protected override async Task CopyMessageToPoisonQueueAsync(CloudQueueMessage message, CloudQueue poisonQueue, CancellationToken cancellationToken)
        {
            // Copy the payload into a fresh message so the original can still be deleted.
            var msg = new CloudQueueMessage(message.AsString);
            await base.CopyMessageToPoisonQueueAsync(msg, poisonQueue, cancellationToken);
        }
        protected override void OnMessageAddedToPoisonQueue(PoisonMessageEventArgs e)
        {
            base.OnMessageAddedToPoisonQueue(e);
        }
    }
}
For anyone out there still having this issue: it should be fixed as of 2.1.0-beta1-10851. The downside is that there is no stable release of 2.1.0 yet.

How to Nak a ServiceStack RabbitMQ message within the RegisterHandler?

I'd like to be able to requeue a message from within my Service Endpoint that has been wired up through the RegisterHandler method of RabbitMQ Server. e.g.
mqServer.RegisterHandler<OutboundILeadPhone>(m =>
{
    var db = container.Resolve<IFrontEndRepository>();
    db.SaveMessage(m as Message);
    return ServiceController.ExecuteMessage(m);
}, noOfThreads: 1);
or here.
public object Post(OutboundILeadPhone request)
{
    throw new OutBoundAgentNotFoundException(); // added after mythz posted his first response
}
I don't see any examples how this is accomplished, so I'm starting to believe that it may not be possible with the ServiceStack abstraction. On the other hand, this looks promising.
Thank you, Stephen
Update
Throwing an exception in the Service does nak it, but then the message is sent to the OutboundILeadPhone.dlq which is normal ServiceStack behavior. Guess what I'm looking for is a way for the message to stay in the OutboundILeadPhone.inq queue.
Throwing an exception in your Service will automatically Nak the message. This default exception handling behavior can also be overridden with RabbitMqServer's RegisterHandler API that takes an Exception callback, i.e:
void RegisterHandler<T>(
    Func<IMessage<T>, object> processMessageFn,
    Action<IMessage<T>, Exception> processExceptionEx);
void RegisterHandler<T>(
    Func<IMessage<T>, object> processMessageFn,
    Action<IMessage<T>, Exception> processExceptionEx,
    int noOfThreads);
