Azure Scheduled WebJob Never Finished Status

I have a scheduled web job that runs a function every minute:
[TimerTrigger("00:01:00", RunOnStartup = true)]
Sometimes it hangs and shows a "Never Finished" status for a few days. This prevents new runs of the job from being triggered. The Azure log also recorded no entries; the log was empty for that run.
Is there a way to tell the Azure scheduler to continue scheduling when the job has a "Never Finished" status? Does setting "UseMonitor = true" do this?

As far as I know, if a scheduled WebJob periodically takes a long time (or never finishes), then every now and then the operations inside your job function are taking a long time or failing. It all depends on what your job does internally. If it goes async in places, the SDK will continue to wait for it to return.
Given this, I suggest trying the WebJobs SDK's TimeoutAttribute.
It makes it easy for functions to be cancelled based on a timeout if they hang. An exception is shown when the timeout fires.
If you get too many errors and want to be alerted, I suggest using ErrorTrigger; you could refer to this article.
For more details, see the code below.
I used a queue trigger to test it; the result is the same as with a TimerTrigger WebJob.
// Set the function timeout value
[Timeout("00:00:03")]
public static void TimeoutJob(
    [QueueTrigger("queue")] string message,
    CancellationToken token,
    TextWriter log)
{
    Console.WriteLine("From function: Received a message: " + message);
    // Task.Run starts the work; a Task built with "new Task(...)" would never start
    Task task = Task.Run(() => /* here is your code */ Thread.Sleep(5000), token);
    // Wait(token) throws OperationCanceledException once the timeout cancels the token
    task.Wait(token);
    // Reached only if the work finishes before the timeout
    Console.WriteLine("From function: Cancelled: No");
}

Related

How to re-queue message with updated information while working with Azure Service Bus Queue Function?

When working with an Azure Service Bus Queue function, we know that whenever there is an exception, the function applies a default retry policy (max count = 10). We would like our message to carry a property called retryCount, so that when an exception occurs we can increase retryCount by 1 and also add the current exception to the message. The next time the function retries, we would then know that this is the xth attempt, along with x records of exceptions. We know that the Message object has a read-only property called deliveryCount; however, we cannot attach our additional information to it, or work out from the Message object why the last delivery failed.
However, after we tried to implement this idea, we found that whenever the function retries, it always reloads the initial message from the queue, not our updated message. Is there any way to make it retry with the updated message without being forced to re-send the updated message to the current queue?
In addition, how could we customize the current retry logic, for example, decrease the max retry count from 10 to 1 and use Polly to handle some scenarios inside the function?

You don't really need a custom retryCount as the message already contains a system property called DeliveryCount that tracks the number of delivery (read processing) attempts. If you need to store some additional metadata between the retries, you would need to abandon your message. With Functions v2, to abandon a message you will need to use the message receiver used to receive the message.
public static async Task ProcessMessage([ServiceBusTrigger("myqueue")] string message,
    int deliveryCount,
    MessageReceiver messageReceiver,
    string lockToken)
{
    // Abandon the message, attaching properties to carry over to the next delivery
    await messageReceiver.AbandonAsync(lockToken,
        new Dictionary<string, object> { { "Reason", "Blah" } });
}
Note that to ensure Azure Functions continues to process the message you will need to throw an exception. Otherwise, Functions by default assumes the message was processed successfully and will attempt to complete the message.

Twilio Taskrouter: How to prevent last worker in queue from being reassigned rejected task?

I'm using NodeJS to manage a Twilio Taskrouter workflow. My goal is to have a task assigned to an Idle worker in the main queue identified with queueSid, unless one of the following is true:
No workers in the queue are set to Idle
Reservations for the task have already been rejected by every worker in the queue
In these cases, the task should fall through to the next queue identified with automaticQueueSid. Here is how I construct the JSON for the workflow (it includes a filter such that an inbound call from an agent should not generate an outbound call to that same agent):
configurationJSON(){
    var config={
        "task_routing":{
            "filters":[
                {
                    "filter_friendly_name":"don't call self",
                    "expression":"1==1",
                    "targets":[
                        {
                            "queue":queueSid,
                            "expression":"(task.caller!=worker.contact_uri) and (worker.sid NOT IN task.rejectedWorkers)",
                            "skip_if": "workers.available == 0"
                        },
                        {
                            "queue":automaticQueueSid
                        }
                    ]
                }
            ],
            "default_filter":{
                "queue":queueSid
            }
        }
    }
    return config;
}
This results in no reservation being created after the task reaches the queue. My event logger shows that the following events have occurred:
workflow.target-matched
workflow.entered
task.created
That's as far as it gets and just hangs there. When I replace the line
"expression":"(task.caller!=worker.contact_uri) and (worker.sid NOT IN task.rejectedWorkers)"
with
"expression":"(task.caller!=worker.contact_uri)"
Then the reservation is correctly created for the next available worker, or sent to automaticQueueSid if no workers are available when the call comes in, so I guess the skip_if is working correctly. So maybe there is something wrong with how I wrote the target expression?
I tried working around this by setting a worker to unavailable once they reject a reservation, as follows:
clientWorkspace
    .workers(parameters.workerSid)
    .reservations(parameters.reservationSid)
    .update({
        reservationStatus:'rejected'
    })
    .then(reservation=>{
        //this function sets the worker's Activity to Offline
        var updateResult=worker.updateWorkerFromSid(parameters.workerSid,process.env.TWILIO_OFFLINE_SID);
    })
    .catch(err=>console.log("/agent_rejects: error rejecting reservation: "+err));
But what seems to be happening is that as soon as the reservation is rejected, before worker.updateWorkerFromSid() is called, Taskrouter has already generated a new reservation and assigned it to that same worker, and my Activity update fails with the following error:
Error: Worker [workerSid] cannot have its activity updated while it has 1 pending reservations.
Eventually, it seems that the worker is naturally set to Offline and the task does time out and get moved into the next queue, as shown by the following events/descriptions:
worker.activity.update
Worker [friendly name] updated to Offline Activity
reservation.timeout
Reservation [sid] timed out
task-queue.moved
Task [sid] moved out of TaskQueue [friendly name]
task-queue.timeout
Task [sid] timed out of TaskQueue [friendly name]
After this point the task is moved into the next queue automaticQueueSid to be handled by available workers registered with that queue. I'm not sure why a timeout is being used, as I haven't included one in my workflow configuration.
I'm stumped--how can I get the task to successfully move to the next queue upon the last worker's reservation rejection?
UPDATE: although @philnash's answer helped me correctly handle the worker.sid NOT IN task.rejectedWorkers issue, I ultimately ended up implementing this feature using the RejectPendingReservations parameter when updating the worker's availability.

Twilio developer evangelist here.
rejectedWorkers is not an attribute that is automatically handled by TaskRouter. You reference this answer by my colleague Megan in which she says:
For example, you could update TaskAttributes to have a rejected worker SID list, and then in the workflow say that worker.sid NOT IN task.rejectedWorkerSids.
So, in order to filter by a rejectedWorkers attribute you need to maintain one yourself, by updating the task before you reject the reservation.
Let me know if that helps at all.
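As a rough illustration of maintaining that attribute yourself, here is a minimal sketch of the bookkeeping step only. The helper name addRejectedWorker and the placeholder values are assumptions made for this example and are not part of the Twilio API; the attribute name rejectedWorkers matches the one used in the question's workflow expression.

```javascript
// Hypothetical helper (not a Twilio API): takes the task's attributes as a
// JSON string and returns a new JSON string in which workerSid has been added
// to the rejectedWorkers list. The list is created if missing, and a SID is
// never added twice.
function addRejectedWorker(attributesJson, workerSid) {
  const attributes = JSON.parse(attributesJson || "{}");
  const rejected = Array.isArray(attributes.rejectedWorkers)
    ? attributes.rejectedWorkers.slice()
    : [];
  if (!rejected.includes(workerSid)) {
    rejected.push(workerSid);
  }
  return JSON.stringify({ ...attributes, rejectedWorkers: rejected });
}

// Example with placeholder values:
const before = JSON.stringify({ caller: "client:alice" });
const after = addRejectedWorker(before, "WKxxxx1");
console.log(after);
```

The returned string would be written back to the task's attributes before the reservation is rejected, so that the NOT IN check in the workflow expression has a list to test against. How the task is updated depends on how your Twilio client is set up, so that call is left out here.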

node-schedule undoing cancellation

I've been working with node for the first time in a while again and stumbled upon node-schedule, which for the most part has been a breeze, however, I've found resuming a scheduled task after canceling it via job.cancel() pretty difficult.
For the record, I'm using schedule to perform specific actions at a specific date (non-recurring) and under some circumstances cancel the task at a specific date but would later like to resume it.
I tried using job.cancel(true) after cancelling it via plain job.cancel() first as the documentation states that that would reschedule the task, but this has not worked for me. Using job.reschedule() after having cancelled job first yields the same result.
I could probably come up with an inelegant solution, but I thought I'd ask if anyone knows of an elegant one first.

It took me a while to understand the node-schedule documentation ^^
To un-cancel a job, you have to pass some options to reschedule.
If you don't pass anything to reschedule, the function returns false (an error occurred).
For example, you can declare the options and pass that variable like this:
const schedule = require('node-schedule');
let options = {rule: '*/1 * * * * *'}; // Declare schedule rules
let job = schedule.scheduleJob(options, () => {
    console.log('Job processing !');
});
job.cancel(); // Cancel Job
job.reschedule(options); // Reschedule Job
Hope it helps.
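Since reschedule needs the spec passed in again, one option is to keep the spec next to the job. The sketch below is only an illustration: createResumableJob, pause and resume are hypothetical names invented for this example, and the scheduler argument is assumed to be the node-schedule module, passed in rather than required so the snippet itself has no dependencies.

```javascript
// Hypothetical wrapper: remembers the spec (a rule object, cron string or
// Date) so it can be handed back to reschedule() after a cancel().
// `scheduler` is assumed to be the node-schedule module.
function createResumableJob(scheduler, spec, callback) {
  const job = scheduler.scheduleJob(spec, callback);
  if (!job) {
    // Guard for the case where no job object came back for this spec.
    throw new Error("Spec could not be scheduled");
  }
  return {
    job,
    // Cancel all pending invocations of the job.
    pause() {
      return job.cancel();
    },
    // Pass the remembered spec again, as reschedule() requires.
    resume() {
      return job.reschedule(spec);
    },
  };
}
```

With node-schedule installed, this might be used as createResumableJob(require('node-schedule'), someFutureDate, callback), followed later by pause() and resume(). For a one-off Date spec, resume() can only be expected to succeed while that date is still in the future.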

QueueTrigger Attribute Visibility Timeout

If I were to get a message from queue using Azure.Storage.Queue
queue.GetMessage(TimeSpan.FromMinutes(20));
I can set the visibility timeout. However, when using Azure.WebJobs (SDK 0.4.0-beta) attributes to auto-bind a WebJob to a queue, i.e.
public static void ProcessQueueMessage([QueueTrigger("myqueue")] string message){
//do something with queue item
}
Is there a way to set the visibility timeout on the attribute? There does not seem to be an option in JobHostConfiguration().Queues. If there is no way to override, is it the standard 30 seconds then?

In the latest v1.1.0 release, you can now control the visibility timeout by registering your own custom QueueProcessor instances via JobHostConfiguration.Queues.QueueProcessorFactory. This allows you to control advanced message processing behavior globally or per queue/function.
For example, to set the visibility for failed messages, you can override ReleaseMessageAsync as follows:
protected override async Task ReleaseMessageAsync(CloudQueueMessage message, FunctionResult result, TimeSpan visibilityTimeout, CancellationToken cancellationToken)
{
// demonstrates how visibility timeout for failed messages can be customized
// the logic here could implement exponential backoff, etc.
visibilityTimeout = TimeSpan.FromSeconds(message.DequeueCount);
await base.ReleaseMessageAsync(message, result, visibilityTimeout, cancellationToken);
}
More details can be found in the release notes here.

I have the same question and haven't found an answer yet. But, to answer part of your question, the default lease is 10 minutes.
Quoting the Azure Website: "When the method completes, the queue message is deleted. If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed. This sequence won't be repeated indefinitely if a message always causes an exception. After 5 unsuccessful attempts to process a message, the message is moved to a queue named {queuename}-poison. The maximum number of attempts is configurable."
Link: http://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-get-started/
Section: ContosoAdsWebJob - Functions.cs - GenerateThumbnail method
Hope this helps!

Timer Job Scheduling on DataSheet Entry in SharePoint

I have a list on which I have an ItemUpdated handler.
When I edit using the datasheet view and modify every item, the ItemUpdated event will obviously run for every single item.
In my ItemUpdated event, I want it to check if there is a Timer Job scheduled to run. If there is, then extend the SPOneTimeSchedule schedule of this job to delay it by 5 seconds. If there isn't, then create the Timer Job and schedule it for 5 seconds from now.
I've tried checking in the handler whether the job definition exists and, if it does, extending the schedule by 5 seconds. If it doesn't exist, I create the job definition to run in a minute's time.
MyTimerJob rollupJob = null;
foreach (SPJobDefinition job in web.Site.WebApplication.JobDefinitions)
{
    if (job.Name == Constants.JOB_ROLLUP_NAME)
    {
        rollupJob = (MyTimerJob)job;
    }
}
if (rollupJob == null)
{
    rollupJob = new MyTimerJob(Constants.JOB_ROLLUP_NAME, web.Site.WebApplication);
}
SPOneTimeSchedule schedule = new SPOneTimeSchedule(DateTime.Now.AddSeconds(5));
rollupJob.Schedule = schedule;
rollupJob.Update();
When I try this out on the server, I get a lot of errors:
"An update conflict has occurred, and you must re-try this action. The object MyTimerJob Name=MyTimerJobName Parent=SPWebApplication Name=SharePoint -80 is being updated by NT AUTHORITY\NETWORK SERVICE in the w3wp process"
I think the job is probably running for the first time and once running, the other ItemUpdated events are coming in and finding the existing Job definition. It then tries to Update this definition even though it is currently being used. Should I make a new Job Definition name so that it doesn't step on top of the first? Or raise the time to a minute?

I solved this myself by just setting the delay to a minute's time from now regardless of whether a definition is found. This way, while it is busy, it will keep pushing back the scheduling of the job until it is done processing.

This is because the event is asynchronous. You'll need to rethink exactly what you're trying to solve with this code and potentially re-factor it.

Maybe you should try using "lock" on the timer job object?
