I realized that WebJobs are killed and restarted during web app deployment (zip deployment).
Is there some way to allow the currently running WebJob to finish before it is replaced with the new code and restarted?
I want to avoid having jobs that are only partly finished when killed, because then I will not be able to just start them again: some actions (API requests, DB queries) were already performed, so the initial state is not the same anymore.
I have a short-running container that does background processing (no ingress), deployed to the Azure Container Apps service. My configuration is min replicas 0 (for when the container completes its work and exits) and max replicas 1 (I only want one instance of my container running at any time).
I want to start my container once every hour; it generally runs for about 3 minutes, completes its task, and exits.
Is there any way with Azure Container Apps to schedule the start of my container? At the moment I have resorted to running my Azure DevOps pipeline on a schedule, which calls the az containerapp update command, but it feels like the wrong way to go about this.
There's no scheduling concept in Container Apps. Here are some ideas:
1. Enable ingress and create a Function or a Logic App that runs on a schedule and pings the Container App to start the process (a sketch of this approach follows the list).
2. Create a Logic App that runs on a schedule, creates a Container Instance every hour, waits for it to complete, and deletes it.
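For option 1, a minimal sketch of what the scheduled ping could look like, assuming the in-process C# Azure Functions model; the ingress URL and route are placeholders, not taken from the question:

// Hypothetical timer-triggered Function for option 1: once an hour it sends a
// request to the Container App's ingress endpoint to kick off the background work.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class StartContainerAppJob
{
    private static readonly HttpClient Http = new HttpClient();

    // "0 0 * * * *" fires at the top of every hour (NCRONTAB format).
    [FunctionName("StartContainerAppJob")]
    public static async Task Run([TimerTrigger("0 0 * * * *")] TimerInfo timer, ILogger log)
    {
        // Placeholder URL; replace with the Container App's FQDN and whatever
        // route the app exposes to start its processing.
        var response = await Http.PostAsync(
            "https://my-container-app.example.azurecontainerapps.io/api/start", null);

        log.LogInformation("Pinged Container App, status: {Status}", response.StatusCode);
    }
}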
Our team recently had an incident due to our stateless services being restarted for Azure runtime automatic updates. One of the services was in the middle of processing a task when it was forcefully shut down. These tasks can take as long as 4 hours.
Either through code or configuration, is there a method for letting Azure know that our services are busy and can't be shut down at this time?
In other words, how can we let Azure know when our services are ready for the Service Fabric runtime upgrade?
Well, first of all, why don't you switch to manual upgrade mode?
Second, in the case of long-running jobs you still have to take into account that nodes can fail, and service instances can be moved or change role. All these kinds of events will terminate your long-running job if you don't handle shutdown notifications well.
Service Fabric signals the service that it is about to be shut down (among other events) through the CancellationToken that is passed to RunAsync. The following is taken from the docs:
Service Fabric changes the Primary of a stateful service for a variety of reasons. The most common are cluster rebalancing and application upgrade. During these operations (as well as during normal service shutdown, like you'd see if the service was deleted), it is important that the service respect the CancellationToken.
Services that do not handle cancellation cleanly can experience several issues. These operations are slow because Service Fabric waits for the services to stop gracefully.
And this says the same, a bit more briefly, about the RunAsync method:
Make sure cancellationToken passed to RunAsync(CancellationToken) is honored and once it has been signaled, RunAsync(CancellationToken) exits gracefully as soon as possible.
In your case you should act on the CancellationToken being canceled. You should store the state of your current job somehow so you can resume it the next time RunAsync is called.
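A minimal sketch of what that could look like in a stateful service; the reliable dictionary name, the "currentStep" key, and DoNextStepAsync are placeholders for however you choose to checkpoint your own job:

// Sketch of a RunAsync that honors the CancellationToken and checkpoints its
// progress in a reliable dictionary so a later activation can resume.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class LongRunningService : StatefulService
{
    public LongRunningService(System.Fabric.StatefulServiceContext context)
        : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var progress = await StateManager
            .GetOrAddAsync<IReliableDictionary<string, long>>("jobProgress");

        while (!cancellationToken.IsCancellationRequested)
        {
            using (var tx = StateManager.CreateTransaction())
            {
                // Resume from the last checkpointed step (0 if this is a fresh start).
                var step = await progress.GetOrAddAsync(tx, "currentStep", 0);

                await DoNextStepAsync(step, cancellationToken); // placeholder unit of work

                await progress.SetAsync(tx, "currentStep", step + 1);
                await tx.CommitAsync();
            }
        }

        // Returning promptly once cancellation is requested is what "respecting the
        // CancellationToken" means; Service Fabric is then free to move or upgrade the replica.
    }

    private Task DoNextStepAsync(long step, CancellationToken cancellationToken)
        => Task.Delay(TimeSpan.FromSeconds(1), cancellationToken); // stand-in for real work
}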
If it is really a long-running job that cannot be interrupted and resumed by any means, you should consider having this work done outside a Reliable Service, like a WebJob or something else. Or accept that some work might be lost.
In other words, you cannot tell Service Fabric to wait before shutting down your service. That would mess up the balancing and reliability of the cluster as well.
https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-capacity#the-durability-characteristics-of-the-cluster
The durability tier privilege allows Service Fabric to pause any VM-level infrastructure request (such as a VM reboot, VM reimage, or VM migration):
Bronze - No privileges. This is the default.
Silver - The infrastructure jobs can be paused for a duration of 10 minutes per UD.
Gold - The infrastructure jobs can be paused for a duration of 2 hours per UD. Gold durability can be enabled only on full-node VM SKUs like D15_V2, G5, etc.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.servicefabric.models.nodetypedescription.durabilitylevel?view=azure-dotnet
I have a service that runs as an Azure WebJob and scales out as needed; it's a long-running process that can take a few hours for each message on the queue. It works fine; the only issue is that it relies on a third-party REST endpoint that, due to various issues, can be unavailable.
My code catches this error, and I need it to wait for 10-15 seconds before it tries again, so I used
Thread.Sleep(10000);
This works locally, but when running in Azure as a WebJob it seems to pause all instances of the WebJob, not just the one that needs to wait.
Any ideas as to why? Each instance is on a different thread, I believe, but I am relatively new to WebJobs so I can't be 100% sure, so some guidance on that would be good as well.
When you scale out the App Service Plan, all instances of the WebJob run independently. If they all sleep at the same time, it could be that the REST endpoint is generally unavailable during that period, sending every instance into the retry path and causing all of them to sleep.
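As a side note, if the retry happens inside an async WebJobs function, a non-blocking wait avoids tying up a thread while the instance waits. A minimal sketch, where the queue name, CallThirdPartyAsync, and the retry policy are assumptions rather than anything from the question:

// Sketch: a queue-triggered WebJobs function that retries a third-party call with a
// non-blocking wait. CallThirdPartyAsync stands in for the REST call from the question.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class Functions
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task ProcessQueueMessage([QueueTrigger("work-items")] string message)
    {
        const int maxAttempts = 5;

        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await CallThirdPartyAsync(message);
                return; // success
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // Wait 10 seconds without blocking the thread, then try again.
                await Task.Delay(TimeSpan.FromSeconds(10));
            }
        }
    }

    private static Task CallThirdPartyAsync(string message)
        => Http.GetAsync("https://thirdparty.example.com/api/endpoint"); // placeholder URL
}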
I'm working on the deployment process for a web application that runs inside an Azure cloud service.
I deploy to the staging slot; once all the instances report a status of RoleReady, I do a VIP swap into the production slot. The aim is that I can deploy a new version and my users won't have to wait while the site warms up.
I have added a certain amount of warmup into RoleEntryPoint.OnStart; essentially this hits a number of the application's endpoints to allow the caches to spin up and view compilation to run. What I'm seeing is that the instances all report Ready before this process has completed.
How can I tell if my application has warmed up before I swap staging into production? The deploy script I'm using is a derivative of https://gist.github.com/chartek/5265057.
The role instance does not report Ready until the OnStart method finishes and the Run method begins. You can validate this by looking at the guest agent logs on the VM itself (see http://blogs.msdn.com/b/kwill/archive/2013/08/09/windows-azure-paas-compute-diagnostics-data.aspx for more info about those logs).
When you access the endpoints, are you waiting for a response or just sending a request? See Azure Autoscale Restarts Running Instances for code which hits the endpoints and waits in OnStart for the responses before moving the instance to the Ready state.
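A minimal sketch of that pattern, blocking in OnStart until each warmup request has returned; the endpoint name "Endpoint1" and the warmup paths are assumptions, not from the question:

// Warm up inside OnStart and wait for each response before returning, so the
// instance does not move to Ready until the warmup has finished.
using System;
using System.Net.Http;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        var endpoint = RoleEnvironment.CurrentRoleInstance
            .InstanceEndpoints["Endpoint1"].IPEndpoint; // assumed endpoint name
        var baseUri = $"http://{endpoint.Address}:{endpoint.Port}";

        var warmupPaths = new[] { "/", "/home/index" }; // placeholder endpoints

        using (var client = new HttpClient { Timeout = TimeSpan.FromMinutes(2) })
        {
            foreach (var path in warmupPaths)
            {
                // Block until the response arrives; OnStart does not return (and the
                // instance does not report Ready) until every warmup request completes.
                client.GetAsync(baseUri + path).GetAwaiter().GetResult();
            }
        }

        return base.OnStart();
    }
}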
I've got a continuously running WebJob on my auto-scale Azure website.
My WebJob is a simple console application with a while(true) loop that subscribes to certain messages on the Azure Service Bus and processes them. I don't want to process the same message twice, so when the website is scaled out and another WebJob is started, I need it to detect that another instance is already running and just sit there doing nothing until it's either killed (by scaling down again) or the other instance is killed. In the latter scenario the second WebJob should detect that the other instance is no longer running and take over.
Any takers?
You should create a queue (using either Service Bus or storage queues), pull the jobs off (creating and managing a lease on each message), and process them from there. If that lease is managed properly, the job should only get processed once, although you should make sure it's idempotent just in case, as there are edge cases where it will be processed more than once.
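A minimal sketch of that lease pattern with a storage queue (the Azure.Storage.Queues package), where the queue name and the ProcessAsync stub are placeholders; the visibility timeout acts as the lease:

// The visibility timeout hides the message from other instances while this one
// works on it; deleting the message "completes" the lease once the work succeeds.
using System;
using System.Threading.Tasks;
using Azure.Storage.Queues;

public static class QueueWorker
{
    public static async Task RunAsync(string connectionString)
    {
        var queue = new QueueClient(connectionString, "jobs"); // placeholder queue name
        await queue.CreateIfNotExistsAsync();

        while (true)
        {
            // Receiving hides the message for 10 minutes (the "lease"); other
            // WebJob instances will not see it while this instance processes it.
            var messages = await queue.ReceiveMessagesAsync(
                maxMessages: 1, visibilityTimeout: TimeSpan.FromMinutes(10));

            if (messages.Value.Length == 0)
            {
                await Task.Delay(TimeSpan.FromSeconds(30)); // nothing to do yet
                continue;
            }

            var message = messages.Value[0];

            await ProcessAsync(message.MessageText); // placeholder for the real work

            // If this instance dies before the delete, the message reappears after the
            // timeout and another instance picks it up, which is why the processing
            // should be idempotent.
            await queue.DeleteMessageAsync(message.MessageId, message.PopReceipt);
        }
    }

    private static Task ProcessAsync(string body) => Task.CompletedTask; // stub
}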