Ok Azure Experts,
I have a task that only needs to run once every week - this is a long
running task that can take 2-3 days to run.
I have set up a worker role to scale based on a queue. On the day
that we want the task to start - we populate the queue (using a Web
Job).
During the rest of the time, when the queue is empty, I want the
worker roles to shut down - but I cannot scale down to 0 instances.
Originally, we wanted to do this with a Web Job, but the website shuts down from time to time - abruptly turning off my webjob - is this supposed to happen? Even with Keep-Alive turned on? Also, you cannot stop a triggered Web Job from running - so if we want the process to stop - we need to turn off the Web site - not ideal.
How do I scale my instances down to zero?
* Alternatives solutions are also welcome.
Trying to minimize cost here - why pay for a worker role that isn't doing anything?
It is not possible to scale-down a Worker Role to 0 instances at this time. Even if you STOP the worker role, you're still incurring charges for STOPPED instances.
However, behavior that you're looking for, is possible with Virtual Machines. If you shutdown (STOP & DEALLOCATE) a virtual machine, you're not paying fees for that machine.
Now, the only challenge is to stop/start the VM based on a queue count. I don't recall if Azure portal's native scaling supports scaling down to 0 instances for VMs. However, if you use AzureWatch, you should be able to get this done without any issues. Disclaimer: I am affiliated with AzureWatch.
HTH
Related
I have a .NET Core application running on Azure AppService.
I have also a hosted service interacting with a table and updating it according to a given condition.
I am experiencing an increase of CPU usage of just one of the instances when my autoscaling rules kick in and start scaling out the AppService.
My question is if HostedService is automatically scaled-out too when new instances are warmed up by autoscaling.
Yes they are. When you scale out, you have multiple instances of the same application running.
I figured this out the hard way. I was running a hosted service that sent out reports through emails once a day. After I scaled out to 2 instances, clients started receiving the same email twice a day.
Our team recently had an incident due to our stateless services being restarted for azure runtime automatic updates. One of the services was in the middle of processing a task when it was forcefully shutdown. These tasks can take as long as 4 hours.
Either through code or configuration, is there a method for letting Azure know that our services are busy and can't be shutdown as this time?
In other words, how can we let Azure know when our services are ready for the service fabric runtime upgrade?
Well first of all, why don't you switch to manual upgrade mode?
Second, in the case of long running jobs you still have to take in account that nodes can fail, service instances can be moved or change role. All these kind of events will terminate your long running job if you don't handle shutdown notifications well.
The service is signaled that it will be shutdown etc. by Service Fabric by using the CancellationToken that is passed to RunAsync. The following is taken from the docs:
Service Fabric changes the Primary of a stateful service for a variety of reasons. The most common are cluster rebalancing and application upgrade. During these operations (as well as during normal service shutdown, like you'd see if the service was deleted), it is important that the service respect the CancellationToken.
Services that do not handle cancellation cleanly can experience several issues. These operations are slow because Service Fabric waits for the services to stop gracefully.
And this says the same but a bit shorter about the RunAsync method:
Make sure cancellationToken passed to RunAsync(CancellationToken) is honored and once it has been signaled, RunAsync(CancellationToken) exits gracefully as soon as possible.
In your case you should act on the CancellationToken being canceled. You should store the state of your current job somehow so you can resume it the next time RunAsync is called.
If it is really a long running job that cannot be interrupted and resumed by any means you should consider having this work done outside a Reliable Service, like a Web Job or something else. Or accept that some work might be lost.
In other words, you cannot tell Service Fabric to wait shutting down your service. It would mess up balancing and reliability of the cluster as well.
https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-capacity#the-durability-characteristics-of-the-cluster
Durability tier privilege allows Service Fabric to pause any VM level infrastructure request (such as a VM reboot, VM reimage, or VM migration)
Bronze - No privileges. This is the default.
Silver - The infrastructure jobs can be paused for a duration of 10 minutes per UD.
Gold - The infrastructure jobs can be paused for a duration of 2 hours per UD. Gold durability can be enabled only on full node VM skus like D15_V2, G5 etc.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.servicefabric.models.nodetypedescription.durabilitylevel?view=azure-dotnet
We are using Azure Service Fabric (Stateless Service) which gets messages from the Azure Service Bus Message Queue and processes them. The tasks generally take between 5 mins and 5 hours.
When its busy we want to scale out servers, and when it gets quiet we want to scale back in again.
How do we scale in without interrupting long running tasks? Is there a way we can tell Service Fabric which server is free to scale in?
Azure Monitor Custom Metric
Integrate your SF service with
EventFlow. For instance, make it sending logs into Application Insights
While your task is being processed, send some logs in that will indicate that
it's in progress
Configure custom metric in Azure Monitor to scale in only in case on absence of the logs indicating that machine
has in-progress tasks
The trade-off here is to wait for all the events finished until the scale-in could happen.
There is a good article that explains how to Scale a Service Fabric cluster programmatically
Here is another approach which requires a bit of coding - Automate manual scaling
Develop another service either as part of SF application or as VM extension. The point here is to make the service running on all the nodes in a cluster and track the status of tasks execution.
There are well-defined steps how one could manually exclude SF node from the cluster -
Run Disable-ServiceFabricNode with intent ‘RemoveNode’ to disable the node you’re going to remove (the highest instance in that node type).
Run Get-ServiceFabricNode to make sure that the node has indeed transitioned to disabled. If not, wait until the node is disabled. You cannot hurry this step.
Follow the sample/instructions in the quick start template gallery to change the number of VMs by one in that Nodetype. The instance removed is the highest VM instance.
And so forth... Find more info here Scale a Service Fabric cluster in or out using auto-scale rules. The takeaway here is that these steps could be automated.
Implement scaling logic in a new service to monitor which nodes are finished with their tasks and stay idle to scale them in using instructions described in previous steps.
Hopefully it makes sense.
Thanks a lot to #tank104 for the help on elaborating my answer!
I have one big task to do every day, with no need to scale, that takes about 30 minutes and is DB, processor and memory intensive.
This means actual 16h/month of computation time.
WebJobs require constantly running WebSite 744h/month
WebRole is also constantly running 744h/month
Azure Batch - suited for scaled storage input - storage output
processing (or that is how I understand it)
Stopped cloud service still cost you. Setting instance count to 0 is not available. And paying for 728h/month unused computation time looks like madness. Only thing I can imagine is automatic deployment of cloud service every day and automatic deletion of deployment once task is finished, but this also looks like madness.
Are there any options for this scenario in Azure?
Cloud service will be charged continuously until the deployment is deleted. Yes you can delete it every day and redeploy...
Azure VMs in Stopped (Deallocated) status, does not incur any charge. You can shut them down in portal or by script when you don't need them.
I think there is a large difference in billing if you only use it 62h/month. Would you consider switch this deployment to VM? WorkerRole and VMs can be placed on the same subnet, they can still connect to each other.
Can anybody explain the difference between Azure Web Jobs and Azure Scheduler
Azure Web Jobs
Only available on Azure Websites
It is used to run code at particular intervals. E.g. a console application every day
Used to trigger and run workloads.
Mainly recommended for workloads that either scale with the website or are relatively small.
Can be persistently running if "Always On" selected, otherwise you will get the 20 min timeout.
The code that needs to be run and schedule are defined together.
Azure Scheduler
Is not tied to Websites or Cloud Services
It allows you to call a website or add a message to a storage queue
Used for triggering events or triggering small workloads (e.g. add to queue), usually to trigger larger workloads
Mainly recommended for triggering more complex workloads.
This is only a trigger, and a separate function listening to trigger events (e.g. queue's) needs to be coded separately.
For many instances I prefer to use the scheduler to push to a storage queue and a worker role on each instance takes off the queue. This keeps tasks controlled granularly and can also move up or down in scale outside of your website.
With WebJobs they scale up and down with your site and hence your background tasks can become over taxed if your website is experiencing low traffic and scaled down.
Azure Scheduler - Provides a way to easily schedule http calls in a well-defined schedule, like every hour, every Friday at 9:00 am, Once a day, ...
Azure WebJobs - Provides a way to run small to medium work load (in the form of a script: .exe, .cmd, .sh, .js, ...) at the same context of an Azure Website (but can be hosted even with an empty website).
While a WebJob can run continuously (with a process that has a while loop) and Azure will make sure this WebJob is always running (with "Always On" set).
There is also an integration between Azure scheduler and Azure WebJobs where you have a WebJob that is running some finite work and the schduler is responsible for scheduling this work (invoking the WebJob).
So in summary, the scheduler is about scheduling work and WebJobs is about running work load.