Azure Container Apps - Service stopping despite minimum replica being 1

I've got a .NET worker service based on a cron schedule running in a Docker container and pushed up to Azure Container Apps. The schedule is managed within the application itself.
The scaling is set to have a minimum of 1 replica running at all times.
However, we've found that the application starts up, idles waiting for the schedule trigger for ~20-30 seconds, stops for ~2 seconds, starts and idles for another ~20-30 seconds, and then doesn't run again for ~5-6 minutes. During the idle windows, the job might start if the cron schedule happens to line up while the process is running.
Is there any way to diagnose why it might be auto-killing the application?
I can't seem to find any logs showing fatal exceptions or anything along those lines, and running in other environments (locally, Azure Container Instances, etc.) doesn't reproduce the behavior. My suspicion is that it's the auto-scaling behavior: Azure notices the process has been idle for 20-30 seconds and kills that replica, only for it to spin up again ~5 minutes later. However, I can't find anything to prove that theory.
I'm aware that other resource types might be better suited (Container Instances, App Service, Functions) though for now I'm stuck with Container Apps.
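For anyone diagnosing the same behaviour: the Container Apps system log stream records replica lifecycle events (restarts, failed health probes), which is one place to look. A minimal sketch with placeholder names, assuming the containerapp CLI extension is installed:

```bash
# Tail the Container Apps system logs, which record replica lifecycle
# events such as restarts and failed health probes (placeholder names)
az containerapp logs show --name <app> --resource-group <rg> --type system --follow
```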

I found the cause of the issue via this SO question:
Azure Container Apps Restarts every 30 seconds
It turns out Azure was running HTTP health checks against the container even though no HTTP ports were exposed. Since the probes failed, Azure considered the container unhealthy and kept killing and restarting it. Turning off HTTP ingress (and with it the health checks) solved the issue.
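For anyone else hitting this: ingress can be disabled from the Azure CLI as well as the portal. A sketch with placeholder names, assuming the containerapp extension:

```bash
# Turn off HTTP ingress for the app, which also removes the HTTP health
# probes that were marking the container unhealthy (placeholder names)
az containerapp ingress disable --name <app> --resource-group <rg>
```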

Related

Understanding why Azure App Service has a delay before it starts processing requests (AppInsights tracing)

We have a question about Azure: in some cases we see dead time when requests arrive at one of our App Services, or when Service Bus triggers, for example, an Azure Function.
An example is shown in the following trace: [AppInsights example image]
We execute a request that completes in about 5 seconds, but Azure takes more than 30 seconds to start executing it. We have made a lot of optimizations in our apps, but we have no visibility into this delay.
Has anyone faced the same issue and found a solution? We believe it is a performance issue in the workers, but it also happens when the workers are under low memory and CPU load, so we don't see how to scale the resource horizontally and automatically when it is showing no load.
The same thing happens in our Azure Functions (AZF), though there we believe it's an issue between Service Bus and the Functions container. In those cases we found the Functions have higher CPU consumption, but we don't know why, because in our local environment we process a lot of messages with multithreading without any problem.
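No answer is recorded here, but one commonly suggested first check for this kind of dead time on App Service is whether Always On is enabled, so the worker process isn't unloaded and cold-started between requests. A hedged sketch with placeholder names:

```bash
# Keep the App Service worker process loaded between requests
# (requires Basic tier or above; placeholder names)
az webapp config set --resource-group <rg> --name <app> --always-on true
```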

How are cloud services provisioned (and billed) once a new deployment is requested through the Azure REST API?

I'm using the Azure REST API to create, deploy, and start a Cloud Service (classic) (cspkg hosted in Azure Storage) with hundreds of instances. I'm noticing that the time Azure takes to provision and start the requested instances is really heterogeneous. The first instances might start in 6-7 minutes, but the last ones can take 15-20 minutes, about 10 minutes longer than the first. So my questions are:
Is this the expected behaviour? If so, what's the logic behind this? Could I do anything to speed things up?
How is Azure billing this? Is it counting the total number of instances from the moment the Cloud Service is deployed, or is it taking into account when each individual instance actually started?
UPDATE: I've been testing more scenarios and found a puzzling surprise. If I replace all the processes my Cloud Service instances should run with a simple wait of a few minutes (a .bat file running the timeout command), then all the instances start almost at the same time (about 15 seconds between the fastest and the slowest). It was not just luck or random behaviour; I've verified that this behavior is repeatable, and I can't explain the root cause.
I also looked into this a few weeks ago. Startup time depends on the size of the machine: a larger size has more resources, so it boots faster. Also, if there is any error or exception on startup, the VM will recycle until it can start successfully. I searched but found no way to speed this up, so I don't think anything can be done about the startup time. In the background, every deployment creates a Windows Server VM, boots it, deploys your package onto it, and puts your web roles behind a load balancer; it takes so long because a lot of things are happening.
Billing is also not great for classic cloud services: you pay even during startup and recycling, and even when the service is turned off. So once you're done with your update, delete the VMs from your staging slot or scale them down, because you will pay for them even while they're stopped.

Stopping / Killing Azure Functions Running Instances on Consumption Plan

How do you kill Azure Functions instances (executions) on a Consumption Plan (previously known as the Dynamic Plan)?
I am running the Azure Functions on runtime version 1.0.
A few executions (some not shown in the log in the screenshot below) were running past the five-minute functionTimeout threshold (see the one with the dotted status).
There were, however, a few instances that did get killed as expected when they reached the five-minute threshold (see the one with the crossed status).
What I tried:
As suggested in the SO question Stop/Kill a running Azure Function, I restarted the website hosting the function (see the CLI sketch after this list)
I even stopped and started the website just to be sure
I killed the processes from the Kudu interface, but the logs still kept showing a rogue instance
Process Explorer showed 32 threads, but all of them were in WAITING status; nothing was running from what I could observe
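For reference, the restart and stop/start attempts above map to Azure CLI commands like these (placeholder names; as noted, none of this cleared the rogue instance for me):

```bash
# Restart the function app hosting the functions (placeholder names)
az functionapp restart --name <app> --resource-group <rg>

# Or stop and start it explicitly
az functionapp stop --name <app> --resource-group <rg>
az functionapp start --name <app> --resource-group <rg>
```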
Finally
I deleted the website and moved over to an App Service Plan-based function, since that seems to be the only option for Azure Functions that need flexible timeouts.
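For context, even on the Consumption plan the v1 functionTimeout can be raised, but only to at most ten minutes, via host.json; a dedicated App Service plan removes that cap, which is why I moved. A minimal sketch of the host.json change (the ten-minute value is just an example):

```bash
# Write a host.json raising the v1 function timeout to its Consumption-plan
# maximum of 10 minutes (format is hh:mm:ss; example value only)
cat > host.json <<'EOF'
{
  "functionTimeout": "00:10:00"
}
EOF
```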
This is a monitoring bug, and although it looks confusing, would have no impact on the runtime behavior.
I have opened an issue to track this here and it will be updated as we make progress.
Thank you for your patience with this and for reporting the problem!

Azure Web Job - Scale out kills existing web job

I'm looking to use the API to change the number of WebJob instances I have running based on the size of a processing queue. I know I can set up rules in the portal, but the minimum aggregation time is 60 minutes, and I don't want the system to wait 60 minutes before scaling up if we suddenly get a burst of work.
The issue I have is that currently if I scale out in the portal manually from say 1 to 5 instances it kills the single running instance and then starts 5 new ones.
I assume the same thing would happen if I did this through the API; do you know of any way to avoid it?
Thanks, Si
UPDATE:
See below: I submitted 4 jobs, and as the first was processing I scaled out from 1 to 3 instances. The job that never finished then reran after the next 3 had finished, as its message would have popped back onto the queue when its processing initially failed.
if I scale out in the portal manually from say 1 to 5 instances it kills the single running instance and then starts 5 new ones.
In my test, scaling your web app does not kill the single running instance. I created a WebJob from the template and wrote a timer trigger in it.
Here is the time when I scaled my web app: [screenshot]
Here is the trigger log in Azure Storage ('azure-jobs-host-output'): [screenshot]
If you find your WebJob shown in an 'inactive instance' state in the Azure WebJobs dashboard, do not worry about it: your WebJob is still running. Please have a look at David's reply in this thread. Here is a snippet:
This is actually a bug in what the Portal displays. The Portal ends up asking an arbitrary instance about the WebJob status, and if it happens to hit any instance other than the one that's actually running it, it will be reported as inactive.
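For completeness, the same scale-out can be driven from the Azure CLI (or the matching REST call) instead of the portal; a sketch with placeholder names:

```bash
# Scale the App Service plan that hosts the WebJob out to 5 instances
# (placeholder names; in my test the running instance was not killed)
az appservice plan update --resource-group <rg> --name <plan> --number-of-workers 5
```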

Does Azure force-kill processes by itself? My Node.js/Java/JMeter processes are being force-killed

I am using Windows Azure for a performance test across about 8 nodes, each running a different application. Since it's a performance test, we do generate quite a bit of traffic.
The test ran just fine for a few hours. Then we suddenly realised that a few of the applications (Node.js, JMeter, and even the Java processes) had been force-killed, each at a different time.
We find nothing in the logs indicating out-of-memory or any other error or application issue, and this happens fairly often, once every few hours. For example, we saw JMeter shut down once every 3-4 hours, and once it happened only after 10 hours of continuous running.
So we suspect Azure is using root permissions to force-kill the above processes.
Has anyone noticed this with your applications on Azure, and do you know why?
Short answer: no, Azure does not kill your processes. There is no such thing as 'root permissions' being used to kill specific processes.
Are you running an IaaS VM or a PaaS Web/Worker Role? For PaaS, check out http://blogs.msdn.com/b/kwill/archive/2013/08/09/windows-azure-paas-compute-diagnostics-data.aspx for where to start getting diagnostic data. For IaaS, troubleshoot it like you would on-prem (DebugDiag, WinDBG, procmon, Application/System event logs, etc) since there is really nothing specific about Azure that would cause this behavior.
