Azure storage queue polling stops after connectivity issues

I'm experiencing intermittent 503 Service Unavailable responses from Azure Storage.
The WebJobs runner is hosted as a Topshelf service. Because I used JobHost.Start() instead of JobHost.RunAndBlock(), every time I got a 503 from Azure Storage the service ended up in a corrupted stopping state.
Since switching to JobHost.RunAndBlock(), the service runs continuously, but after a 503 exception the queue trigger stops polling the queues.
I use the standard Azure queue trigger bindings, with no manual setup.
Has anyone experienced similar behaviour? How can I recover from such connectivity errors?
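For context, a minimal sketch of the hosting pattern described (WebJobs SDK 2.x assumed; the queue name and type names are illustrative):

    // Minimal sketch, assuming WebJobs SDK 2.x; queue name and type names are illustrative.
    // In a Topshelf-hosted service, RunAndBlock() would typically run on a worker thread.
    using Microsoft.Azure.WebJobs;

    class Program
    {
        static void Main()
        {
            var config = new JobHostConfiguration();
            var host = new JobHost(config);
            host.RunAndBlock(); // keeps the process alive and the triggers polling
        }
    }

    public class Functions
    {
        // Standard queue trigger binding, no manual setup; the storage connection
        // comes from the AzureWebJobsStorage app setting.
        public static void ProcessQueueMessage([QueueTrigger("myqueue")] string message)
        {
            // process the message here
        }
    }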

Assuming you are using C# for your WebJob, I would use something like Polly or the Enterprise Library's Transient Fault Handling Application Block to implement retry logic for the occasional errors that occur when calling an Azure service, as you might be hitting a throttling threshold (a resource limit for your selected service tier).
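A minimal sketch of that kind of retry policy using Polly (the retry count, delays, and exception type are illustrative assumptions):

    // Sketch: wrap an Azure Storage call in a Polly retry with exponential backoff.
    // Retry count, delays, and the StorageException type are illustrative assumptions.
    using System;
    using Microsoft.WindowsAzure.Storage; // classic storage SDK exception type
    using Polly;

    public static class StorageRetry
    {
        public static void ExecuteWithRetry(Action storageCall)
        {
            var policy = Policy
                .Handle<StorageException>() // transient failures such as 503 Service Unavailable
                .WaitAndRetry(5, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

            policy.Execute(storageCall);
        }
    }

You could wrap the body of each triggered function (or the individual storage calls it makes) in a policy like this so a transient 503 is retried instead of faulting the host.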
Hope it helps!

Related

Azure functions - Functions host is not running

I keep getting a 503 error on the health checks for my Azure Functions: "503. Functions host is not running." It's very inconsistent and only happens once every few days. I'm on the Consumption plan, but I'm looking at whether a Premium plan would fix the issue.
In Azure Functions, a 503 Service Unavailable occurs for reasons like:
The function host is down or restarting
A platform issue due to the backend server not running or not being allocated
A memory leak or issue in the code causing the backend server to return 503
To get some insight into function host related issues, take a look at the "Diagnose and solve problems" blade in the Function app and select the "Function app down or reporting errors" detector. This detector shows the diagnostic information about the function app and its infrastructure.
A 503 Service Unavailable also sometimes occurs when a function takes too long to return an HTTP response on the Consumption plan. Regardless of the function app timeout setting, 230 seconds is the maximum amount of time that an HTTP-triggered function can take to respond to a request.
For longer processing times, use the Durable Functions async HTTP pattern, as in the sketch below.
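A minimal sketch of that pattern, assuming the in-process C# model with the Durable Functions v2 extension (function names are illustrative): the HTTP starter returns 202 with a status URL immediately, so the 230-second limit is never hit, and the orchestration does the long-running work.

    // Sketch of the async HTTP pattern; assumes Microsoft.Azure.WebJobs.Extensions.DurableTask v2.
    using System.Net.Http;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.DurableTask;
    using Microsoft.Azure.WebJobs.Extensions.Http;
    using Microsoft.Extensions.Logging;

    public static class LongRunningJob
    {
        [FunctionName("HttpStart")]
        public static async Task<HttpResponseMessage> HttpStart(
            [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
            [DurableClient] IDurableOrchestrationClient starter,
            ILogger log)
        {
            string instanceId = await starter.StartNewAsync("RunOrchestrator", null);
            log.LogInformation($"Started orchestration {instanceId}.");
            // Returns 202 Accepted with URLs the caller can poll for status and output.
            return starter.CreateCheckStatusResponse(req, instanceId);
        }

        [FunctionName("RunOrchestrator")]
        public static async Task RunOrchestrator(
            [OrchestrationTrigger] IDurableOrchestrationContext context)
        {
            await context.CallActivityAsync("DoLongWork", null);
        }

        [FunctionName("DoLongWork")]
        public static void DoLongWork([ActivityTrigger] string input)
        {
            // long-running processing goes here
        }
    }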
Avoid changing app settings frequently in the production environment: updating the app settings restarts the app, and in those cases you will get 503 errors. To avoid this, you can use the deployment slot feature.
Function host is not running
This issue happens due to an invalid host.json. To diagnose it, it's best to look at the function host logs in the log stream in the Azure portal.
A few causes and resolutions for this kind of error are:
If you have a Startup.cs class, check whether any errors from it were logged in Application Insights.
Another cause is a missing app setting; ensure you publish your local settings as well.
If that doesn't help, the cause could be a platform issue; to confirm this, the backend logs need to be checked to see what was happening at the time of the 503 errors.
You can create a support ticket with Microsoft for further assistance.
According to this thread, one possible cause of 503 Service Unavailable responses is the service consuming more memory than is available under the Consumption (serverless) plan, causing the service to be evicted. Switching to a dedicated hosting plan can fix this issue. According to Microsoft's documentation, it appears that the function is allowed a maximum of 1,536 MB of memory at one time. Of course, it could also be the case that your function is exceeding one of the other service limits associated with that plan, so my advice would be to add instrumentation and code defensively.
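As one hedged example of that kind of instrumentation, you could log the process working set on each invocation so memory growth shows up in your telemetry before the limit is hit (names are illustrative):

    // Illustrative sketch: log the current working set so memory growth is visible
    // in Application Insights before the plan's memory limit is reached.
    using System.Diagnostics;
    using Microsoft.Extensions.Logging;

    public static class MemoryInstrumentation
    {
        public static void LogWorkingSet(ILogger log)
        {
            long workingSetMb = Process.GetCurrentProcess().WorkingSet64 / (1024 * 1024);
            log.LogInformation("Current working set: {WorkingSetMb} MB", workingSetMb);
        }
    }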
I got a 503 after redeploying an Azure Function.
It turned out the Python version had defaulted to 3.6; updating it to 3.9 got it working again.
I was using Terraform and GitHub Actions.

Azure function deploy error: 503 Temporarily Unavailable

I have a Python function app created using an ARM template. When I try to deploy a function using an Azure DevOps pipeline, I get the error below:
"Failed to deploy web package to App service. Service Temporary unavailable Code 503"
Usually a 503 error is due to the server being overloaded or down for maintenance.
As Shariful said, the first thing you should check is whether your Function has started or not.
If your Function is already prepared and you haven't attempted the deployment too many times (if you have, try deploying later), consider another possibility: your server protection policy may be misconfigured.
For example, if your server's access policy was tweaked to a single IP, limited to 10 requests per minute, and you usually make more than 100 connections per minute, the 503 Service Temporarily Unavailable error comes up.
If you then change the single-IP access limit from 10 to more than 100, the 503 error should disappear.
Here is an article about finding the causes of 503 Service Temporarily Unavailable errors; you can have a look.
Solved by deleting and recreating the Azure Function. Nothing else seemed to work.

I am seeing 502 errors reported in Diagnose and Solve for my Azure App Service

Within the Web App Down page in Diagnose and Solve for my Azure App Service I am seeing a series of 502 errors that have been occurring consistently for the past few hours. My site is unreachable upon browsing. I have tried restarting the app, and this has not helped. There have been no recent code deployments or configuration changes that led to this error.
Looking at the Microsoft Documentation I see:
https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-troubleshooting-502#cause-1
This seems to be an issue with the connectivity to the back end address pool that is behind a gateway which should be managed by Azure.
As you said, 502s generally indicate being unable to connect to back-end instances.
One solution is to scale your App Service plan up or down while remaining within the same tier (i.e. Standard vs. Premium), so as not to change your inbound virtual IP, wait ~5 minutes, and then scale back.
Examples: S1 -> S2 or P2v2 -> P1v2
This operation, also referred to as the "scaling trick", allocates both new instances to the app service plan hosting your web apps, as well as a new internal load balancer.
In the event that there is a process hang-up caused by another resource running on the same hardware hosting your instance(s) and your site, this is the most efficient way to move your site to a new instance. Essentially, this functions as a hard reset beyond the capabilities of the traditional restart.
Lastly, because Azure bills by the hour and this temporary scale lasts only about 5 minutes, even when you need to scale up to remain within the same App Service plan tier (i.e. Standard vs. Premium), you will face either negligible cost or no cost at all.
For future reference, in order to prevent this issue from recurring, if you have multiple instances running for your app then please consider enabling the health check feature: https://learn.microsoft.com/en-us/azure/azure-monitor/platform/autoscale-get-started#route-traffic-to-healthy-instances-app-service
You can find other best practices here: https://azure.github.io/AppService/2020/05/15/Robust-Apps-for-the-cloud.html

Why does Azure give me an intermittent Error 503. The service is unavailable?

I have an Azure service that has been running for a long period of time. It builds a Word or PowerPoint document based on arguments in the request and returns a URI to the built document. It is accessed via a Visualforce page: when you click a button, it calls the service and displays a link to the document that has just been built. Simple.
All of a sudden, I get an apparently random 503 Service Unavailable error. Sometimes I click the button, no problem. Other times a 503 error. Each time the button triggers exactly the same request. Does anyone know why this might be happening?
Apparently doing the same thing over and over again and expecting a different result is not insanity!
Thanks for taking the time to read this.
Looking at the monitoring on my service told me the processor was never exceeding 6% of usage, so it couldn't be a lack of resource causing these intermittent 503 errors. It's bizarre and I'm afraid I have no explanation for it, but simply redeploying the cloud service to Azure appears to have done the trick. It now works perfectly. The solution has not changed, so I can only imagine that whatever 'reboot' is necessary after deployment, has rectified whatever the problem was. All I can suggest is that you try the same thing if you are getting intermittent 503 errors.
For me the error went away when I set up auto-scaling. I think failover requests were getting routed to my second VM, and the second VM took some time to spin up because it wasn't ready for the activity. Auto-scaling shut down my second VM and the error no longer appears (I'm assuming it will spin up if/when I get enough traffic to use it).
Hope this also helps someone.
I get this error whenever I create an Azure Function with a storage account in the South Central US. If I use a storage account in a different region the function works.
Try a storage account in a different region than the one you are currently using to see if it resolves your issue.
A 503 error simply shows that your application pool was inaccessible; it was intermittent because your application pool keeps restarting due to a lack of resources (processor, memory, etc.).
Scale up your instance (Cloud Services or VM) to get more resources for the application pool.

What could be causing my WebRole to never start?

I have a web service hosted on azure as a web role.
In this web role I override the Run() method and perform some DB operations in the following order, to basically act as a worker role:
Go to blob storage to pull a small set of data.
Cache this data for future use.
Kill some blobs that are too old.
Thread.Sleep(900000);
Basically the operation repeats every 15 minutes in the background.
It runs fine when I run it on DevFabric, but when deployed, the Azure portal gets stuck in a loop of "stabilizing role" and then "preparing node".
It never actually starts up either instance.
I have enabled diagnostics and it isn't showing me anything to suggest there is a problem.
I'm at a loss for why this could be happening.
Sounds like an error is being thrown in OnStart. Do you have any way of putting a try/catch around the whole function and dumping the error into the Event Viewer? From there you would be able to remote into the instance and investigate the error.
Most likely the configuration deployed to the cloud is different from the one running in the emulator (SQL Azure firewall permissions, pointers to local dev storage instead of ATS, etc.). Also, make sure that your diagnostics are pointing to a real Azure storage account instead of local dev storage.
I would suggest moving this code to Run(). In OnStart(), your role instance isn't yet visible to the load balancer, and since you're introducing a very long (OK, infinite) delay into OnStart(), this is likely why you're seeing the messages about the role trying to stabilize (more info on these messages is here).
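A minimal sketch combining the suggestions above: keep OnStart() short, do the 15-minute loop in Run(), and log any exception so it can be inspected later (Trace is used here as a simple stand-in for the Event Viewer or diagnostics logging; names are illustrative):

    // Illustrative sketch of a classic cloud service web role acting as a worker.
    using System;
    using System.Diagnostics;
    using System.Threading;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Keep OnStart() short so the instance becomes visible to the load balancer.
            return base.OnStart();
        }

        public override void Run()
        {
            while (true)
            {
                try
                {
                    // 1. Pull the small set of data from blob storage.
                    // 2. Cache it for future use.
                    // 3. Delete blobs that are too old.
                }
                catch (Exception ex)
                {
                    // Surface the failure instead of letting the role recycle silently.
                    Trace.TraceError(ex.ToString());
                }

                Thread.Sleep(TimeSpan.FromMinutes(15)); // repeat every 15 minutes
            }
        }
    }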
I don't typically like to answer my own question when someone else has made an effort to help me; however, I feel that the approach I used to solve this should be documented.
I enabled intellitrace when deploying to Azure and I was able to see all the exceptions being thrown and investigate the cause of the exceptions.
IntelliTrace was critical in solving my deployment issues. I would recommend it to anyone seeing an inconsistency between deploying to DevFabric and deploying to Azure.
