We are using Azure Service Bus in the West Europe region.
When I looked at our metrics to investigate a duplication issue, I noticed a high number of "server errors". It seems to be nearly 50% of the requests.
Microsoft provides little information about what server errors mean, other than saying it's "internal service bus errors" (see https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-metrics-azure-monitor).
My Question:
Do I have to be concerned? Can I still rely on Service Bus in West Europe at the moment? (There were some issues with Service Bus in this region last month.)
What does it mean and how can I solve it?
Thx in advance
Related
A user reported a failure of one of our Blazor Server apps an hour or so ago. When I investigated, it seemed the Azure SignalR service was responding with "502 Bad Gateway" to the initial OPTIONS request of the SignalR hub negotiation (SignalR is separate from the web app that hosts the site).
In the Azure portal, this shows for the SignalR service:
Restarting it does not succeed. Clicking "view activity logs" in the "the resource is in a failed state" banner simply brings up a "Code: 'invalidRG'" message.
The only significant recent event on this subscription was that it converted from Free Trial to Pay-As-You-Go, and there were some issues with the transition (the upgrade was done after the subscription had been disabled for lack of a payment method, and it took some time to get it reactivated), but then everything seemed to work well for a day.
There are many other services in the same resource group, apparently working fine - it's just SignalR. The "Azure status" page shows that all SignalR services are in "Good" condition.
Where does one go from here to diagnose and fix this? Is it a "pay for support from MS and ask them"?
Even though it wasn't a billing issue, I added a note to the end of the billing support ticket that I'd raised to get a payment method problem sorted out during the subscription upgrade. Support wrote back acknowledging a problem with the Azure SignalR service that was actively being worked on. They claimed that it was already resolved by the time they read my ticket update.
I don't believe the status dashboard ever showed Azure SignalR as anything other than healthy, so it might make sense to sign up for at least the developer support level so there is a route for reporting these things. Either that or (depending on one's moral compass) raise them as billing requests (which are free) if one feels that service availability is a billing-related thing (and I suppose it should be; they can't reasonably charge you for services they aren't providing, even if it is only a few cents).
RCA in progress
Azure Signal R - Service availability/management operation failures - Mitigated
Resolved: An Azure service issue (Tracking ID 1L_L-NZG) impacted resources in your subscription.
Summary of impact: Between 06:00 and 14:00 UTC on 21 Jul 2021, you were identified as a customer using Azure SignalR Service who may have received failure notifications when attempting to connect or access resources. Additionally, failures may have been seen when attempting to perform service management operations - such as create, update, delete.
I have an MVC-based web app running on Azure. Its CPU usage has been very predictable over the past five months. However, over the past 24 hours, and most recently today from 1:00 pm to 1:30 pm US Eastern time, I have had CPU spikes nearing 100%. The image below, which covers the past 7 days, shows this.
This CPU spike is not coming from my app or my users. There has not been an abnormal increase in users, user activity or queries. I also checked Google Analytics to see if perhaps my site was getting hammered by random users etc. It showed nothing out of the ordinary.
There was also a corresponding huge jump in data going out of my site, which is highly unusual. The second image shows data egress for the past week. However, as I said, I checked my Azure SQL Database Query Store and it shows absolutely nothing out of the ordinary. Furthermore, my DTU percentage never even neared 100% during this time, which it certainly would have if this much data had been pulled from the database.
I have basically ruled out anything amiss on my end. Is there some way I can check to see if there were issues with Azure causing this?
If you suspect an underlying Azure platform issue, both Azure Service Health and Azure Resource Health are useful resources for determining whether you are being impacted by a platform issue.
Azure Service Health provides personalized service health information when Azure platform issues impact your resources.
https://learn.microsoft.com/en-us/azure/service-health/service-health-overview
Azure Resource Health provides visibility into whether your Azure resources are healthy or unhealthy.
https://learn.microsoft.com/en-us/azure/service-health/resource-health-overview
For a list of supported Azure resources, you can refer to this article which also describes the set of health checks being performed.
https://learn.microsoft.com/en-us/azure/service-health/resource-health-checks-resource-types
Is there any way to get Azure status update only for some services and regions I am using? For example, I am using Cloud Services in West US. When this service in West US is down, I want to get an alert for it. I don't care about other services and other regions.
If you set up alert notifications for your application, you'll get notified when any of the underlying services you're using are not functioning properly. An alert helps you confirm that your service is available and working.
https://azure.microsoft.com/en-us/documentation/articles/insights-receive-alert-notifications/
If you get an alert about a service issue, that's when I would first take a look at the Azure status dashboard, and then take a look at your application logs to troubleshoot.
Another trick is to create simple URLs in your application that do a quick service test. For example, let's say you're using blob storage in the west datacenter. You could set up a page that does a test write/read to ensure that service is working. This gives you a direct indication of whether there is a problem from your application's point of view. Since the cloud is highly distributed, and service statuses don't update immediately, I find this method highly preferable.
You would then point your alert monitoring at URLs like this:
http://yourapp.com/
http://yourapp.com/blobtest
http://yourapp.com/redistest
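As a rough illustration of the test write/read idea, here is a minimal Python sketch using only the standard library. The names `round_trip_probe`, `PROBES`, `check`, `HealthHandler` and `serve` are hypothetical, and the in-memory dictionary is only a stand-in for a real blob or cache client, so treat this as a starting point under those assumptions rather than a finished implementation.

```python
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def round_trip_probe(write: Callable[[str, bytes], None],
                     read: Callable[[str], bytes]) -> Callable[[], bool]:
    """Build a probe that writes a unique payload and reads it back."""
    def probe() -> bool:
        key = f"healthcheck-{uuid.uuid4()}"
        payload = uuid.uuid4().bytes
        try:
            write(key, payload)
            return read(key) == payload
        except Exception:
            # Any failure talking to the dependency counts as unhealthy.
            return False
    return probe


# Assumption: an in-memory dict stands in for the real storage client.
# Replace the two callables with real blob or cache write/read calls.
_fake_store: Dict[str, bytes] = {}

PROBES: Dict[str, Callable[[], bool]] = {
    "/blobtest": round_trip_probe(_fake_store.__setitem__,
                                  _fake_store.__getitem__),
}


def check(path: str) -> Tuple[int, str]:
    """Map a request path to an HTTP status code and response body."""
    probe = PROBES.get(path)
    if probe is None:
        return 404, "unknown probe"
    return (200, "ok") if probe() else (503, "dependency unavailable")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = check(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def serve(port: int = 8080) -> None:
    """Run the health endpoints locally until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` exposes `/blobtest` locally. Returning 503 on a failed round trip lets an ordinary URL monitor treat the dependency as down without parsing the body. A real probe would probably also delete the test object afterwards.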
The Azure Status website has the information you need for all Azure regions.
https://azure.microsoft.com/en-us/status/
I keep getting this 503 error on all my Standard Azure Websites in West Europe.
I've tried restarting the App pool, and also tried scaling down and then up again.
Am I missing something or could it be an error at Microsoft?
We are experiencing the same problem here. I already opened a support ticket and received the following answer:
Thank you for contacting Microsoft Support. This case is related to an
incident currently occurring. We have a team that handles all cases
related to incidents such as this. We will transfer your case to that
team. Once that transfer takes place, here is what you can expect to
happen.
We will send you regular updates on the status of the incident. This information is the same information available from the Azure
Status Page located at
http://azure.microsoft.com/en-us/status/#current. However, instead of
requiring that you check the status page, we will push the information
to you in email.
Once the incident has been mitigated, we will inform you with another notification, after which point your case will be closed.
Now Microsoft is showing the error on its Azure Status website.
Websites in North Europe are working, so we will set up a Traffic Manager on Azure with the "Failover" method. In case of problems in West Europe, the Traffic Manager will automatically route all connections to the mirrored website in North Europe.
See https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-overview/
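To make the failover idea concrete, here is a small Python sketch of priority-based endpoint selection. It only illustrates the decision logic: Traffic Manager itself works at the DNS level with its own health probes. The `Endpoint` type, the `pick_endpoint` function and the example URLs are hypothetical and not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    url: str
    priority: int  # lower number means more preferred


def pick_endpoint(endpoints: Iterable[Endpoint],
                  is_healthy: Callable[[Endpoint], bool]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, if any."""
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if is_healthy(endpoint):
            return endpoint
    return None  # every endpoint failed its health check


# Hypothetical endpoints: West Europe is primary, North Europe is the mirror.
ENDPOINTS = [
    Endpoint("north-europe", "https://north.example.com", priority=2),
    Endpoint("west-europe", "https://west.example.com", priority=1),
]
```

With both regions healthy, `pick_endpoint(ENDPOINTS, lambda e: True)` selects `west-europe`. If the health check for West Europe fails, the same call falls through to `north-europe`.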
I have a few sites running on Azure Web Sites (North Europe region) at the moment. Over the last 12 or so hours I have noticed some really poor performance. Is anyone else seeing the same thing?
There is nothing logged in the Azure Service Dashboard for service degradation or outage. You could raise a support ticket with Azure here to see what's wrong.