We run scheduled Databricks jobs daily in Azure Databricks, and they have always completed successfully. Today (29 Sept 2020), however, the job fails within a few seconds with an internal error. The error message is given below:
Error while fetching notebook snapshot: HTTP request failed with status: HTTP/1.1 403 Forbidden
Has anyone else faced this issue and knows how to solve this?
We were able to identify and fix the issue. The jobs were set up under the user ID of a person who left the organization last weekend. Since that ID was no longer active, it did not have access to run the job, so the job failed. After changing the job owner to another user ID, it ran fine.
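For anyone who needs to do the same, ownership can be reassigned without recreating the job via the Databricks Permissions API. A minimal sketch, assuming a personal access token in $DATABRICKS_TOKEN; the workspace URL, job ID, and user name are placeholders:

```bash
# Transfer job ownership via the Databricks Permissions API.
# Note: PUT replaces the job's entire ACL, so include any other
# grants the job still needs. All identifiers below are placeholders.
curl -X PUT \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"access_control_list": [{"user_name": "active.user@example.com", "permission_level": "IS_OWNER"}]}' \
  "https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/permissions/jobs/123"
```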
This was due to a service disruption in Azure Databricks (started: September 29, 2020 00:04 UTC; resolved: September 29, 2020 04:56 UTC).
Here are the details from the status notification on the Azure Databricks Status Page:
One of the affected infrastructure components: Authentication
We are investigating an issue affecting user login.
Users may observe intermittent or consistent log in failures.
Users may notice increased latency in jobs/notebooks.
The Azure Databricks Status Page provides an overview of all core Azure Databricks services. You can check the status of a specific service there, and you can optionally subscribe to status updates on individual service components, which sends an alert whenever the status of a component you are subscribed to changes.
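If you want a similar signal inside Azure rather than by email/RSS, one option is an activity-log alert on the ServiceHealth category. A hedged Azure CLI sketch; the resource group, alert name, and action group are placeholders:

```bash
# Raise an alert whenever a Service Health event lands in the
# subscription's Activity Log. All names/IDs are placeholders.
az monitor activity-log alert create \
  --name service-health-alert \
  --resource-group my-rg \
  --condition category=ServiceHealth \
  --action-group "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/microsoft.insights/actionGroups/my-action-group"
```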
Reference: Azure Databricks Status page
Over the weekend, our ADF solution stopped validating.
Error message at validation:
DF_Postcode Could not load resource 'DF_Postcode'. Please ensure no mistakes in the JSON and that referenced resources exist. Status: UnknownError, Possible reason: undefined
This includes triggers, pipelines, and dataflows.
We did not do any deployments between Friday and this morning. Any thoughts?
-- Update --
Possibly related: starting a data flow debug session is not successful.
-- Update 2 --
Multiple pop-ups appear when doing a shift+F5 refresh of the page. The error message itself is not very helpful.
It does appear a few changes were pushed to ADF over the weekend. However, as the error suggests, check whether the referenced resources are intact and whether any properties or values were reset, just to rule out a user configuration issue.
In ADF Studio, check all the resources referenced in the error.
If you are using PowerShell modules anywhere in the solution, make sure you are on the latest versions.
Also, for a quick check, you can raise an issue here to get an official response.
Looking at the Service Health blade in the Azure Portal, I found an emerging issue listed.
Starting at 09:00 UTC, customers may experience errors using Azure Data Factory in West Europe, using the Azure Portal UX. We are aware of the issue and are investigating. Updates to follow in 60 minutes or as events warrant. Workaround: Customers can manage Data Factory using Azure Data Studio, Azure CLI or PowerShell.
https://aka.ms/azuredatastudio
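In case it helps anyone stuck while the portal UX is down: the workaround above can look roughly like this with the Azure CLI, assuming the datafactory extension is installed; factory, pipeline, and resource group names are placeholders:

```bash
# Requires the Data Factory extension: az extension add --name datafactory
# All names below are placeholders.
az datafactory list --resource-group my-rg

# Trigger a pipeline run without touching the portal UX.
az datafactory pipeline create-run \
  --resource-group my-rg \
  --factory-name my-adf \
  --name MyPipeline

# Check a run's status; the run ID comes from the previous command.
az datafactory pipeline-run show \
  --resource-group my-rg \
  --factory-name my-adf \
  --run-id <run-id>
```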
-- Update from Microsoft --
Summary of impact: Between approximately 06:30 UTC and 12:30 UTC on 13 Dec 2021, you were identified as a customer using Data Factory V2 in West Europe who may have experienced intermittent errors when accessing resources in this region.
Preliminary Root Cause: We determined that a backend service responsible for processing API requests became unhealthy. This led to intermittent failing API calls for Azure Data Factory resources.
Mitigation: We restarted the backend service which mitigated the issue.
{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"ArtifactVersionNotFound","message":"No version found in the artifact repository that satisfies the requested version '' for VM extension with publisher 'Microsoft.WindowsAzure.GuestAgent' and type 'CRPProd'."}]}
There are issues with VMs in Azure.
Virtual Machines - Investigating
Impact Statement: Starting at 07:00 UTC on 13 Oct 2021, a subset of customers using Windows Virtual Machines may experience failure notifications when performing service management operations - such as start, create, update, delete. Deployments of new VMs and any updates to extensions may fail. Non-Windows Virtual Machines and existing running Windows Virtual Machines should not be impacted by this issue.
Current Status: We are aware of this issue and are actively investigating the issue. The next update will be provided within 60 minutes, or as events warrant.
This message was last updated at 08:52 UTC on 13 October 2021
More details: Azure status
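While waiting on the platform fix, you can at least confirm you are hitting the same ArtifactVersionNotFound failure by listing the deployment operations, as the error message itself suggests. A sketch with placeholder names:

```bash
# List the operations of the failed deployment to see the inner error.
# Resource group and deployment name are placeholders.
az deployment operation group list \
  --resource-group my-rg \
  --name my-vm-deployment \
  --query "[].{op:properties.provisioningOperation, state:properties.provisioningState, msg:properties.statusMessage}" \
  --output json
```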
A user reported a failure of one of our Blazor Server apps an hour or so ago. When I investigated, it seemed the Azure SignalR service was responding with "502 Bad Gateway" to the initial OPTIONS request of the SignalR hub negotiation (the SignalR service is separate from the web app that hosts the site).
In the Azure management portal, the SignalR service shows as being in a failed state. Restarting it does not succeed, and clicking "view activity logs" in the "the resource is in a failed state" banner simply brings up a "Code: 'invalidRG'" message.
The only significant recent event on this subscription was that it converted from a Free Trial to Pay-As-You-Go, and there were some issues with the transition (the upgrade was done after the subscription was disabled for lack of a payment method, and it took some time to get it reactivated), but then everything seemed to work well for a day.
There are many other services in the same resource group, apparently working fine - it's just SignalR. The "Azure status" page shows that all SignalR services are in "Good" condition.
Where does one go from here to diagnose and fix this? Is it a case of "pay for support from MS and ask them"?
Even though it wasn't a billing issue, I added a note to the end of the billing support ticket I'd raised to get a payment method problem sorted out during the subscription upgrade. Support wrote back acknowledging a problem with the Azure SignalR service that was actively being worked on. They claimed it was already resolved by the time they read my ticket update.
I don't believe the status dashboard ever showed Azure SignalR as anything other than healthy, so it might make sense to sign up for at least the developer support level so there is a route for reporting these things. Either that or (depending on one's moral compass) raise them as billing requests (which are free), if one feels that service availability is a billing-related thing (and I suppose it should be; they can't reasonably charge you for services they aren't providing, even if it is only a few cents).
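For future reference, per-resource health can be queried even when the global status board is green, which is one way to gather evidence before (or instead of) paying for support. A sketch, assuming the Azure CLI; subscription, resource group, and service names are placeholders:

```bash
# Basic state of the SignalR resource. Names are placeholders.
az signalr show --name my-signalr --resource-group my-rg --query "provisioningState"

# Per-resource health record from the Resource Health provider,
# which is distinct from the public status page.
az rest --method get --url "https://management.azure.com/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.SignalRService/signalR/my-signalr/providers/Microsoft.ResourceHealth/availabilityStatuses/current?api-version=2020-05-01"
```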
RCA in progress
Azure SignalR - Service availability/management operation failures - Mitigated
Resolved: An Azure service issue (Tracking ID 1L_L-NZG) impacted resources in your subscription.
Summary of impact: Between 06:00 and 14:00 UTC on 21 Jul 2021, you were identified as a customer using Azure SignalR Service who may have received failure notifications when attempting to connect or access resources. Additionally, failures may have been seen when attempting to perform service management operations - such as create, update, delete.
I have had an Azure SQL DB point-in-time restore running for two days. I want to cancel it, as I think there is an issue. I can see the DB restoring in SSMS, but I can't find the deployment in the Azure Portal. Does anyone know how to cancel it? I have tried using the Azure CLI, but I can't see the resource.
This is what I call an "Azure hiccup"; it happened to me yesterday in the Switzerland West region between 10:20 and 10:40.
I re-ran it and everything was fixed.
If I check the Activity Log, I can see the error, but if I browse Service Health, it says everything was good.
What to do in case of Azure Hiccups:
FIX: Re-run the task; hopefully it will fix the issue, like when you hit an old TV with your fist.
PREVENT: You can try to create an Activity Log alert, but once again it will be based on Service Health (which says everything is good) and not on the actual Activity Log, so you will probably miss issues like this and discover the problem 24 hours later.
POST-MORTEM: You can take a screenshot of the failed task/service in the Activity Log, show it to Microsoft, and ask for a refund if possible. For the future, you can check the current status of Azure on the official Status page and subscribe to the RSS feed, and you can browse the Azure Status History. But as I said, neither of the last two reports these hiccups, so the screenshot of the Activity Log is still the only proof that a tree fell in the forest yesterday.
Since the Microsoft SLA says that high availability for Azure SQL Database and SQL Managed Instance is 99.99% over the year, you can start collecting those screenshots and opening tickets with their support (a scriptable way to collect the evidence is sketched below).
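A hedged sketch of that collection step with the Azure CLI; the time window and output file are arbitrary:

```bash
# Dump yesterday's failed operations from the Activity Log as a
# machine-readable record to attach to a support ticket.
az monitor activity-log list \
  --offset 1d \
  --status Failed \
  --output json > azure-hiccup-evidence.json
```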
I tried dropping the database this morning; the operation status showed as unsuccessful, but the restore was finally canceled 8 hours after I attempted to drop the database.
Found a solution: just create a new database with the same name. The restoring database will be replaced by the newly created one, and then you can delete it.
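If you prefer to script it, the same trick looks roughly like this with the Azure CLI; server, resource group, and database names are placeholders, and I have only seen this work for my own case:

```bash
# Create a database with the same name as the stuck restore; once it
# has displaced the restoring copy, delete it. Names are placeholders.
az sql db create \
  --resource-group my-rg \
  --server my-sqlserver \
  --name StuckRestoreDb \
  --service-objective Basic

az sql db delete \
  --resource-group my-rg \
  --server my-sqlserver \
  --name StuckRestoreDb \
  --yes
```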
About 10 days ago I created my first Azure SQL Database. I chose the Basic plan (€4.21/month). This database is used only for testing purposes. Today I received an email from Microsoft Azure.
Subject of the mail : Your services were disabled because you reached your spending limit
Body of the mail : Keep building in Azure by adjusting your spending limit. Your services were disabled on May 7, 2020 because you’ve reached the monthly Azure spending limit provided by your Visual Studio subscription benefit. To keep using Azure, either:
1. Wait for your monthly spending limit to reset at the start of next month, or
2. Adjust your monthly limit for a specific month or for the life of your subscription—you only pay for the extra amount you use each month.
Why did Azure change the pricing plan of my database without notifying me? Could some action of mine have caused this?
I know that I did an Export Data-tier Application from Microsoft SQL Server Management Studio, where I was connected to my Azure database (I made a backup from there). I doubt this explains it.
UPDATE
As suggested by NillsF, I checked the deployment history and can confirm I chose the Basic plan when I created the database (see below). So I still have no clue what's happening to my database.
You can check the activity log on your subscription to see who initiated the switch from Basic to vCore. It seems strange that MSFT would have done this on your behalf.
You can also check the deployment history on your resource group to verify the tier you picked when you created the resource itself.
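Both checks can be done from the Azure CLI as well; a sketch with placeholder IDs and names:

```bash
# Who issued writes against the database in the last 30 days?
# The resource ID is a placeholder.
az monitor activity-log list \
  --resource-id "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Sql/servers/my-server/databases/my-db" \
  --offset 30d \
  --query "[?operationName.value=='Microsoft.Sql/servers/databases/write'].{when:eventTimestamp, who:caller}" \
  --output table

# What was deployed originally? Review the resource group's history.
az deployment group list --resource-group my-rg --output table
```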