Getting error "Unable to create recipe: Service invocation failed! Request ..." when training any of my models - azure-machine-learning-service

I'm getting this error message when training any of my ML models in AML (run = exp.submit(src)):
Unable to create recipe: Service invocation failed! Request: POST
https://cert-westus2.experiments.azureml.net/rp/workspaces/s
Training for these models was working fine last week.
Thanks for your help!

Transient issues do happen from time to time. One suggestion: try deleting and recreating the compute target.

Can you confirm that the ACR associated with your workspace has the admin account enabled? Our logs indicate that the admin account might be disabled on your workspace ACR.
https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication#admin-account
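
The admin-account check suggested above can be sketched with the Azure CLI. This is a minimal sketch, not the support team's procedure: it assumes the `az` CLI is installed and logged in, the registry name is a placeholder you must supply, and the helper names are hypothetical. The functions only build the commands and interpret the output, so nothing runs against Azure until you execute them yourself.

```python
import json


def acr_admin_check_command(registry_name):
    """Build the az command that prints whether the admin account is enabled."""
    return ["az", "acr", "show", "--name", registry_name,
            "--query", "adminUserEnabled", "--output", "json"]


def acr_admin_enable_command(registry_name):
    """Build the az command that enables the admin account on the registry."""
    return ["az", "acr", "update", "--name", registry_name,
            "--admin-enabled", "true"]


def admin_enabled(show_output):
    """Interpret the JSON output of the check command (a bare true/false)."""
    return json.loads(show_output) is True
```

To use it, pass each command list to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and feed the captured `stdout` of the check command to `admin_enabled`.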

Related

GKE: Impossible to delete a cluster

I have a weird issue with GKE. The cluster was created by Terraform, and I tried to make a change requiring a deletion and re-creation.
It failed at the re-creation because I was missing an API, so I added it and retried.
The thing is, I now have a cluster that exists, empty, but with a "failed to delete cluster" message on it.
I have never had this issue before, and I have already destroyed and re-created this very resource. I tried to destroy all the resources created by Terraform on this project but I still get an error "failed to delete cluster".
I also tried to do it by hand in the UI but still get the same error.
I tried to do it using
gcloud container clusters delete <cluster_name> and got
"Failed to delete cluster, name: operation-xxx-xxx..." along with a link to the failed operation.
It's a JSON document with a 401 code and the following message:
Request is missing required authentication credential. Expected OAuth
2 access token, login cookie or other valid authentication credential.
See
https://developers.google.com/identity/sign-in/web/devconsole-project.
I tried to re-authenticate, but it doesn't help; I get the same error.
I'm running out of ideas. Can you help me here?
A 401 (Unauthorized) means the request did not carry valid credentials, which suggests that the account you are using is not authenticated for, or not permitted to perform, the cluster deletion.
Either get a role that permits your user account to delete clusters.
Or ask someone whose account has sufficient permissions to delete it for you.
Or authenticate gcloud (gcloud auth activate-service-account) with the Service Account that you used to create the cluster (assuming it can delete clusters too) and then use gcloud container clusters delete ..., optionally including --account=${SERVICE_ACCOUNT_EMAIL}, or just ensure the Service Account is ACTIVE with gcloud auth list.
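
The third option can be sketched as a small script that assembles the gcloud invocations in order. This is only an illustration under assumptions: the cluster name, Service Account email, key file path and zone are placeholders, and `build_delete_commands` is a hypothetical helper. It builds the commands without running them, so you can review them first.

```python
import shlex


def build_delete_commands(cluster_name, service_account_email, key_file, zone=None):
    """Return the gcloud invocations (as argv lists) in the order to run them."""
    delete_cmd = [
        "gcloud", "container", "clusters", "delete", cluster_name,
        f"--account={service_account_email}",  # run the delete as this account
    ]
    if zone:
        delete_cmd.append(f"--zone={zone}")
    return [
        # 1. Authenticate gcloud as the Service Account using its key file.
        ["gcloud", "auth", "activate-service-account", service_account_email,
         f"--key-file={key_file}"],
        # 2. Confirm which account is marked ACTIVE.
        ["gcloud", "auth", "list"],
        # 3. Retry the deletion.
        delete_cmd,
    ]


# Placeholder values; substitute your own before running anything.
for argv in build_delete_commands(
    "my-cluster", "terraform@my-project.iam.gserviceaccount.com",
    "key.json", zone="us-central1-a",
):
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv lists can be passed to `subprocess.run` once you are satisfied with them.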
I did not find a proper solution, but what did work was to delete the whole project and start over.
Luckily for me it was a lab, not a production project...

Azure Data Explorer error when creating cluster: subscription '' is not registered

While working on this official tutorial, Create an Azure Data Explorer cluster and database, I am getting the following error when creating a cluster. Question: what might I be missing, and how can the issue be resolved?
Remarks:
I'm using Visual Studio Enterprise Subscription - MPN
My online search shows a similar error here, but the context seems different, since those error messages are about "The subscription is not registered to use namespace". I'm not sure whether that is relevant to my error.
{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"SubscriptionNotRegistered","message":"The subscription 'a86d7e9f-210d-48e8-8f5e-528015d1c998' is not registered."}]}
Using the link provided in the error, I got the following:
When I click on the 'write cluster resource' link from the above screen:
The error occurs because you did not register the Kusto resource provider, as described here.
However, once you create a new cluster for the first time on a given subscription and it fails because the provider is not registered, Kusto tries to register it for you. So if you try again it should just work. If not, please follow the process in the link.
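
To spot this case programmatically, the deployment error can be parsed and the nested codes inspected. The sketch below is illustrative only: the error body is abbreviated from the one in the question with a placeholder subscription ID, the helper names are hypothetical, and `Microsoft.Kusto` is assumed to be the provider namespace for Azure Data Explorer. It only suggests the `az provider register` command; it does not run it.

```python
import json

# Abbreviated copy of the error from the question, with a placeholder ID.
ERROR_BODY = """{"code":"DeploymentFailed",
 "message":"At least one resource deployment operation failed.",
 "details":[{"code":"SubscriptionNotRegistered",
             "message":"The subscription '<subscription-id>' is not registered."}]}"""


def inner_error_codes(error):
    """Collect the error codes nested under 'details', depth-first."""
    codes = []
    for detail in error.get("details", []):
        codes.append(detail.get("code"))
        codes.extend(inner_error_codes(detail))
    return codes


def suggest_fix(error_body, namespace="Microsoft.Kusto"):
    """Return an az command to register the provider, or None if not applicable."""
    codes = inner_error_codes(json.loads(error_body))
    if "SubscriptionNotRegistered" in codes:
        return f"az provider register --namespace {namespace}"
    return None


print(suggest_fix(ERROR_BODY))
```

Walking `details` recursively matters because the top-level code (`DeploymentFailed`) is generic; the actionable code sits one level down.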

How can I get information on my template failing to start?

I'm using Azure Lab Services (for classrooms), and I can't start my Template VM. The "start VM" trigger will work, but the VM will fail to start and return to a "stopped" state without any error message in the Labs environment or the Azure Portal. Is there a way I can pull more debugging information as to why my Template didn't start, or a possible troubleshooting option from someone who's experienced this problem before?
Yes of course, you can troubleshoot it further by checking the Activity logs of your Lab account from within the Azure portal as follows:
Expanding the failed event further, you should be able to see the Error code and the Message. Switching to the JSON representation, look for the statusMessage key within properties that has more details.
For example:
..
"properties": {
"statusMessage": "{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",\"message\":\"The resource operation completed with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"ResourceGroupNotFound\",\"message\":\"Resource group 'MX-RG-xxxxx' could not be found.\"}]}}"
},
..
This should hopefully give you enough information to take the next steps.
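
Note that statusMessage is itself a JSON document stored as a string, so it has to be decoded a second time. A minimal sketch of that, assuming you have copied the event's JSON out of the portal (the event below is trimmed to the keys from the example above, and `explain_failure` is a hypothetical helper name):

```python
import json

# The example event from above, trimmed to the relevant keys.
ACTIVITY_LOG_EVENT = r'''
{
  "properties": {
    "statusMessage": "{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",\"message\":\"The resource operation completed with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"ResourceGroupNotFound\",\"message\":\"Resource group 'MX-RG-xxxxx' could not be found.\"}]}}"
  }
}
'''


def explain_failure(event):
    """Decode the nested statusMessage and list the error plus its details."""
    status = json.loads(event["properties"]["statusMessage"])  # second decode
    error = status["error"]
    lines = [f"{error['code']}: {error['message']}"]
    for detail in error.get("details", []):
        lines.append(f"  {detail['code']}: {detail['message']}")
    return lines


for line in explain_failure(json.loads(ACTIVITY_LOG_EVENT)):
    print(line)
```

The indented lines are the inner details, which usually name the actual cause (here, a missing resource group).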
There's an ongoing outage for Azure Lab Services. Please follow updates here.

The template deployment 'Microsoft.Web-WebApp-Portal-3994ede8-a307' is not valid according to the validation procedure

This is part of an MS Learn exercise, https://learn.microsoft.com/en-us/learn/modules/create-release-pipeline/5-deploy-to-appservice. When creating the Azure App Service manually, I get the error below.
Please help with the cause and resolution.
Steps to recreate:
azure.com >> Azure App Service >> +Add >> fill in the project details (subscription, resource group, etc.) >> Review+Create, which produces the error below:
The template deployment 'Microsoft.Web-WebApp-Portal-3994ede8-a307' is not valid according to the validation procedure. The tracking id is 'ca7e085f-a756-4344-bfe1-07444ff0fe0e'. See inner errors for details.
I'd like to know what is causing this error and how I can avoid it.
Looking at the output in the network tab of the browser, I saw this:
The requested app service plan cannot be created in the current resource group because it is hosting Linux apps. Please choose a different resource group or create a new one.
So I deleted the app service plan/web app and now it works.
MS, please show us the error ...
I was able to fix this error by simply recreating the web app and selecting another location: I changed from Central US to South Central US, and it worked for me.
Changing the location helped me fix this error.
I was facing the same issue so I changed the location from "East US" to "East US 2" and it also worked for me.

Where to find logs for databricks workspace?

I created a Databricks component with a VNet based on this template and documentation. The problem is that we receive an error when trying to launch a workspace.
"We've encountered an error creating your workspace. Please wait a few minutes and try again."
In the documentation, there is a similar error in troubleshooting section but it's not the same.
The problem could be a network problem as the documentation suggests, but the ARM template has been tested in other Azure environments and it works properly.
Workspace creation fails, but we don't know why.
Does anyone know where to find any kind of logs about workspace creation or know anything about this error message?
Thanks.
This error means that your workspace failed to be provisioned. We had this when a Policy on our subscription blocked the resource from being created. The policy was to ensure that Tags were set. Check to see if you have any Policies enabled.
Any logs you can see will be in the resource group under the deployments blade. But it probably won't show anything useful. You should raise a support ticket if you cannot track the problem yourself.

Resources