Azure machine learning webservice deployment failed - azure

Deployment error to ACS - BadRequestFormat. How do I get past this? This is my nth attempt to make the tutorial work end to end https://learn.microsoft.com/en-us/azure/machine-learning/preview/tutorial-classifying-iris-part-1.
az ml env setup -n gopenv --location westcentralus -c
Subscription set to Visual Studio Premium with MSDN
Continue with this subscription (Y/n)? y
Resource group gopenvrg already exists, skipping creation.
creating service principal.........done
Created a service principal: %s 96f6dd9e-c9d6-4856-9f8b-5426c7a757ea
waiting for AAD role to propagate.done
Provisioning compute resources...
BadRequestFormat: The request format was invalid. Details: Updating clusters with cluster type Local is not supported

I got the same error when I ran the command again after it had already run successfully once. Considering that your output says, "Resource group gopenvrg already exists", it looks like that might be true for you too.
If you change "-n gopenv" to something else like "-n gopenv2", it might work for you. It did for me.
You can see if you already have any deployment environments by running "az ml env list".

Related

AKS Cluster deployment fails with "ReconcileMSICredentialError"

When I try to deploy a fresh AKS cluster with "Dev/Test" Settings via the Portal, I get the following error while deployment:
{"code":"DeploymentFailed","message":"At least one resource deployment operation failed.
Please list deployment operations for details. Please see
https://aka.ms/DeployOperations for usage details.","details":
[{"code":"ReconcileMSICredentialError","message":"Reconcile MSI credential failed.
Details: autorest/azure: Service returned an error. Status=409 Code=\"Conflict\"
Message=\"Secret bf905bf9e9ad86526b26e98d2ea490a0a500ff23907f9df987d95de3a649e751 is
currently being deleted and cannot be re-created; retry later.\" InnerError=
{\"code\":\"ObjectIsBeingDeleted\"}."}]}
However, the resource still gets deployed, but with a notification that "the resource is in a failed state". When I stop the cluster and start it new, the notification disappears but I'm not sure if the error remains.
I can avoid the error altogether, if I pick a new name for the cluster. However, I'd like to keep the old name.
The same happens when I deploy with different settings (CPU, number of nodes, etc.). I also tried deleting the cluster entirely and deploying it completely new but the error persists. I haven't found any explanation to this error either on Stackoverflow or Google.
What could be the reason for this error and how to avoid it?
I tried to reproduce the same issue in my environment and got the below results
I have created the AKS cluster with dev/test environment
The reference cluster is successfully created
I have given the some credentials to the cluster using below command
az aks get-credentials --resource-group Alldemorg --name cluster_name
*Created the sample application and deployed that application into the cluster,
I have used the following Reference for example sample file.*
Deployment got succeeded and I am able to see all the pods and nodes which got created for the application
Note:
1). "ReconcileMSICredentialError" error we are getting because of the version please check the version and upgrade to latest
2). If the resource is in failed stated delete the entire resource instead of deleting cluster and create it again if we stop and start the resource may chance of getting "ReconcileMSICredentialError".

AZ CLI login using Service Principal fails from specific computer

I have posted previously about az login using a Service Principal failing with the error No subscriptions found and I have run across others that have had similar issues. Capability seems shaky for some reason. What I am seeing now that has me scratching my head is when I run a script I have that does an az login with a service principal from my desktop computer it works fine...no issues. When I run the same script from my laptop, the login fails with the No subscriptions found error. What I have tried on the laptop:
Checked AZ CLI version...same as desktop
Ran az account clear to make sure everything was cleared out
Deleted Service Principal from AAD and recreated from laptop
I even ran az account clear on my desktop to make sure it was not working simply because it was cached and even after the clear, the az login worked fine.
Any thoughts on what might be causing this?
You need to assign a Role to the Service Principle, or add the flag --allow-no-subscriptions.
I have posted resolution for this under following thread.
https://stackoverflow.com/a/66108965/1712969
you might want to try command with this flag "--allow-no-subscriptions"

Azure function app deployment through intellij

I'm trying to deploy azure function app from intelliJ, getting following error when run mvn azure-functions:deploy
"The specified function app does not exist. Creating a new function app..."
after above getting status code : 400
Not sure why 400 since there is nothing printed or returned.
In pom, check functionResourceGroup of properties, resource group has some naming restrictions
Alphanumeric, underscore, parentheses, hyphen, period (except at end), and Unicode characters that match the allowed characters.
Maven Plugin for Azure Functions seems not to provide concrete error message as it does for functionAppName.
In my case I had multiple subscriptions attached to my account, since I was not setting any subscription upon login, azure was using default subscription and was not finding the function app.
I set the subscription in which function app was created then it worked properly.
az login -u <> -p <>
and then set the subscription
az account set
after above run the azure deploy. it worked for me.

'Microsoft.Compute' Azure Resource Provider never finishes 'Registering':

This issue is SOMEWHAT similar to the following (but does not display the same behavior in testing):
Azure: Microsoft.Compute resource provider stuck 'Registering' for about a day
While attempting to register the Microsoft.Compute provider in order
to use AKS from the MacOS command line (or alternately Cloud Shell
from within Azure), the provider takes an exorbitantly long time to
finish registering.
At the time of this writing it is still in the 'registering' status 6 hours later.
What is supposed to happen normally?
The following command should register Microsoft.Compute in a timely manner:
az provider register -n Microsoft.Compute
What goes wrong?
The provider never registers and continuously hangs "Registering" when it's status is checked with the following command:
az provider show --namespace Microsoft.Compute -o table
The response to the above shell command looks like this (and has for 6 hours or so):
Namespace RegistrationState
----------------- -------------------
Microsoft.Compute Registering
Other Providers that Registered Successfully:
Microsoft.Network
Microsoft.Storage
Microsoft.ContainerService
Given the above (and the fact that I am the root / only user) it is likely not a permissions issue since those providers would have failed. Under 'My Permissions' I am listed as administrator: 'You are an administrator on the subscription' see the below screen shot of registered providers (with permissions blade on the right):
Since the provider never finishes registering, attempting to create a Cluster with AKS of course fails with:
Operation failed with status: 'Bad Request'. Details: Required resource provider registrations Microsoft.Compute are missing.
Other Background:
Two days ago I preformed the identical operations on a clients account successfully and everything finished within 5 minutes. I have tried the following options to solve the issue (thus far with no impact):
A user on Stack Overflow (here: Azure: Microsoft.Compute resource provider stuck 'Registering' for about a day) suggested spinning a VM within the resource group / account in the hopes that would solve the issue and register the provider automatically (this did not work).
Un-register the Provider
This is where my situation diverges from the above question. When unregistering the component with:
az provider unregister -n Microsoft.Compute
I get:
The subscription cannot be unregistered from resource namespace 'Microsoft.Compute'. Please delete existing resources for the provider.
This is different from the user over here who then gets Stuck / Hangs at Un-Registering, as opposed to failing with the above message when attempting to unregister (Azure: Microsoft.Compute resource provider stuck 'Registering' for about a day)
I am hoping that someone has encountered this in the past.
At this point I am going to try to delete my subscription (which was created earlier in the day) and add a new one / repeat.
Will post back with my findings.
This specific issue was faced by multiple users across the globe and has been fixed, try and see if it works fine now.

Azure Service Fabric ARM template Provisioning Failed

I have a script that facilitates an ARM template to provision an Azure Service Fabric cluster (official windows servers) among other dependencies like storage and such. I do not provision through the portal.
Facts:
Two days ago, I used this script to provision the cluster with complete success.
I tried the same again yesterday, and the provisioning failed (with the error below).
just to reassure you that the provisioning script works, I can successfully provision with this script on other subscription and it constantly and reliably succeeds.
The error:
Resource Microsoft.Insights/autoscaleSettings '1NodeVMSetAutoScale' failed with message 'The metric with namespace '' and name '\Processor(_Total)\% Processor Time' is not supported for this resource id '/subscriptions/----/resourceGroups/-cluster/providers/Microsoft.Compute/virtualMachineScaleSets/1'.' 8:10:01 PM - Resource Microsoft.Insights/autoscaleSettings '2NodeVMSetAutoScale' failed with message 'The metric with namespace '' and name '\Processor(_Total)\% Processor Time' is not supported for this resource id '/subscriptions/----/resourceGroups/cluster/providers/Microsoft.Compute/virtualMachineScaleSets/2'.' 8:10:01 PM - "Template output evaluation skipped: at least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details." 'string' does not contain a definition for 'error'
My question is why? What could be the reason for it not to consistently succeed? Can you please help with troubleshooting steps?
Related info: https://azure.microsoft.com/en-us/documentation/articles/insights-autoscale-common-metrics/
2 questions:
1) what region are you deploying in?
2) In the new subscription, can you check what resource providers you have registered, and in what regions? In the CLI, the commands look like:
azure config mode arm
azure provider list
azure provider show Microsoft.Insights
I faced the same issue since a week in my subscriptions. The way out was to make changes to the Diagnostic configurations, by adding the counter "\Processor(_Total)\% Processor Time" under the waddiagnostic performace counters section. You can also take sneak peak here were autoscale is discussed: Service Fabric Autoscale
Please share your template/ part of it to analyse further.

Resources