Unable to Create Storage pool on Azure VM 2016 - azure

I have created 6 disks of 256 GB each on 2 Windows Server 2016 VMs. I need to implement an active-active SQL failover cluster on these 2 VMs using S2D.
I am getting an error while creating a storage pool for 3 disks; below is the error:
Cluster resource 'Cluster Pool 1' of type 'Storage Pool' in clustered role xxxxxx failed. The error code was '0x16' ('The device does not recognize the command.').
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet

S2D is new in Windows Server 2016. You should check the prerequisites before you proceed with building your failover cluster. It is strongly recommended to validate the cluster first and then enable S2D, following Configure the Windows Failover Cluster with S2D.

This error appeared because I tried to create the storage pool again. Enable-ClusterS2D had already created the pool for me; I didn't notice that and was trying to create the pool using Failover Cluster Manager.
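For reference, the validate-then-enable flow looks roughly like this (node names, cluster name, and the static address are placeholders). Note that Enable-ClusterStorageSpacesDirect is what creates the pool, so no manual pool creation is needed afterwards:

```powershell
# Placeholders: node1/node2, cluster name, static address.
# Validate the nodes, including the S2D-specific tests.
Test-Cluster -Node 'node1','node2' `
    -Include 'Storage Spaces Direct','Inventory','Network','System Configuration'

# Create the cluster without claiming any shared storage yet.
New-Cluster -Name 'sql-s2d-cl' -Node 'node1','node2' -NoStorage -StaticAddress '10.0.0.10'

# Enable S2D; this claims the eligible disks and creates the storage pool
# automatically. Creating another pool in Failover Cluster Manager afterwards
# triggers the 0x16 error above.
Enable-ClusterStorageSpacesDirect
```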

In order to achieve an active-active solution, you should configure a host/VM per location. In Azure, S2D does not work between two locations: it requires RDMA support for performance, which cannot be configured in Azure. So, to get HA for a SQL FCI, check StarWind vSAN Free, which can be configured between sites, replicating/mirroring the storage. https://www.starwindsoftware.com/resource-library/installing-and-configuring-a-sql-server-failover-clustered-instance-on-microsoft-azure-virtual-machines
The configuration would look like this: Storage Spaces provides disk redundancy (mirror or parity) within each VM, and StarWind distributes HA storage on top of the underlying Storage Spaces.


Azure Databricks Execution Fail - CLOUD_PROVIDER_LAUNCH_FAILURE

I'm using Azure DataFactory for my data ingestion and using an Azure Databricks notebook through ADF's Notebook activity.
The Notebook uses an existing instance pool of Standard DS3_V2 (2-5 nodes autoscaled) with 7.3LTS Spark Runtime version. The same Azure subscription is used by multiple teams for their respective data pipelines.
During ADF pipeline execution, the notebook activity frequently fails with the error message below:
{
  "reason": {
    "code": "CLOUD_PROVIDER_LAUNCH_FAILURE",
    "type": "CLOUD_FAILURE",
    "parameters": {
      "azure_error_code": "SubnetIsFull",
      "azure_error_message": "Subnet /subscriptions/<Subscription>/resourceGroups/<RG>/providers/Microsoft.Network/virtualNetworks/<VN>/subnets/<subnet> with address prefix 10.237.35.128/26 does not have enough capacity for 2 IP addresses."
    }
  }
}
Can anyone explain what this error is and how I can reduce the occurrence of this? (The documents I found are not explanatory)
It looks like your Databricks workspace has been created within a VNet (see this link or this link). When this is done, the Databricks instances are created within one of the subnets of this VNet. It seems that at the point of triggering, all the IPs within the subnet were already in use.
You cannot and should not extend the IP space. Please do not attempt to change the existing VNet configuration, as this will affect your Databricks cluster.
You have the following options:
1. Check when fewer Databricks instances are being instantiated and schedule your ADF pipeline during that time. Aim to distribute executions over time so you don't exhaust the available IPs in the subnet.
2. Request your IT department to create a new VNet and subnet, and create a new Databricks workspace in that VNet.
The problem arises from the fact that when your workspace was created, the network and subnet sizes weren't planned correctly (see docs). As a result, when you try to launch a cluster, there aren't enough IP addresses in the given subnet, and you get this error.
Unfortunately, it's currently not possible to expand a network/subnet, so if you need a bigger network you need to deploy a new workspace and migrate to it.
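As a quick sanity check before rescheduling runs, you can inspect the subnet yourself (the resource names below are placeholders). A /26 prefix holds 64 addresses, of which Azure reserves 5, leaving 59 usable:

```shell
# Placeholders: <RG>, <VN>, <subnet>. Show the subnet's address prefix.
az network vnet subnet show \
  --resource-group <RG> \
  --vnet-name <VN> \
  --name <subnet> \
  --query "addressPrefix"

# Count the IP configurations currently bound to the subnet.
az network vnet subnet show \
  --resource-group <RG> \
  --vnet-name <VN> \
  --name <subnet> \
  --query "length(ipConfigurations)"
```

Comparing the second number against the usable capacity tells you how close the subnet is to the limit at any given time.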

Azure - Enable Backup on VM with Windows Server 2019 Core server, D4s_v3 sku, is failing with code BMSUserErrorContainerObjectNotFound

Azure VM Details :
OS : Windows Server 2019 Datacenter Core
Size: Standard D4s v3 (4 vcpus, 16 GiB memory)
Location: Australia East
VM generation: V1
Agent status: Ready
Agent version: 2.7.41491.1010
Azure disk encryption: Not Enabled
Extensions already installed :
DependencyAgentWindows
IaaSAntimalware
MDE.Windows
MicrosoftMonitoringAgent
I have an existing Recovery Services vault with tens of other VMs being backed up.
I'm trying to enable backup for this VM from the Azure portal (VM blade > Operations > Backup), but it fails with the following error:
I have tried it multiple times.
Provisioning state: Failed
Duration: 1 minute 3 seconds
Status: Conflict
{
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
  "details": [
    {
      "code": "BMSUserErrorContainerObjectNotFound",
      "message": "Item not found"
    }
  ]
}
All the information on troubleshooting backup-related issues (https://learn.microsoft.com/en-us/azure/backup/backup-azure-vms-troubleshoot) covers failures that occur after the "Enable Backup" step.
I have also tried to enable the backup using azure cli:
az backup protection enable-for-vm --vm "/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.Compute/virtualMachines/vm_name" -v vaultname -g vault_resourcegroup -p backuppolicy_name
It throws the following error:
The specified Azure Virtual Machine Not Found. Possible causes are
1. VM does not exist
2. The VM name or the Service name needs to be case sensitive
3. VM is already Protected with same or other Vault.
Please Unprotect VM first and then try to protect it again.
Please contact Microsoft for further assistance.
None of the Point 1,2 or 3 are true.
VM exists, the name is used as shown in the portal, no other VM protection service is in use.
Note: I have faced this issue a few days back on another subscription, but luckily no one was yet using that VM, so I destroyed and re-deployed the VM, and the error went away.
I can't do the same for this VM as it's already in use.
Any help/guidance will be appreciated.
This seems like a portal error, or the VM is not able to communicate with the Azure platform. I would suggest trying the "Reapply" feature to update the platform status.
Else, you can try initiating a backup from the "Recovery Services vaults" blade and add the VM to it.
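If you prefer the CLI, the Reapply step and the retry can be sketched as follows (resource names are placeholders):

```shell
# Placeholders: <vm-rg>, <vm-name>, <vault-rg>, <vault-name>, <policy-name>.
# Reapply the VM's platform state.
az vm reapply --resource-group <vm-rg> --name <vm-name>

# Then retry enabling protection against the existing vault.
az backup protection enable-for-vm \
  --resource-group <vault-rg> \
  --vault-name <vault-name> \
  --vm <vm-name> \
  --policy-name <policy-name>
```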
The solution was to contact Microsoft support. Their engineer, after some analysis (i.e., back and forth with screenshots exchanged over email, etc.), replied with:
I check from the backend and notice that the VM status is not in synchronize state. I’ve requested the VM engineer xxxxx resync the VM from the backend. Please try to reenable the VM backup again in the Azure portal recovery service Vault page. If you encounter the same issue, please try to configure the VM backup in the Azure Virtual Machine Panel page and let me know the results. Thanks!
After this when I attempted to enable the backup it worked.
So for anyone who faces this problem, it looks like the only option is to get in touch with MS Support.

Azure Neo4J Will Not Deploy

I'm trying to deploy a neo4j Enterprise Cluster using the Azure Portal GUI. I'm just doing a vanilla install. When I get to the last step, the error reads:
InvalidContentLink
Unable to download deployment content from 'https://gallery.azure.com/artifact/20151001/neo4j.neo4j-enterprise-editionha.1.0.10/Artifacts/clusterTemplate.json'. The tracking Id is '99f19bbe-f9f8-4e04-91b7-7aa58a82922f'. Please see https://aka.ms/arm-deploy for usage details.
Basics
Subscription: Not free trial
Resource group: neo4j
Location: (US) West US 2
Admin Account Name: reallyHardToGuess
Password: ****************
Neo4j Settings
Neo4j Version: Neo4j 3.1
Neo4j password: ****************
SSL Certificate: -
SSL Private Key: -
Neo Cluster Name: neo
Number of VMs: 3
Size of each VM: Standard D4 v2
Virtual network for the Cluster: neo-vnet-01
Subnet for Cluster VMs: clusterSubnet
Subnet for Cluster VMs address prefix: 10.0.0.0/24
Public IP address: NeoIP001
The URL for the deployment content does not resolve.
I've tried all versions of neo4j and a bunch of different VM choices. Same result. Please advise.
EDIT, 2 weeks later:
The answer ended up being that Azure was showing an outdated Neo4j option ("Enterprise Edition"). One must select "Causal Cluster" to pass validation.
The Azure "Enterprise Edition VM" listing is currently invalid. The only options are to choose Causal Cluster or (potentially) deploy the Enterprise Edition VM via the method described on Medium (https://medium.com/neo4j/how-to-automate-neo4j-deploys-on-azure-d1eaeb15b70a).
You are using a very old template that is no longer supported. The correct one is here: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/neo4j.neo4j-enterprise-causal-cluster?tab=Overview
Notice that you are attempting to launch Neo4j version 3.1, which is about 2 years old; the current stable version is 3.5, with 4.0 available soon. Use the updated version.

Unable to connect AKS cluster: connection time out

I've created an AKS cluster in the UK region in Azure.
Currently, I can no longer access my AKS cluster. Connecting to the public IPs fails; all connections time out.
Furthermore, I can't run the kubectl command either:
fcarlier@ubuntu:~$ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Is there a known issue with AKS in that region or is it something on my side?
Is there a known issue with AKS in that region, or is it something on my side?
Sorry for the bad experience.
For now, Azure AKS is still in preview; please try to recreate your cluster. ukwest works fine now.
Here is a similar case to yours; please refer to it.
I just successfully created a single node AKS cluster on UK West with no issues. Can you please retest? For now, I would avoid provisioning on West US 2 until the threshold issues are fixed. I'm aware the AKS team is actively engaged to restore service on West US. Sorry for the inconvenience. Below is the sample cmd to create in UK if you need the reference. Hope this helps.
Create a resource group (UK West):
az group create --name myResourceGroupUK --location ukwest
Create the AKS cluster (UK West):
az aks create --resource-group myResourceGroupUK --name myK8sClusterUK --agent-count 1 --generate-ssh-keys
I just wrote a detailed post on this topic over here (it is not as straightforward as a single solution/workaround): 'Unable to connect Net/http: TLS handshake timeout' — Why can't Kubectl connect to Azure Kubernetes server? (AKS)
That being said, the solution to this one for me was to scale the nodes up, and then back down, for my impacted cluster from the Azure Kubernetes service blade in the web console.
Workaround / Potential Solution
Log into the Azure Console — Kubernetes Service blade.
Scale your cluster up by 1 node.
Wait for scale to complete and attempt to connect (you should be able to).
Scale your cluster back down to the normal size to avoid cost increases.
Total time it took me ~2 mins.
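The same scale-up/scale-down workaround can be done from the CLI (resource names are placeholders):

```shell
# Placeholders: <rg>, <cluster>. Scale up by one node (here from 1 to 2)...
az aks scale --resource-group <rg> --name <cluster> --node-count 2

# ...confirm kubectl connects again, then scale back to the original size.
az aks scale --resource-group <rg> --name <cluster> --node-count 1
```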
More Background Info on the Issue
Also added this solution to the full ticket description write up that I posted over here (if you want more info have a read):
'Unable to connect Net/http: TLS handshake timeout' — Why can't Kubectl connect to Azure Kubernetes server? (AKS)

Azure Resource Manager: move VM to availability group

Can't seem to figure out how to change the availability set of an existing Azure VM in the Resource Manager stack. There's no interface for it. Set-AzureAvailabilitySet does not exist in the Azure Powershell tools when in ResourceManager mode. It does exist in service stack mode. But that doesn't help me.
AFAIK, this feature may be addressed by the end of the year. It's a big challenge for the Microsoft team to allow such an operation: changing the availability set requires a review of the VM mobility architecture on Azure. For example, adding a VM to an availability set that already contains a VM means putting it into a different default domain. Because VM mobility is an issue on Azure (no live migration), it's not an easy operation.
I have written a Powershell script which let you change the AS of an ARM VM by recreating it.
Give it a try and enjoy:
How to use it:
1. Download the script and save it to a local location.
2. Run it and provide the requested parameters, or run:
./Set-ArmVmAvailabilitySet.ps1 -VmName 'The VM Name' -ResourceGroup 'Resource Group' -AvailabilitySetName 'AS Name' -SubscriptionName 'The Subscription name'
To remove a VM from an availability set:
./Set-ArmVmAvailabilitySet.ps1 -VmName 'The VM Name' -ResourceGroup 'Resource Group' -AvailabilitySetName 0 -SubscriptionName 'The Subscription name'
Download link, version 1.01:
https://gallery.technet.microsoft.com/Set-Azure-Resource-Manager-f7509ec4
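For context, the recreate pattern that such a script automates looks roughly like this, using the AzureRM cmdlets of that era. All names are placeholders, and this is a sketch of the approach, not the script's actual contents:

```powershell
# Placeholders: MyResourceGroup, MyVm, MyAvSet. Deleting the VM removes only
# the VM object; its disks and NICs remain and are re-attached below.
$rg = 'MyResourceGroup'
$vm = Get-AzureRmVM -ResourceGroupName $rg -Name 'MyVm'
$as = Get-AzureRmAvailabilitySet -ResourceGroupName $rg -Name 'MyAvSet'

Remove-AzureRmVM -ResourceGroupName $rg -Name $vm.Name -Force

# Rebuild the configuration around the existing OS disk, now inside the availability set.
$newVm = New-AzureRmVMConfig -VMName $vm.Name -VMSize $vm.HardwareProfile.VmSize `
    -AvailabilitySetId $as.Id
Set-AzureRmVMOSDisk -VM $newVm -Name $vm.StorageProfile.OsDisk.Name `
    -VhdUri $vm.StorageProfile.OsDisk.Vhd.Uri -CreateOption Attach -Windows
Add-AzureRmVMNetworkInterface -VM $newVm -Id $vm.NetworkProfile.NetworkInterfaces[0].Id

New-AzureRmVM -ResourceGroupName $rg -Location $vm.Location -VM $newVm
```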
Source
That feature isn't implemented yet in the ARM stack, that's why you're not seeing the cmdlet...
