Azure VM scale sets not accessible and cannot restart - azure

Today I found that I cannot remote into my Azure VM scale set instances (Windows Server 2016 Nano Server). I then tried to restart a VM scale set instance using PowerShell, but got the following error:
Restart-AzureRmVmss : Long running operation failed with status 'Failed'. Additional Info:'VM 'master-vmss_0' has not
reported status for VM agent or extensions. Please verify the VM has a running VM agent, and can establish outbound
connections to Azure storage.'
ErrorCode: VMAgentStatusCommunicationError
ErrorMessage: VM 'master-vmss_0' has not reported status for VM agent or extensions. Please verify the VM has a
running VM agent, and can establish outbound connections to Azure storage.
Our VM scale sets have been running correctly for nearly one year. What happened to the VMSS? Did Azure make changes to VMSS recently?
Update: here are the NSG outbound rules:
{
"name": "AllowVnetOutBound",
"properties": {
"provisioningState": "Succeeded",
"description": "Allow outbound traffic from all VMs to all VMs in VNET",
"access": "Allow",
"priority": 65000,
"direction": "Outbound",
}
},
{
"name": "AllowInternetOutBound",
"properties": {
"provisioningState": "Succeeded",
"description": "Allow outbound traffic from all VMs to Internet",
"access": "Allow",
"priority": 65001,
"direction": "Outbound",
}
},
{
"name": "DenyAllOutBound",
"etag": "W/\"a8e5e396-4f92-4118-b8ea-9b7d0111079f\"",
"properties": {
"provisioningState": "Succeeded",
"description": "Deny all outbound traffic",
"access": "Deny",
"priority": 65500,
"direction": "Outbound",
}
}

Check whether a Network Security Group is blocking outbound connectivity from your VM.
The VM agent and extensions report their status through an Azure storage account.
If HTTPS to this storage account is blocked, you will get this error message. More information here: http://www.deployazure.com/compute/virtual-machines/azure-vm-agent-extensions-deep-dive-part-3/
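To reason about whether the rules in the question could block that traffic, it helps to remember that NSG rules are evaluated in ascending priority order and the first match decides. The following is a minimal Python sketch of that evaluation for the three default outbound rules. The `destination` service tags are assumptions (the excerpt in the question omits them), and the matching is simplified for illustration rather than a model of how Azure implements it.

```python
# Simplified model of NSG outbound evaluation: rules are checked in
# ascending priority order and the first matching rule decides.
# The "destination" tags are assumed; the question's excerpt omits them.
RULES = [
    {"name": "AllowVnetOutBound", "priority": 65000,
     "access": "Allow", "destination": "VirtualNetwork"},
    {"name": "AllowInternetOutBound", "priority": 65001,
     "access": "Allow", "destination": "Internet"},
    {"name": "DenyAllOutBound", "priority": 65500,
     "access": "Deny", "destination": "*"},
]


def evaluate_outbound(rules, destination_tag):
    """Return (rule name, access) of the first rule matching the destination tag."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["destination"] in ("*", destination_tag):
            return rule["name"], rule["access"]
    # Nothing matched: treat as denied in this sketch.
    return None, "Deny"


if __name__ == "__main__":
    print(evaluate_outbound(RULES, "Internet"))
```

Under these assumptions, the default rules alone allow outbound Internet traffic, so a block would have to come from a custom rule with a lower priority number, a route, or a firewall.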

Our VM scale sets have been running correctly for nearly one year. What
happened to the VMSS? Did Azure make changes to VMSS recently?
Azure has recently been performing maintenance that updates the host OS to Windows Server 2016. You can find the maintenance information in the Azure portal.
In your scenario, you could create a new VM in the VMSS's VNet and use that VM to remote into the VMSS instance and check the VM agent status.
If the VM agent is stopped, start it.
Update:
The VM failed to start after the reboot for planned maintenance. This was due to a container fault in the backend.
Our backend engineer has checked the faulted tenant and applied a mitigation. Can you try to restart the VMSS again? If that does not work, please create a support ticket with Azure.
Again, sorry for the inconvenience.

Related

Azure Application Gateway Error with Unhealthy Status 502 Error

The Azure Application Gateway Overview dashboard shows this error:
All the instances in one or more of your backend pools are unhealthy. This will result in a 502 error when you try to access your application hosted behind the Application Gateway. Please check the backend health and resolve the issue.
Has anyone come across this error and knows how to resolve it?
I have already restarted the gateway using the Azure CLI with these commands:
:~$ az network application-gateway stop -n <ap-gw> -g <rs-gp>
:~$ az network application-gateway start -n <ap-gw> -g <rs-gp>
However, the problem persists.
When I check the Application Gateway backend health, I get:
$ az network application-gateway show-backend-health --resource-group <rs-gp> --name <application-gateway>
{
"backendAddressPools": [
{
"backendAddressPool": {
"backendAddresses": null,
"backendIpConfigurations": null,
"etag": null,
"id": "/subscriptions/subscriptionsID/resourceGroups/<application-gateway>/providers/Microsoft.Network/applicationGateways/<application-gateway>/backendAddressPools/dev-silverlight",
"name": null,
"provisioningState": null,
"resourceGroup": "<application-gateway>",
"type": null
},
"backendHttpSettingsCollection": [
{
"backendHttpSettings": {
"affinityCookieName": null,
"authenticationCertificates": null,
"connectionDraining": null,
"cookieBasedAffinity": null,
"etag": null,
"hostName": null,
"id": "/subscriptions/subscriptionsID/resourceGroups/<application-gateway>/providers/Microsoft.Network/applicationGateways/<application-gateway>/backendHttpSettingsCollection/HTTPS-Resource",
"name": null,
"path": null,
"pickHostNameFromBackendAddress": null,
"port": null,
"probe": null,
"probeEnabled": null,
"protocol": null,
"provisioningState": null,
"requestTimeout": null,
"resourceGroup": "<application-gateway>",
"trustedRootCertificates": null,
"type": null
},
"servers": [
{
"address": "pvIP",
"health": "Unhealthy",
"healthProbeLog": "Cannot connect to backend server. Check whether any NSG/UDR/Firewall is blocking access to the server. Check if application is running on correct port. To learn more visit - https://aka.ms/servernotreachable.",
"ipConfiguration": null
}
]
}
]
}
]
}
It clearly shows "health": "Unhealthy".
All feedback is highly appreciated :-)
The healthProbeLog field is where you should start this investigation. It says:
"healthProbeLog": "Cannot connect to backend server. Check whether any NSG/UDR/Firewall is blocking access to the server. Check if application is running on correct port. To learn more visit - https://aka.ms/servernotreachable."
The Application Gateway sends health probes to the backend instances. An AppGW can be "unhealthy" and throw 502s for any issue that these health probes encounter. Such issues include a bad certificate on the backend, a failing SSL handshake for the probes, or, as in your case, the AppGW getting no response at all to the health probes.
Your error means the AppGW is sending health probes to the backend, but the backend isn't responding at all. This could have many root causes:
AppGW does not have a good network route to the backend. Check routing
Backend has a firewall in front of it that is dropping health probe traffic
Backend application is dead/stopped so there is nothing listening that can reply to the probe
Many more causes
I'd recommend following this Microsoft troubleshooting guide for AppGW 502s to get started.
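When there are several pools and servers, it can help to pull out just the unhealthy servers and their healthProbeLog from the show-backend-health output. The following is a rough Python sketch, assuming the JSON has the same shape as the output in the question. The `unhealthy_servers` helper and the trimmed sample are illustrative and not part of any Azure tooling.

```python
import json

# Trimmed-down sample shaped like the output in the question.
SAMPLE = json.loads("""
{
  "backendAddressPools": [
    {
      "backendAddressPool": {"id": "pool-1"},
      "backendHttpSettingsCollection": [
        {
          "servers": [
            {"address": "pvIP", "health": "Unhealthy",
             "healthProbeLog": "Cannot connect to backend server."},
            {"address": "otherIP", "health": "Healthy",
             "healthProbeLog": null}
          ]
        }
      ]
    }
  ]
}
""")


def unhealthy_servers(backend_health):
    """Yield (pool id, address, probe log) for every server not reported Healthy."""
    for pool in backend_health.get("backendAddressPools") or []:
        pool_id = (pool.get("backendAddressPool") or {}).get("id")
        for settings in pool.get("backendHttpSettingsCollection") or []:
            for server in settings.get("servers") or []:
                if server.get("health") != "Healthy":
                    yield pool_id, server.get("address"), server.get("healthProbeLog")


if __name__ == "__main__":
    for pool_id, address, log in unhealthy_servers(SAMPLE):
        print(pool_id, address, "->", log)
```

To run it against a real gateway, redirect the output of the `az network application-gateway show-backend-health` command from the question into a file and load that file with `json.load` instead of using the sample.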

Retrieve Azure load balancer NAT port for Azure VM in C#

I have an Azure Load Balancer in front of an Azure VM Scale Set (VMSS). I also have a NAT pool configured on the Load Balancer like the following:
{
"name": "InstanceInputEndpointNatPool",
"properties": {
"backendPort": 10000,
"frontendIPConfiguration": {
"id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('loadBalancers_01_name')), '/frontendIPConfigurations/LoadBalancerIPConfig')]"
},
"frontendPortRangeStart": 10100,
"frontendPortRangeEnd": 10500,
"protocol": "Tcp"
}
}
Essentially, for each VM in the VMSS, the load balancer forwards a request received at DNSName:PORT (where PORT is between 10100 and 10500) to port 10000 on one of the VMs (the backend port is the same for every VM).
Is it possible to retrieve the PORT assigned to this VM, programmatically in C#, for a program running on that VM? This would help me directly target that VM port.
You may refer to the article and find the code at the GitHub link.
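One possible approach, assuming the load balancer resource has already been fetched (for example through the ARM REST API or an SDK): as far as I know, a NAT pool is expanded into one inbound NAT rule per instance, each with a `frontendPort` and a `backendIPConfiguration.id` that contains the instance ID. The sketch below is in Python rather than C# and shows only the matching logic. The sample payload, the resource names, and the `frontend_port_for_instance` helper are hypothetical, and how the program learns its own instance ID is left out.

```python
import re

# Hypothetical, trimmed-down inbound NAT rules as they might appear on the
# load balancer resource; real payloads carry many more fields.
_ID_PREFIX = ("/subscriptions/<sub>/resourceGroups/<rg>/providers/"
              "Microsoft.Compute/virtualMachineScaleSets/my-vmss/virtualMachines/")
SAMPLE_NAT_RULES = [
    {"name": "InstanceInputEndpointNatPool.0",
     "properties": {
         "frontendPort": 10100, "backendPort": 10000,
         "backendIPConfiguration": {
             "id": _ID_PREFIX + "0/networkInterfaces/nic0/ipConfigurations/ipconfig0"}}},
    {"name": "InstanceInputEndpointNatPool.3",
     "properties": {
         "frontendPort": 10103, "backendPort": 10000,
         "backendIPConfiguration": {
             "id": _ID_PREFIX + "3/networkInterfaces/nic0/ipConfigurations/ipconfig0"}}},
]

# Extracts the instance ID segment from a backend IP configuration ID.
_INSTANCE_RE = re.compile(r"/virtualMachines/([^/]+)/", re.IGNORECASE)


def frontend_port_for_instance(nat_rules, instance_id, backend_port):
    """Return the frontend port mapped to backend_port on the given instance, or None."""
    for rule in nat_rules:
        props = rule.get("properties") or {}
        target = (props.get("backendIPConfiguration") or {}).get("id", "")
        match = _INSTANCE_RE.search(target)
        if (match and match.group(1) == str(instance_id)
                and props.get("backendPort") == backend_port):
            return props.get("frontendPort")
    return None


if __name__ == "__main__":
    print(frontend_port_for_instance(SAMPLE_NAT_RULES, 3, 10000))
```

Matching on the rule's backend IP configuration avoids assuming that ports are handed out as the range start plus the instance index, which may not hold after instances are deleted and recreated.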

private ip for azure container instance

The API seems to support public or private for ip address, but I can't figure out how to get that private ip address on a vnet.
"properties": {
"containers": [
],
"osType": "Linux",
"ipAddress": {
"type": "Public",
"ports": [
{
"protocol": "tcp",
"port": "[parameters('port')]"
}
]
I'm guessing it's either not documented or not possible yet. I was wondering about exposing multiple IPs, and even though the portal doesn't have it I was able to get it working from the template by just adding it there, so I'm wondering if there is a way to get the instance on a VNET for an internal IP address through the template?
Azure Container Instances currently don't have VNet integration, so it's not possible to get a private ip - we will have it by the time Azure Container Instances reaches GA. Thanks!
Looks like this feature is now in preview:
https://learn.microsoft.com/en-us/azure/container-instances/container-instances-vnet

Azure ACS - Delete Load Balancer?

I deployed a swarm ACS and a Load Balancer was auto deployed also.
I'm using an Application Gateway for SSL offloading and want to point it at my swarm agents.
However, since the swarm agents are configured as the backend pool for the Load Balancer, I can't also make the swarm agents a backend pool for the Application Gateway.
I don't need/want the Load Balancer, but I can't delete it since it has a backend pool associated with it.
The story is the same for GUI- and CLI-deployed ACS clusters.
I asked this same question over at Microsoft, but they eventually directed me here.
Thoughts?
Thanks for reading.
There are two solutions. The second solution is better since you can deploy a modern swarm mode cluster:
1. For an ACS-deployed swarm cluster, make the following modifications in this order:
remove the loadBalancerBackendAddressPools relationship in the VMSS object
remove the loadBalancer
remove the public IP associated with the loadBalancer.
2. Use ACS-Engine, https://github.com/Azure/acs-engine, to deploy a cluster without a load balancer, using a model such as the following:
{
"apiVersion": "vlabs",
"properties": {
"orchestratorProfile": {
"orchestratorType": "SwarmMode"
},
"masterProfile": {
"count": 3,
"dnsPrefix": "",
"vmSize": "Standard_D2_v2"
},
"agentPoolProfiles": [
{
"name": "agentpublic",
"count": 3,
"vmSize": "Standard_D2_v2"
}
],
"linuxProfile": {
"adminUsername": "azureuser",
"ssh": {
"publicKeys": [
{
"keyData": ""
}
]
}
}
}
}
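For the first solution above, the first step amounts to stripping the loadBalancerBackendAddressPools references from each IP configuration in the scale set model before pushing the model back. The following is a minimal Python sketch of that transformation on the model as a plain dict. The nesting follows the VMSS ARM schema as I understand it, the sample model is hypothetical, and applying the updated model is left to whatever tool you use.

```python
import copy

# Hypothetical, heavily trimmed VMSS model containing only the fields
# this sketch touches.
SAMPLE_VMSS = {
    "name": "swarm-agent-vmss",
    "properties": {"virtualMachineProfile": {"networkProfile": {
        "networkInterfaceConfigurations": [{
            "name": "nic",
            "properties": {"ipConfigurations": [{
                "name": "ipconfig",
                "properties": {
                    "subnet": {"id": "<subnet-id>"},
                    "loadBalancerBackendAddressPools": [{"id": "<pool-id>"}],
                },
            }]},
        }],
    }}},
}


def detach_backend_pools(vmss_model):
    """Return a copy of the model with load balancer backend pool references removed."""
    model = copy.deepcopy(vmss_model)
    nic_configs = (model.get("properties", {})
                        .get("virtualMachineProfile", {})
                        .get("networkProfile", {})
                        .get("networkInterfaceConfigurations", []))
    for nic in nic_configs:
        for ip_config in nic.get("properties", {}).get("ipConfigurations", []):
            # pop with a default so models without the key pass through unchanged
            ip_config.get("properties", {}).pop("loadBalancerBackendAddressPools", None)
    return model


if __name__ == "__main__":
    updated = detach_backend_pools(SAMPLE_VMSS)
    nic = updated["properties"]["virtualMachineProfile"]["networkProfile"]["networkInterfaceConfigurations"][0]
    print(nic["properties"]["ipConfigurations"][0]["properties"])
```

The function works on a deep copy so the original model stays available for comparison or rollback. If the load balancer also has NAT pools referenced by the scale set, those references would likely need the same treatment before the load balancer can be deleted.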

The server certificate on the destination computer

I am working on DevOps and have started setting up cross-browser testing by following the link below:
https://blogs.msdn.microsoft.com/mvpawardprogram/2017/02/14/cross-browser-automate-test/
When I queue the build, it succeeds only the first time. After I restart my VM (virtual machine) and connect to it, the build fails with the exception below.
Can you please tell me how to resolve the above issue and
If you create a VM via the portal, you are not prompted for a DNS name.
When VSTS deploys for the first time, it installs WinRM and, as
part of this process, installs an SSL certificate based on the address of the
server. So if, by the time you deploy, you have NOT
gone in and entered the DNS name of your VM in the Public IP resource, it
will use the IP address as the CN for the SSL certificate.
This is not an issue if you (a) do not shut down your VM or (b) use a
static IP, but it is an issue if you (c) do shut down your VM or (d) use a
dynamic IP. (c) and (d) describe our usage.
So we automated the creation of the VM, added the
dnsSettings property to the ARM template (template.json), and supplied
the DNS name at runtime via an inline parameter value:
{
"name": "[parameters('publicIpAddressName')]",
"type": "Microsoft.Network/publicIpAddresses",
"apiVersion": "2016-09-01",
"location": "[parameters('location')]",
"properties": {
"publicIpAllocationMethod": "[parameters('publicIpAddressType')]",
"dnsSettings" : {
"domainNameLabel" : "[parameters('virtualMachineName')]"
}
}
},
For more information, refer to this thread: The SSL certificate contains a common name (CN) that does not match the hostname
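The failure mode described in this answer can be illustrated with a small check: a certificate issued with the VM's IP address as its CN stops matching once a dynamic IP changes, while a CN based on the DNS label keeps matching. This is a Python sketch with made-up addresses and a hypothetical `cn_matches` helper. Real TLS validation also considers subject alternative names and wildcards, which it ignores.

```python
def cn_matches(cert_common_name, target_host):
    """Case-insensitive exact comparison of a certificate CN with the host being connected to."""
    return cert_common_name.strip().lower() == target_host.strip().lower()


if __name__ == "__main__":
    # First deployment: the certificate CN is the VM's public IP at that time.
    ip_based_cn = "203.0.113.10"
    print(cn_matches(ip_based_cn, "203.0.113.10"))

    # After the VM is shut down and restarted, a dynamic public IP may differ.
    print(cn_matches(ip_based_cn, "203.0.113.57"))

    # With a DNS label as the CN, the name stays the same across IP changes.
    dns_based_cn = "myvm.westeurope.cloudapp.azure.com"
    print(cn_matches(dns_based_cn, "MyVM.westeurope.cloudapp.azure.com"))
```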
