Azure Virtual Machine Crashing every 2-3 hours - azure

We've got a classic VM on Azure. All it does is run SQL Server with a lot of databases (a second VM, our public-facing web server, reads its data from this SQL VM).
The problem is that since yesterday morning we have been experiencing outages every 2-3 hours. There doesn't seem to be any reason for it. We've been working with Azure support, but they are still struggling to work out what the issue is. There doesn't seem to be anything in the event logs that gives us any information.
All that happens is that we receive a Pingdom alert saying the box is down; we then can't remote into it because the connection times out, and all database calls to it fail. Five minutes later it comes back up. It doesn't seem to fully reboot or anything, it just halts.
Any ideas on what could be causing this? Or any places we could look for better info? Or ways to prevent this from happening?
The only thing that seems to be in the event logs that occurs around the same time is a DNS Client Event "Name resolution for the name [DNSName] timed out after none of the configured DNS servers responded."
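One way to gather better information is to log TCP and DNS health from a second machine, so each outage window can be lined up against the DNS Client events. The following is only a minimal sketch, not something from the original thread: the function names are hypothetical, and it assumes SQL Server is listening on its default port 1433.

```python
import socket
import time
from datetime import datetime, timezone


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dns_resolves(name):
    """Return True if the name resolves using the configured DNS servers."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def probe_once(host, port, dns_name):
    """Take one timestamped sample of TCP and DNS health."""
    return {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "tcp_ok": tcp_reachable(host, port),
        "dns_ok": dns_resolves(dns_name),
    }


def monitor(host, port, dns_name, interval=30, samples=None):
    """Print one sample every `interval` seconds; run forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        print(probe_once(host, port, dns_name))
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Calling `monitor("<sql-vm-address>", 1433, "<DNSName>")` with your own values and leaving it running across one outage shows whether DNS fails before, during, or only after the TCP port stops answering.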

Quickest recovery:
Did you check SQL Server by connecting from inside the VM, using localhost or 127.0.0.1 with the instance name? If you can connect to SQL Server internally without any issue, capture (snapshot) the SQL Server VM and create a new VM from the captured image, so that no data is lost.
This issue may be caused by one of the following:
Azure Network Firewall
Windows Server Update

This ended up being a fault with the node/sector that our VM was on. I fixed it by increasing the size of our VM instance (4 cores to 8 cores); this forced Azure to move it to another node/sector, which resolved the issue.

Related

Azure VM extremely slow connection

I created a Standard B1s Windows VM instance where I'm running the OpenSSH service and using it as an SFTP server.
Everything works perfectly fine for about 2 hours: I can RDP to the VM without trouble and the SSH connection works fine.
After about 2 hours the connection to the VM becomes very slow: RDP takes around a minute to connect and the SSH connection times out every time.
What fixes the problem for a short time is restarting the VM or resizing it to any other tier. Then everything works fine again for about 2 hours, and then the problem reappears.
I'm aware that B1s is a burstable VM, but we are using it as a simple SFTP server where one document is uploaded 2-3 times a day, so it doesn't need much CPU or memory. I also tried resizing it to a non-B-series VM, but the problem is the same. We are located in the eastern USA and the server is also located in that Azure region.
Any help is appreciated! Thanks
Try accessing the VM from somewhere else. Maybe it's a network-related problem? Create a VM in West Europe and run the same operations. If that causes no issues, then I would dig deeper into network-related topics.

Azure Server Inaccessible

One of my 10 Azure VMs running Windows has suddenly become inaccessible! The Azure Management Console shows the state of this VM as "running", but the Dashboard shows no server activity since my last RDP logout 16 hours ago. I tried restarting the instance with no success; it is still inaccessible (no RDP access, hosted application down, unsuccessful ping...).
I have changed the instance size from A5 to A6 using the management portal and everything went back to normal. Windows event viewer showed no errors except the unexpected shutdown today after my Instance size change. Nothing was logged between my RDP logout yesterday and the system startup today after changing the size.
I can't afford having the server down for 16 hours! Luckily this server was the development server.
How can I know what went wrong? Anyone faced a similar issue with Azure?
Thanks
There is no easy way to troubleshoot this without capturing it in a stuck state.
Sounds like you followed the recommended steps, i.e.:
- Check VM is running (portal/PowerShell/CLI).
- Check endpoints are valid.
- Restart VM
- Force a redeployment by changing the instance size.
To understand why it happened it would be necessary to leave it in a stuck state and open a support case to investigate.
There is work underway to make both self-service diagnosis and redeployment easier for situations like this.
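For reference, the checklist above could be scripted roughly as follows. This is a hedged sketch under assumptions that are not in the original answer: it assumes a Resource Manager (non-classic) VM and an installed Azure CLI, and the resource group, VM name, and size are placeholder values. It only builds and prints the commands rather than running them, and it leaves out the endpoint check.

```python
import shlex


def recovery_commands(resource_group, vm_name, new_size=None):
    """Build Azure CLI argument lists for the checklist steps, in order."""
    target = ["--resource-group", resource_group, "--name", vm_name]
    commands = [
        # Step 1: is the VM reported as running?
        ["az", "vm", "get-instance-view", *target],
        # Step 2: restart the VM.
        ["az", "vm", "restart", *target],
    ]
    if new_size is not None:
        # Step 3a: changing the size can force a move to another host.
        commands.append(["az", "vm", "resize", *target, "--size", new_size])
    else:
        # Step 3b: ask for a redeployment directly.
        commands.append(["az", "vm", "redeploy", *target])
    return commands


# Placeholder values, for illustration only.
for cmd in recovery_commands("my-rg", "my-vm", "Standard_A6"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything against a stuck machine that support may still want to inspect.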
Apparently nothing was wrong! After the reboot the machine was installing updates to complete the reboot. When I panicked, I rebooted it again, stopped it, started it again, and even changed its configuration, thinking it was dead, while in fact it was only installing updates.
Too bad that we cannot disable the automatic reboot or estimate the time it takes to complete.

Install Neo4j on Azure, cannot browse WebAdmin

I've just installed Neo4j 1.8.2 onto Azure by following this step-by-step process...
http://de.slideshare.net/neo4j/neo4j-on-azure-step-by-step-22598695
Unfortunately, when I browse to http://:7474/webadmin Fiddler says Error 10061 - No connection could be made because the target machine actively refused it.
I've followed the instructions exactly and haven't received any errors.
Any help much appreciated.
So, I think I got to the bottom of this. I think it was due to the size of compute / VM I was creating. It looks like the problem is caused when running on Extra Small instances. I created a new installation using a Small instance and everything now works :).
Try setting the server to accept connections from all hosts, and maybe use a newer Neo4j, say 1.9.4
http://docs.neo4j.org/chunked/stable/security-server.html#_secure_the_port_and_remote_client_connection_accepts
The way the VM Depot image is set up, it's pre-configured to allow all hosts to connect, and the Neo4j server will auto-start. The only thing you need to take care of, when constructing your VM, is to open an Input Endpoint, with any public port you want (preferably 7474 to stay true to Neo4j) and internal port 7474.
Note that the UI has changed a bit since the how-to was published: you can specify the endpoint as the last step before creating your virtual machine. Other than that, the instructions should be the same. And... once the VM is up and running (it'll take about 5-10 minutes), you just visit http://yourservicename.cloudapp.net:7474 and you should see the web admin. Note: this is not the same as your VM name. If you named your VM something like 'neo' then you do not want http://neo:7474 or http://neo.cloudapp.net:7474. You need to use your cloud service name (you had to create a name for the service when you deployed the VM).
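To tell a missing endpoint apart from a server that is not listening, it can help to check how the TCP connection fails. The sketch below is an illustration only, with hypothetical helper names: it builds the URL from the cloud service name as described above and classifies a connection attempt. A "refused" result is the same condition Fiddler reports as error 10061.

```python
import socket


def web_admin_url(service_name, port=7474):
    """Build the address from the cloud service name (not the VM name)."""
    return f"http://{service_name}.cloudapp.net:{port}"


def classify_port(host, port, timeout=3.0):
    """Return 'open', 'refused', or 'unreachable' for one TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing accepts connections on this port.
        return "refused"
    except OSError:
        # Timeout, DNS failure, or no route to the host.
        return "unreachable"
```

For example, `classify_port("yourservicename.cloudapp.net", 7474)` uses the placeholder host name from the answer above; substitute your own cloud service name.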
I've deployed that image several times in demos, and just tried again right now to make sure nothing wonky happened. Worked perfectly.

The server encountered an error while retrieving metrics - No dashboard metrics in Azure Ubuntu VM

I'm getting this error: "The server encountered an error while retrieving metrics. Retry the operation." in the dashboard and no Usage overview stats displayed after I've installed and removed a squid proxy server inside an Azure Ubuntu 12.04 server VM.
Anyone know any way to restore them?
I don't think this is related to anything that you've done. I think MS are having some issues with metrics, as I'm getting the same message on instances that I haven't changed.
If it's important for you I would log an issue with MS support.
I have exactly the same error as you. My service instances ran smoothly before this error showed up, and they still seem to be running smoothly even with this error. The only problem is that I cannot get statistics about the instances. It's kind of understandable, since Azure is only a CTP now. But I really hope MS will fix this soon.

Azure RDP hangs in few seconds

On Windows Azure, RDP for web/worker roles is configured successfully. Everything works fine at first: I can connect to the servers via RDP and see the logon screen, desktop and so on. But after 3 to 10 seconds everything freezes. It seems like a disconnect. After reconnecting it's the same: I can work for 3 to 10 seconds. What should I do to fix it?
Solution:
The trouble was caused by the node restarting. So before connecting via RDP, let the node stabilize first :)
Does the role stay in a running state? I have RDP'ed into many instances of both Web and Worker roles and I have not seen this behavior.
Do you have any other details that you can share? Have you installed/modified anything as a Startup task that might be causing an issue? Have you tried from another client computer?
It looks like a connection problem. I suggest you contact MS support and give them your subscription ID and deployment ID, so that they can investigate your machine in depth.
