This morning our Azure VM running Server 2008 and SQL Server 2008 went offline, and took about 1 hour to restart. The event log shows only:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
and the Azure dashboard says there were no availability incidents during this time (US - West)
Is it safe to believe the Azure dashboard, which implies that Windows crashed? We don't have Azure support (yet).
If this is a production environment, I highly recommend configuring Availability Sets. This document can guide you: http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-manage-availability/
Back to your question on trusting the Azure Service Dashboard:
You have one Virtual Machine running on a single host in the Azure data center. If that one host fails, I doubt it would trigger a performance degradation or service interruption alert on the dashboard.
Maybe someone with more background about how the Service Dashboard works can confirm!
However, I do find it strange that it took 1 hour to restart. Could this be SQL Server recovering the database(s)? This should be visible in the Event log.
I have a P1v2 App Service plan instance.
It runs 3-4 Node.js app services in a containerized environment.
These app services have been completely error-free for a whole year; the metrics don't show anything that would overload them, and there are no application-side errors either.
The App Service plan metrics are also fine.
But lately, once or twice a week, I get flooded with alerts because the App Service plan restarts, and all 3-4 app services restart with it.
And the container startup time is around 3-4 minutes.
And in the meantime the CPU usage increases, plus there are other alerts.
Anyone have any idea what could be the reason for the restarts?
In Service Health, you will be notified of any planned maintenance or infrastructure update activities.
To find out more about the App Service plan restarts, I would suggest opening a discussion on Microsoft Q&A or filing a support ticket, so that the technical support team can help you troubleshoot the issue from the platform end.
You can refer to the blog below, which shows a sample email notification from Microsoft for a planned maintenance activity.
I'm creating this topic because I couldn't find any info on this issue. I strictly followed the Microsoft documentation below, but it didn't work.
I need to migrate a couple of Hyper-V VMs to Azure for a lab where I want to run some studies and tests.
I tried to run the procedure described on the doc:
https://learn.microsoft.com/en-us/azure/migrate/tutorial-migrate-hyper-v
I completed all the steps with no problems, up to the migration itself.
The first time I ran the Azure Site Recovery Configurator, the process went OK with no errors.
It discovered the machines, but I did not start the replication. At that moment I needed to turn off the computer (Windows Server 2016, Hyper-V Host), so I would go back later to start the migration of the VMs.
On another day, when I started the replication of the VMs, it got stuck at 1% and did not proceed.
After several hours stuck, I noticed that the VM Host on Infrastructure Servers page was in a "Not Connected" connection status.
According to the initial page of the application (webpage running on port 44368): Discovery agent, Assessment agent and management app are running.
I ran the PowerShell script that would prepare the host to make the migration (Enable Powershell, open WinRM ports, etc).
It's a home lab so I have no firewall and stuff, just a simple router.
In the Services console, every Microsoft service is running except for RecoveryServicesManagementAgent, which stops immediately after I start it.
I tried to register again with Azure Site Recovery Configurator on the Hyper-V Host and I got the following error:
Registration was successful but setup failed to start Microsoft Azure Site Recovery Service on this machine. Please try starting the service manually.
I didn't find any info on this service or this error, and the service does not appear in the services.msc console either.
I noticed that the server goes to a "Connected" state on Infrastructure Servers page for a while, but it stops again and returns to "Not Connected".
Also, I tried to stop the replication task on the single VM I had tried to replicate, and it's now stuck on "Disabling Protection" as well, probably because it can't reach the server due to something being wrong with the configuration of the running services.
Another problem that I noticed is that my monthly credits are slowly decreasing (Free Trial) after I started this whole process.
Can anyone with a better understanding of this procedure help me migrate the Hyper-V VMs to Azure, or at least point me in some direction?
This issue needs a deeper technical investigation.
I recommend creating a technical support ticket. The ticket enables you to work closely with the support engineers and get a quick resolution to your issue.
Here is the link for creating a support case: https://learn.microsoft.com/en-in/azure/azure-supportability/how-to-create-azure-support-request
Please open a support ticket so the team can engage directly with you to resolve the issue.
I remember reading that the DNN platform edition struggles with the MS Azure web app environment in regards to the scheduler tasks.
This quote comes from a DNN Connect blog:
The DNN Platform / Community scheduler does not support Azure Web
Sites as the server names running the web site are ever changing as
Azure scales up and down, or upgrades underlying machines. There is a
solution for Evoq.
I am getting the following scheduler errors in my DNN instances on Azure:
THREAD ID:59
TYPE:DotNetNuke.Services.Search.SearchEngineScheduler,DOTNETNUKE
EXCEPTION:Lock obtain timed out:
NativeFSLock#D:\home\site\wwwroot\App_Data\Search\write.lock
It is happening A LOT. This is on version 08.00.04.
I am also getting the following error on version 08.00.01:
TYPE:DotNetNuke.Services.Scheduling.PurgeScheduleHistory, DOTNETNUKE
EXCEPTION:Execution Timeout Expired. The timeout period elapsed prior
to completion of the operation or the server is not responding.
My question is whether the DNN Platform scheduler works properly in the Azure Web App environment and, if so, in which version that support was delivered. This will help me get these errors resolved.
Thanks
We often host Test and QA Evoq environments in Azure, and the machine names do change, which requires us to reactivate licensing. But these are on the Free or Shared pricing tiers. I believe that if you move to Basic or above, the environment should be dedicated. You can also confirm this with DNN Corp, because their own OnDemand hosting uses Azure.
I have seen the Search write-lock error on other environments, so I don't believe it's necessarily an Azure problem. For this issue, restart the app pool and delete all files in the App_Data\Search folder. Then start the site, go to Settings > Site Settings > Search and click the Re-index Content button. Then start the Site Crawler job from the scheduler. Ensure in Site Settings > Servers > Server Settings > Web Servers that only the current server name is there -- delete any old server names. In Settings > Scheduler, edit the Search: Site Crawler task. In the Servers textbox, you can enter the name of the server so that the task only runs on that server and there is no overlap in processing (in case DNN thinks it's in web farm mode).
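The file cleanup step in that answer (removing everything in the search index folder, including the stale write.lock, before re-indexing) could be scripted roughly as follows. This is only a sketch: clear_search_index is a made-up helper name, not part of DNN, and the demo runs against a temporary folder with placeholder file names instead of a real site. On a real site the folder would be App_Data\Search, and the app pool should be stopped first so nothing still holds the lock.

```python
import tempfile
from pathlib import Path


def clear_search_index(search_dir: Path) -> list[str]:
    """Delete every regular file directly inside search_dir.

    Returns the names of the deleted files in sorted order.
    Subfolders are left untouched. Hypothetical helper, not a DNN API.
    """
    removed = []
    for entry in sorted(search_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed


if __name__ == "__main__":
    # Demo against a throwaway folder standing in for App_Data/Search.
    with tempfile.TemporaryDirectory() as tmp:
        search_dir = Path(tmp)
        (search_dir / "write.lock").write_text("")   # placeholder lock file
        (search_dir / "segments.gen").write_text("")  # placeholder index file
        print(clear_search_index(search_dir))
```

After the files are gone, the Re-index Content and Site Crawler steps from the answer rebuild the index from scratch.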
I am running a .NET Core web application on an Azure App Service (App Service plan is configured to use S1). It is stable.
However, I recently ran an automated test against production and it caused 100s of errors in a few minutes. After this, the App Service became unavailable for a long time.
I know that App Service basically uses IIS and I know that there is a setting in IIS that will shut down an App Service on too many errors in a short time. I am assuming that this is the setting that came into effect for my app.
My question is: How do I prevent Azure from shutting down my App Service, even if many errors happen in a short time?
Investigate the "Always On" setting that can be changed in the Azure Portal under Application settings, General Settings. This value is configured per App.
The UI control will be disabled if your pricing tier does not support Always On. Typically, those lower-priced tiers are not used for a production site.
I recently ran an automated test against production and it caused 100s of errors in a few minutes. After this, the App Service became unavailable for a long time.
Firstly, you can enable diagnostic logging for the App Service web app to capture information from both the web server and the web application, which will help you troubleshoot the issue.
Secondly, you can try increasing the number of instances that run your app and check whether that mitigates the issue.
Besides, if possible, you can set up a staging environment and run the automated tests there instead of in production, so that a test run cannot take your production site down for a long time.
I am not sure whether this problem was correctly diagnosed back in 2017 when I was using a .NET Core WebApp. Maybe it was or maybe it wasn't.
However, today, in late 2019, I recreated the same scenario on Azure Functions V2 and .NET Core 2.2. I provoked 5000 unhandled exceptions in one minute, and the Function did not go down because of that.
So anyone finding this question can pretty much rest assured if they are on Azure Functions V2 or newer: it does not crash just because of the quantity of exceptions, as was the case with the default settings in IIS in the past.
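For readers unfamiliar with the kind of rule this thread is guessing at ("too many failures in a short time, so shut it down"), here is a minimal sliding-window sketch. It is an illustration only, not how App Service or Azure Functions is implemented. Note also that IIS's Rapid-Fail Protection counts worker process failures, not individual request errors. The class name FailureWindow and the limits of 5 failures in 300 seconds are assumptions chosen for the example.

```python
from collections import deque


class FailureWindow:
    """Trips when max_failures failures fall inside a sliding time window.

    Illustrative model only; names and thresholds are assumptions.
    """

    def __init__(self, max_failures: int, window_seconds: float):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now` (seconds); return True if tripped."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Forget failures that are older than the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.max_failures


if __name__ == "__main__":
    window = FailureWindow(max_failures=5, window_seconds=300)
    for t in (0, 10, 20, 30, 40):
        print(t, window.record_failure(t))
```

Under a rule like this, a burst of failures trips the limit while the same number of failures spread over a long period does not, which would be consistent with a stable app going down only during an aggressive test run.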
I have a deployment slot on Azure. While I was debugging my debug slot, it suddenly stopped working. I therefore stopped and started it again. Now I am not able to access it in the Azure portal:
If I try to navigate to the service from VS I get:
Starting and stopping is not working, and I cannot find anybody else with this issue. I can't even delete it to recreate it, since the deployment slot has completely disappeared :/
Does anybody have an idea of how to tackle this, e.g. through the Azure command prompt or PowerShell?
Error Message in Azure
When I try to publish I get the following:
Web deployment task failed. (Could not connect to the remote computer ("AppServie-development.scm.azurewebsites.net") using the specified process ("Web Management Service") because the server did not respond. Make sure that the process ("Web Management Service") is started on the remote computer. Learn more at: http://go.microsoft.com/fwlink/?LinkId=221672#ERROR_COULD_NOT_CONNECT_TO_REMOTESVC.) AppService C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v14.0\Web\Microsoft.Web.Publishing.targets 4283
This happened to all our services, including DB connections, just a short moment ago. Now waiting for a resolution from their side...
https://azure.microsoft.com/en-us/status/
It only mentions a global DNS issue, but I guess that means all services? Anyway, I'm a sitting duck here, waiting for my boss to shout at me.
14:03
All our services are back online
We are not paying for the support subscription, so unofficially we are their second-class customers...?
But anyway, problem solved: a typical 1-hour experience, as we have with Microsoft in most technical issues (personal opinion: not bad overall when running in a cloud).
All our services went down as well due to connection issues with DBs
I am also following "#azure" on Twitter. It seems to be a problem that is hitting a lot of people right now. Also, many of the bigger websites in North Europe seem to be super slow or down.
Seems to be only in North/West Europe.
News from https://azure.microsoft.com/en-us/status/:
DNS - Multi-Region
Last updated 5 minutes ago
Starting at 11:48 UTC 15 Sep, 2016 a subset of customers using DNS in multiple regions may experience difficulties connecting to their resources hosted in this region. This issue is also having knock-on impact on multiple Azure services, including SQL Database, Virtual Machines, Visual Studio Team Services, and App Service \ Web Apps. Engineers are aware of this issue and are actively investigating. The next update will be provided in 60 minutes, or as events warrant.
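When an outage is reported as DNS-related like this, one quick check from your own side is whether the hostnames your app depends on still resolve at all. The sketch below uses only the Python standard library. The function name resolve_status is made up for illustration, and the hostname in the demo is a stand-in: you would substitute your own database or web app hostnames.

```python
import socket


def resolve_status(hostnames):
    """Map each hostname to the sorted list of addresses it resolves to.

    An empty list means name resolution failed for that hostname.
    Hypothetical helper for illustration.
    """
    status = {}
    for name in hostnames:
        try:
            infos = socket.getaddrinfo(name, None)
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            status[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            status[name] = []
    return status


if __name__ == "__main__":
    # "localhost" is a placeholder; use your own service hostnames here.
    for host, addresses in resolve_status(["localhost"]).items():
        print(host, addresses if addresses else "DOES NOT RESOLVE")
```

If a name fails to resolve while the service itself is reported healthy, the connection errors are more likely a DNS symptom than a database or application fault.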