Azure roles are cycling - WaHostBootstrapper.exe crashing, faulting module ntdll.dll

This is driving me crazy: we have a Windows Azure cloud service with an ASP.NET MVC 3 project. There have been many changes lately, so we have to deploy it every few days.
Sometimes (e.g. now), we're stuck on the following deployment error:
Role instances recycled for a certain amount of times during an update or upgrade operation...
manage.windowsazure.com tells us that instance 0 of our Staging deployment is unhealthy:
Recycling (Role has encountered an error and has stopped)
Instance 1, however, is fine: its web services are working, so the cause of the error is not our code.
We RDP'ed to the instance and examined the Application Event Log, which displays the following error multiple times:
Faulting application name: WaHostBootstrapper.exe, version: 6.0.6002.18488, time stamp: 0x505cf7da
Faulting module name: ntdll.dll, version: 6.1.7601.17696, time stamp: 0x4e8147f0
Exception code: 0xc0000008
Fault offset: 0x00000000000d4995
Faulting process id: 0x970
Faulting application start time: 0x01ce474976d706d2
Faulting application path: E:\base\x64\WaHostBootstrapper.exe
Faulting module path: D:\Windows\SYSTEM32\ntdll.dll
Report Id: c26d8be8-b33c-11e2-a9be-00155d3ab8c9
When this happens, we try the following:
Re-imaging the instance
Re-booting the instance
Deploying again
Creating a support ticket
Some hours later, everything works again. We didn't change anything in the deployment; it just works again, like it did before. By the time Microsoft Support responds, everything is fine again, so we cannot show them the problem.
This is so ridiculous and frustrating. We're losing days of work just because of this silly error.
Anyone else having these problems? Any ideas how we could stop that?

Check the following:
1. You have the latest Azure SDK and your solution uses the correct DLL references. For example, if you are on v2.0, make sure all references use the same version. If not, rebuild from scratch.
2. All references (including packages and your own libraries) have "Copy Local" set to True and are included in the package bin.
3. If you are using Storage, check the connection strings and the validity of the keys.
4. Check that the service configuration has the osFamily and osVersion that you are targeting.
5. If nothing helps, try IntelliTrace.

For this kind of issue, we found http://blogs.msdn.com/b/kwill/archive/2013/10/03/troubleshooting-scenario-7-role-recycling.aspx to be a good resource for debugging.

Related

Error when upgrading a WSP file using C# code

There are several WSPs that I am trying to install. All of them upgraded properly in an SP2013 environment, but the upgrade fails when I try it in SP2016.
I am calling Solution.Upgrade to upgrade them.
Here's the error message received:
Type: Core Solution
Contains Web Application Resource: No
Contains Global Assembly: Yes
Contains Code Access Security Policy: No
Deployment Server Type: Front-end Web server
Deployment Status: Error
Deployed To: Globally deployed.
Last Operation Result: Some of the files failed to copy during deployment of the solution.
Last Operation Details: The solution has not been upgraded.
Last Operation Time: 5/17/2017 5:23 AM
Can anyone tell me why?
Does it work if you retract the solution and deploy again? Try this if you have not.
How did you migrate your SP2013 sites to SP2016? Was it through the PowerShell backup and restore commands? There could be a version conflict, so try migrating your site using a database backup and restore.
Also, use Visual Studio 2015 to rebuild your solution and then deploy it. VS2015 has SP2016 templates. Target .NET Framework 4.5 or above.
I was able to resolve this issue by reading the logs carefully. They showed that, for some reason, the Timer Service was stopped before my upgrade code ran. So I added code that checks the Timer Service status and starts it again if it is stopped.
If you don't want to check for the stopped service, you can simply stop the service and start it again.
This hack resolved my issue, and now every WSP deploys successfully.
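The check described above could look roughly like the following. This is a minimal sketch in Python rather than the C# the answer used, and it is not the author's code. It assumes the Timer Service is registered under the Windows service name SPTimerV4 and that `sc query` prints its usual English output; both should be confirmed on your own servers.

```python
import subprocess

# Assumption: "SPTimerV4" is the Windows service name of the SharePoint
# Timer Service on the target farm; verify this before relying on it.
TIMER_SERVICE = "SPTimerV4"


def parse_service_state(sc_output):
    """Return the state word (e.g. 'RUNNING', 'STOPPED') from `sc query` output, or None."""
    for line in sc_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "STATE":
            # Typical line: "STATE              : 1  STOPPED"
            parts = value.split()
            return parts[1] if len(parts) > 1 else None
    return None


def ensure_timer_running(service=TIMER_SERVICE):
    """Start the service if `sc query` reports it as stopped (Windows only).

    Returns the state that was observed before any start attempt.
    """
    result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
    state = parse_service_state(result.stdout)
    if state == "STOPPED":
        subprocess.run(["sc", "start", service], check=True)
    return state
```

Calling `ensure_timer_running()` right before the upgrade step mirrors the "check, then start if stopped" approach. Note that `sc start` only requests the start, so the upgrade code may still need to wait until the service is actually running.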

Assembly changes detected. Restarting host

My Azure Functions were running fine, and all of a sudden I am getting several "Assembly changes detected. Restarting host..." messages that are preventing my functions from completing.
I am not deploying new code, so I am not sure what is triggering the assembly change event. I was running on the latest version of the runtime and have since reverted to version 1.0.10947, thinking that maybe the underlying runtime was updated, but I'm still getting that line in the logs.
Update
Now that @Alexey has helped me track down what is causing the assembly changes to be detected, I would like to ask if anyone can tell me WHY an assembly change is being detected even though I have not changed or redeployed my application.
After looking at your logs, we opened an issue: https://github.com/Azure/azure-webjobs-sdk-script/issues/1533#issuecomment-303595960.
Your functions had multiple restores, but the issue is gone now. Restores can be triggered by changing project.json.
If you are stuck with multiple
Assembly changes detected. Restarting host
messages: I fixed my issue by deleting the log file in Kudu:
https://[FunctionAppName].scm.azurewebsites.net/
Then, from the top menu, go to:
Debug Console >> PowerShell
The log file is under:
LogFiles >> Application >> Functions >> function >> [Function name]
You can remove the log file.
my 2c.
I was struggling with this issue for ages and was not sure what was causing it. I believe I may have the answer.
We had been toying with Consumption plans, but pulled back to full App Service plans because the start-up times were too long for our rather unique usage patterns.
But two of the app settings were still in place: WEBSITE_CONTENTSHARE and WEBSITE_CONTENTAZUREFILECONNECTIONSTRING.
per:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-app-settings#websitecontentazurefileconnectionstring
these are ONLY for consumption plans.
I removed them and... touch wood, the issue seems to be resolved.
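A quick way to spot such leftovers is to compare the app settings against the two names mentioned above. The following is only a sketch in Python: the function name, the plan labels, and the idea of passing the settings in as a dict are illustrative assumptions, not part of any Azure SDK.

```python
# Setting names taken from the answer above; per that answer they are
# only meant for Consumption plans.
CONSUMPTION_ONLY_SETTINGS = (
    "WEBSITE_CONTENTSHARE",
    "WEBSITE_CONTENTAZUREFILECONNECTIONSTRING",
)


def leftover_consumption_settings(app_settings, plan):
    """Return consumption-only setting names still present on a non-Consumption plan.

    `app_settings` is a plain dict of setting name -> value (hypothetical input,
    e.g. copied from the portal); `plan` is a free-form label such as
    "Consumption" or "AppService".
    """
    if plan.lower() == "consumption":
        return []
    return [name for name in CONSUMPTION_ONLY_SETTINGS if name in app_settings]
```

For an app on an App Service plan that still carries WEBSITE_CONTENTSHARE, the function returns that name, which is the cue to review and remove it as described above.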

Loading profiler failed during CoCreateInstance - error, but I'm not using a profiler

Recently I published to my Azure Staging server (an ASP.NET MVC app) and my app wouldn't come up. I checked the event logs on the machine, and this was the error:
.NET Runtime version 4.0.30319.18033 - Loading profiler failed during
CoCreateInstance. Profiler CLSID:
'{F1260058-1A1F-4738-8BE2-0BF9D3A64219}'. HRESULT: 0x8007007e. Process
ID (decimal): 1872. Message ID: [0x2504].
The thing is that I am not using a profiler, everything worked fine yesterday (day old publish) - any ideas what could be causing this, and how I could fix it? Thank you.
Not to say there is not a better fix (I tried all I could find elsewhere, and nothing seemed to relate to my specific problem), but here is what I ended up doing: simply delete your deployment and re-publish. This seems to reset whatever gets set when a profiler is turned on.
Remember that if this instance is not reached through a custom domain name, its address will change. Hope this can save someone a few hours.
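As a side note on reading the error itself, the HRESULT in the event log entry can be split into its standard fields. The sketch below is illustrative Python, not part of any fix; the small lookup table is deliberately incomplete and the function name is made up for this example.

```python
FACILITY_WIN32 = 7  # HRESULTs in this facility wrap a plain Win32 error code

# Illustrative subset of Win32 error codes, not a complete table.
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED",
    126: "ERROR_MOD_NOT_FOUND",
}


def decode_hresult(value):
    """Split a 32-bit HRESULT into severity, facility, and code."""
    failed = bool((value >> 31) & 1)   # bit 31: 1 means failure
    facility = (value >> 16) & 0x7FF   # bits 16-26: facility
    code = value & 0xFFFF              # bits 0-15: error code
    name = WIN32_ERRORS.get(code) if facility == FACILITY_WIN32 else None
    return {"failed": failed, "facility": facility, "code": code, "name": name}
```

For 0x8007007e this gives facility 7 and code 126, which is ERROR_MOD_NOT_FOUND ("The specified module could not be found"). That suggests the runtime was told to load a profiler whose DLL it could not find, which fits an app that is not actually shipping a profiler.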

Application pools won't run

I have two servers sitting behind a load balancer in my service tier. Both of them should be identical: IIS set up the same way, AppFabric (to keep two services warmed up), app pools running under either a service account or the app pool identity. On one server, everything works. On the other server, three of my app pools (the two that AppFabric is warming up, under the service accounts, and one that's just a standard app pool with no changes made from default settings) stop running almost as soon as I start them up (sometimes on the first request).
I get five of the following errors in the Application log each time I try to start one of the app pools:
There was an error during processing of the managed application service auto-start for configuration path: 'MACHINE/WEBROOT/APPHOST/Site/App'. The error message returned is: ''. The worker process will be marked unhealthy and be shutdown. The data field contains the error code.
The error code referenced is 80070005.
This is actually for the same Site/App regardless of the app pool being started (though it may change after recreating the app pools).
In the System log, I get the following warning five times before it errors (Application pool 'AppPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool.):
A process serving application pool 'AppPool' reported a failure during application preloading or service loading. The process id was '2396'. Please ensure that all application preload or service settings in the application pool are configured properly. The data field contains the error number.
The error code referenced is 80004005.
The AppPool here is the one being started.
I've tried recreating the app pools; I've tried uninstalling AppFabric (but we need it, so I reinstalled it, and still no go). I'm out of ideas. Any suggestions?
EDIT: I tried copying the applicationHost.config over from the working server, but that didn't work either.
EDIT2: One of the app pools works when running under a real user account but doesn't when running under the ApplicationPoolIdentity.
(Also, we had an issue where the site was running under 2.0 and the apps were running under 4.0. That may have resolved the ones that are running as the service accounts.)
I was just wrestling with this same problem for a few hours and found a different culprit.
I had added a new configuration section to my Web.config in a recent commit. I also added this section to a separate ERB file used by Puppet to generate a custom Web.config at the point of deployment. In this template file, I added the new section but forgot to include its declaration in <configSections>.
Once I added the declaration to the template, our app's test VMs were able to start up again and this error went away.
While the app pools for the applications were 4.0, the app pool for the site itself was 2.0, causing some of the issues. We also had inetpub on a different drive, and we had to grant access to SERVER\Users.

Windows Azure Node.js Good Worker Role Example

I have been trying to create a worker role using PowerShell, the Azure Emulator, and the Azure Node.js SDK, but I run into problems when I start adding modules to my worker process.
These are the steps I have taken:
1) Run PowerShell
2) Create a new azure node.js project
new-azureserviceproject
3) Add a web role
add-azurenodewebrole
4) Add a worker role
add-azurenodeworkerrole
If I run the project at this stage
start-azureemulator -launch
The site runs fine, without any IIS errors. But when I start installing new modules into the worker role and run it again, I get Windows IIS errors such as "Windows Azure Web Role Entry Point Has Stopped Working", with no further information about why it stopped. Has anybody else encountered these errors? More importantly, does anybody have an example of a worker role that runs a cron job and talks to Windows Azure Table storage? All I want to do is run a cron job every 5 seconds that checks Table storage for new updates and does something.
Any ideas?
Details of the error:
Problem Event Name: APPCRASH
Application Name: iisexpress.exe
Application Version: 8.0.8298.0
Application Timestamp: 4f620349
Fault Module Name: iiscore.dll
Fault Module Version: 8.0.8298.0
Fault Module Timestamp: 4f63b65c
Exception Code: c0000005
Exception Offset: 00021767
OS Version: 6.1.7601.2.1.0.256.28
Locale ID: 1033
Additional Information 1: f66d
Additional Information 2: f66d807b515d6b2dc6f28f66db769a01
Additional Information 3: 7b2f
Additional Information 4: 7b2f6797d07ebc2c23f2b227e779722e
Update: if I lower the instance count to 1 for both the web role and the worker role, it doesn't crash. Perhaps it's a problem with the Azure Emulator?
There are several questions here, so let's start with the first. A decent sample for using a worker role that adds modules (socket.io) can be found here:
https://www.windowsazure.com/en-us/develop/nodejs/tutorials/app-using-socketio/
Next up is of course the conversation about modules on Windows. Some modules with binary dependencies don't run on Windows. That has gotten to be a pretty small number, but it is still a possibility. You should see if you can run your worker role code outside of the emulator to validate this.
Next up, we should consider the process. You would typically push changes that require action into a Storage Queue from your web role and pull from that queue in your worker role. If you have a "cron module", pull the top item from the queue when the timer event fires. You can always do sleeps here, but that kind of blocking is frowned on in the Node world.
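The timer-driven pull described above can be sketched as follows. This is a language-neutral illustration written in Python with asyncio rather than Node.js, and an in-memory deque stands in for the Storage Queue. The names `poll_queue` and `run_demo`, and the `max_ticks` limit, are hypothetical and exist only so the example terminates.

```python
import asyncio
from collections import deque


async def poll_queue(queue, handle, interval_seconds, max_ticks):
    """On every tick, take at most one message from the queue and pass it to handle()."""
    for _ in range(max_ticks):
        if queue:
            handle(queue.popleft())
        # Yields control to the event loop until the next tick,
        # so other work is not blocked while waiting.
        await asyncio.sleep(interval_seconds)


def run_demo():
    """Process two pending updates over three short ticks and return what was handled."""
    pending = deque(["update-1", "update-2"])  # stand-in for messages from the web role
    processed = []
    asyncio.run(poll_queue(pending, processed.append, 0.01, 3))
    return processed
```

In a real worker the loop would run indefinitely with a 5-second interval and the deque would be replaced by calls to the storage service. The point of the design is that each tick does one small, non-blocking unit of work.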
This may not be related but I thought I should mention it. I ran into issues because the default version of NodeJS seemed to be too old to work with the modules I was using. You may need to change the version of NodeJS. To see the list of available versions:
Get-AzureServiceProjectRoleRuntime
Then, apply a specific version (example):
Set-AzureServiceProjectRole [Role_Name] Node 0.10.21
