Azure: Worker role looping through "recycling" - azure

I'm currently working on an Azure project that works 100% locally with emulator resources. I'm now trying to deploy a worker role, but I'm running into an issue that I'm not sure how to troubleshoot.
Upon deploying the worker role in my Azure portal, the two instances continually loop through "recycling".
I can try to RDP into the role, but I only have about a minute to look around before the connection closes, I'm assuming due to the recycling.
After some searching it doesn't seem like this is a super common problem. Is there something trivial I'm overlooking that could be causing this issue? How would you go about troubleshooting this? Thank you for your time :)

In case of missing Reference you can troubleshoot this issue by:
Unzip your CSPKG file and then again unzip .CSSX file (just rename CSSX to zip) and match that everything references and static content is all there.. This way you can match what is on VM. Also in 2 minute windows when you RDP, try to look for Application event log for exception and get it because that would be the key to find the root cause.
IF you could see the exception in event log and look for the exception, you sure can find where it was generated. You can also use Intellitrace which might require you to redeploy the app.
Also there are ways were copying WinDBG and locking to the specific process you can debug it. I am not sure how much you would want to try but just copy the WinDBG to VM and use it would be enough (not sure how much experience you have with WinDBG though and how much time you would want to spent.)

Also been pestered by this role recycle issue numerous times. Here is the sequence of steps to debug persistent role recycles:
Debugging Azure Role Recycles
Enable Remote Access to your role - RDP login
Check eventvwr.msc (Windows Logs -> Application, App & Service Logs->Windows Azure)
Review the Azure text file logs across both C:\logs and c:\resources
Review custom logs in the Volume E: or F: for any custom role startup logging
Run AzureTools and attach to startup processes (download WinDBG, use Utils->Attach Debugger, select process - WaWorkerHost/WaIISHost, etc), use G to continue and watch debugger output for assemblies failing to load.
Installing Azure Debugging Tools via Powershell
PS> md c:\tools; Import-Module bitstransfer; Start-BitsTransfer http://dsazure.blob.core.windows.net/azuretools/AzureTools.exe c:\tools\AzureTools.exe; c:\tools\AzureTools.exe
If all items above fail - try using other tools in the AzureTools treasure trove - such as fusion logging, etc, this approach above will work!
WinDBG Sample Output - Failing to Locate Assembly (WaIISHost)

The most likely cause is that you have a missing assembly. One tactic to catch this is to wrap any startup processing in a master try/catch that manual logs the error to Azure storage.
If you added any referrences, check to make sure they're set to copylocal=true and that any external assets that were included in your service package were also set to be included.

From Avkash above:
Yes. this mean some issue in your Worker Role code is causing your Worker Role Host Process to crash.. If you look your fault stack you must see the function or the link from your code which generate this fault. IF you need help open a free Azure Support incident to Windows Azure Support team and they will help you.

Just a suggestion: Also Check the installable(if any)and any other references you use are 64bit.Azure VMs have 64bit OS. Once i was stuck up with this kind of problem due to 32/64 bit issues.

Are your worker roles exiting their work loop? A local recycle is very fast and you might not notice it, but spin-up time in the cloud can be long.

If the issue is caused by a startup batch file, I have stopped the loop by editing the batch file on the instance to include "exit /b 0" at the beginning. This will tell Azure that the startup was successful and you then have all the time you need to diagnose issues without the VM getting killed.

Related

The service cannot accept control messages at this time

I just stopped an Application Pool in IIS. When trying to start it, IIS complains that,
The service cannot accept control messages at this time. (Exception from HRESULT: 0x80080425).
What gives? Whence did this error come?
Looking at the Event Viewer > System shows these warnings:
A worker process '1456' serving application pool 'MyAppPool' failed to stop a listener channel for protocol 'http' in the allotted time. The data field contains the error number.
A process serving application pool 'MyAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '10592'. The data field contains the error number.
A process serving application pool 'MyAppPool' exceeded time limits during shut down. The process id was '10516'.
This resolved itself after about 5-minutes, at which point we tried to restart the website, and received:
The World Wide Web Publish Service (W3SVC) is stopped. Web sites cannot be started unless the World Wide Web Publishing Service (W3SVC) is running.
So, we started the W3SVC service, and then we could start our website.
This helped me: just wait about a minute or two.
Wait a few minutes, then retry your operation.
Ref: https://msdn.microsoft.com/en-us/library/ms833805.aspx
The error message could result due to the following reason:
The service associated with Credential Manager does not start.
Some files associated with the application have gone corrupt.
Please follow the steps mentioned below to resolve the issue:
Method 1:
Click on the “Start”
In the text box that reads “Search Program and Files” type “Services”
Right click on “Services” and select “Run as Administrator”
In the Services Window, look for Credential Manager Service and “Stop” it.
Restart the computer and “Start” the Credential Manager Service and set it to “Automatic”.
Restart the computer and it should work fine.
Method 2:
1. Run System File Checker. Refer to the link mentioned below for additional information:
http://support.microsoft.com/kb/929833
In my case, the VS debugger was attached to the w3wp process. After detaching the debugger, I was able to restart the Application Pool
I stopped the IIS Worker Process (in task manager), and then started the IIS again.
It worked.
I killed related w3wp.exe (on a friends' advise) at task manager and it worked.
Note: Use at your own risk. Be careful picking which one to kill.
Restarting the machine worked for me but not every time.
If you are really stuck on this then follow below steps
Open Task Manager
A window will open. Click on Details tab.
Search for the process name you wanted to restart/stop.
Select process, right click on it, select End task option.
A confirmation dialog box will appear. Click on End process button.
Now try to restart your service from Services.msc window.
I forgot I had mine attached to Visual Studio debugger. Be sure to disconnect from there, and then wait a moment. Otherwise killing the process viewing the PID from the Worker Processes functionality of IIS manager will work too.
Restarting the IIS windows service (World Wide Web Publishing Service) and then starting the application pool has worked for me. However, as the top answer suggests it may have just been the waiting that caused it to subsequently work.
I had this issue recently,
Problem statement:
Mine was a windows service that I run locally by attaching VS debugger. When I stop debugging and try to restart/stop the service (under services.msc) I used to get the mentioned error.
Solution:
Open up Task manager.
Search for the service (based on the exe name and not service name, for those that are different).
Kill the service.
On doing the above the service is stopped.
Being impatient, I created a new App Pool with the same settings and used that.
I kept having this problem whenever I tried to start an app pool more than once. Rather than rebooting, I simply run the Application Information Service. (Note: This service is set to run manually on my system, which may be the reason for the problem.) From its description, it seems obvious that it is somehow involved:
Facilitates the running of interactive applications with additional administrative privileges. If this service is stopped, users will be unable to launch applications with the additional administrative privileges they may require to perform desired user tasks.
Presumably, IIS manager (as well as most other processes running as an administrator) does not maintain admin privileges throughout the life of the process, but instead request admin rights from the Application Information service on a case-by-case basis.
Source: social.technech.microsoft.com

Your role instances have recycled a number of times during an update or upgrade operation

I am trying to deploy a Cloud Service with 1 Web Role to Azure.
When I do so, I get this message:
Your role instances have recycled a number of times during an update or upgrade operation. This indicates that the new version of your service or the configuration settings you provided when configuring the service prevent the role instances from running. Verify your code does not throw unhandled exceptions and that your configuration settings are correct and then start another update or upgrade operation.
The project runs just fine locally, and I'm having a hard time figuring out how to start debugging this issue. Are there any common problems that cause this message or steps to figure out what is causing it?
See https://learn.microsoft.com/en-us/archive/blogs/kwill/windows-azure-paas-compute-diagnostics-data. This will walk through all of the diagnostic data available as well as how to troubleshoot the most common issues.
We also had this annoying problem and in our case:
We use local storage, but it wasn't defined in service definition (or Worker Role's properties)
Our worker role project has reference to a service project which has reference to data layer project. But, the worker role project doesn't have reference to the data layer project. As soon as we added reference to data layer project in worker role project, it deploys successfully.
Problem #1 can be easily noticed if you first run the project in your local machine. Exception will be thrown.
Problem #2, however, is more difficult, mainly because it runs just fine in local machine. After 5 days of trouble shooting, we finally found the problem. So, check all references and try to add sub-reference projects, those that are referenced by other references.
We had similar problem, and it was due to some DDLs failed to load. (due to different version from the one MS have deployed to the VM)
Try to set CopyLocal to "true" for all the References in the project, and re-deploy.
I would either remote desktop to the cloud instance and review the Windows Event Logs for exceptions or redeploy with IntelliTrace Enabled. If you choose the later, you can download the IntelliTrace logs from Visual Studio and debug
http://msdn.microsoft.com/en-us/library/windowsazure/ff683671.aspx
One way to find out the actual error is to click on the " 1 instance" at the top of Dashboard after trying to deploy your web role. It will tell you the status of the role instance. The status should include more information about the type of error which blocks your deployment.
It depends on what your case is. For me, the status claimed that I had an unhandled Security exception. After some investigation, it turned out that under my role's OnStart(), I tried to create a event source. However, Azure service doesn't have the permission to create an event source.
For more possible issues, check http://blogs.msdn.com/b/kwill/archive/2013/09/06/troubleshooting-scenario-3-role-stuck-in-busy.aspx
For me, the issue was with my SQL Azure DB firewall rules. My Azure SQL Database servers are not set to "Allow Access to Azure services", so I have to explicitly list IPs that are allowed.
I discovered this after wrapping my code in a try/catch that swallowed all exceptions, refactoring my OnStart() and RunAsync() methods, and setting all my references to Copy Local = True. None of that worked, then I saw that I had this line in my RunAsync() method:
log4net.Config.XmlConfigurator.Configure();
I am using the AdoNetAdapter for log4net and connecting to an Azure SQL DB for logging, so that led me to check the firewall rules.
For me, I had some differing version of nuget packages in my various projects. Once I consolidated everything to the same version(s), it worked fine.
With the release of Windows Azure SDK version 2.2 for Visual Studio 2012 and 2013, Now you can Remote Debug Cloud Resources within Visual Studio.
Once your cloud service is published and running live in the cloud, you can simply set a breakpoint in your local source code. This may help you in digging out what's going wrong!

How to replace one dll in deployed azure worker role to modified version?

I need to replace one dll-file in deployed azure worker role to one that I modified, because role contains a bug and I don't have a release tag. I'm trying to do that via rdp, but when I'm trying to copy new dll into approot folder VM tells me that old dll file is open in another program and can't be replayced.
This isn't a good idea. You should do this by repackaging the deployment and performing an update. By attempting to do this via RDP you may replace the file, but if the role goes down or gets moved then when Windows Azure bring the role back up the change will be gone since it will redeploy the last package it knew about, so you'd be back to the dll with the bug in it.
As for why it is telling you it is open is because the worker role is actively using it most likely. You'd have to stop the worker role process to be able to replace it. The best option is still to perform an update of the whole package.
You can see this documentation for more information about how the updates occur: http://msdn.microsoft.com/en-us/library/windowsazure/hh472157.aspx
I agree with MikeWo's suggestion about repackaging and updating the deployment.
However, if you want to drop the single DLL and check to see if fix works. you can kill WaWorkerHost.exe - the blue highlighted process in the picture. then you can replace the DLL.

how long does it take to create an Azure Cloud Service? How to view log information?

I'm brand new to Azure. I'm trying to get a Cloud Service running with 3 web roles.
Last night I created the .cspkg and .cscfg files, exported the certificates and uploaded everything to Azure manager. The manager said my Cloud Service was successful, however for the last 10 hours when I click on "Cloud Services" in the manager it shows my service, but it says "Creating" with wait gif under "SERVICE STATUS".
Is it really still creating? Or did it fail? Is it possible to view more detailed information about the creation process and/or any log files?
Thanks,
Something "bad" has happened. Service Spin up time should be at most a few minutes.
I've seen it take up to 10-15 minutes depending on the hosting center and the number of scaled instances that need to come up but 10 hrs something has definitely gone wrong.
I would delete the service and start again. If you experience the same problem, have a look in your service start up code and make sure that there's no exceptions/infinite loops, other problems in there that might be causing problems.
It may miss any assembly references. If you included any assembly references(packages that are not part of .net) then please ensure that its copy to local attribute is set to true.

How do I debug a Worker Role using Remote Desktop with Windows Azure?

I now have my Windows Azure environment set up so that I can access my Worker Role with Remote Desktop. However, I'm not sure how to proceed at the moment. After much digging I found a web site that was offline but in Google's cache there was mention of attaching to the Worker Role running in the Azure Cloud from the Visual Studio debugger. But I only have Visual Developer (not studio) 2010 and I have searched all over and as far as I can see there is no such option to attach to a remote server. I am able to publish my project to the Azure Cloud without error and I have a "healthy" instance of my Worker Role showing as active and running.
I did connect with RDP through the Azure Management portal. The login worked fine and up came the remote desktop window. I searched through much of what I could find and was unable to find my Worker Role. I must have the wrong impression of RDP, because I had hoped to see the Worker Role's main display form when I logged in, just like I do when I debug it locally in the Cloud Emulator. But instead all I saw was a blank desktop with some base level server inspection and management routines. I even checked the Event Viewer for Application related messages and saw none.
So now I'm stuck wondering if my Worker Role is actually running or not, despite the seemingly positive status messages from the Management Portal, and I still want to attach to my Worker Role for debugging through Visual Developer, if it's possible, but I am unable to figure out how.
Anyone with experience in this area that can give me some solid tips on what to do next, please respond.
UPDATE: I believe my worker role may be running because I opened a command window and did a Netstat and saw it listening on the correct port. However, that may just be my Worker Role shell class that starts the custom EXE I have it launch as a spawned proces. I still haven't confirmed if my custom EXE is running yet.
UPDATE-2: Just ran TaskList from a command window and the custom EXE is listed.
UPDATE-3: Everything is working as I just ran a remote test of the service so that's not a problem. Still want to know how to attach to the Worker Role from Visual Developer 2010 for remote debugging, and if it's possible to see the custom EXE's display form like I do when doing local debugging in the Cloud Emulator.
-- roschler
There is a set of articles here which goes in length on how to set up for remote debugging in Azure:
http://blogs.u2u.be/peter/post/2011/06/21/Remote-debugging-an-Azure-Worker-role-using-Azure-Connect-Remote-desktop-and-the-remote-debugger.aspx
http://blogs.u2u.be/peter/post/2011/06/24/Remote-debugging-an-Azure-worker-role-using-Azure-Connect-remote-desktop-and-remote-debugger-part-2.aspx
http://blogs.u2u.be/peter/post/2011/06/26/Remote-debugging-a-Windows-Azure-Worker-Role-using-Azure-Connect-Remote-desktop-and-the-remote-debugger-part-3.aspx
The key takeaway is that you don't need to actually install Visual Studio on Azure, you only need to copy the Remote Debugger bits and then use Azure Connect to add your developer machine to the Virtual Network.
You can setup Remote Debugging with Visual Studio 2012
http://code.msdn.microsoft.com/Remote-Debugging-Windows-dedaaec9
When you say:
But instead all I saw was a blank desktop with some base level server inspection and management routines.
this is exactly what you get with an Azure VM. It's a basic OS install, plus the bare minimum of Azure stuff it needs to run and the code you've uploaded. There's no fancy monitoring or health checks available on the machine by default, you're expected to have provided those yourself to have them available without having to RDP into the machine to check on it.
RDP is very good for tracking down certain problems, like checking that a startup task will run, checking which directories items are installed in and just generally being nosey. If you need extra tools to track down a problem, you can just install them while you're connected to the server. For example I have RDPed into a server and installed the Microsoft Debugging Tools, to track down a memory issue.
I suppose you could remote into your VM, install Visual Studio there, and debug the process...
I also suppose it might be possible to enable remote debugging (not sure what's involved there, but such a thing exists, and it works over TCP) and debug from a local instance of Visual Studio.
To my knowledge, neither is commonly done.
Based on other answers, you would be better off writing a log file to a local storage. You can read the file from RDP if you reallyhace to. Keep in mind, debugging on Azure isn't really simple, and rightly so.
What I was thinking though was, maybe you could run the process using the user's credentials. I can't verify at the moment, but you have a better shot of seeing the ui when you rdp.

Resources