Azure worker role deployment fails if I don't delete the previous deployment - azure

I've been having a generally wretched time deploying my worker roles to Azure. I'll publish my worker role once from Visual Studio and everything will work fine. I'll publish the worker role again later and the deployment fails. The instance goes into a 'recycling loop'. I spend hours trying to figure out what I broke. I tried Intellitrace but it always fails with a 'cannot download intellitrace logs' error message. Then eventually I'll delete the deployment from inside Azure Management Portal and try again and the same code that's been failing to deploy for hours will magically work.
This doesn't happen all the time, and some projects seem to 'fix' themselves and stop demonstrating this behavior all together. But what seems to be happening is that a publish from Visual Studio will fail unless I go manually delete the existing deployment.
I know this maybe a little vague but I've really got nothing to go on here. Intellitrace never works and I can't Remote Desktop into the role to poke around because it recycles so fast (which may also be why Intellitrace isn't working).
Does anyone have any idea what might be going on here?

I did more research and I think I may know what's going on. Apparently Visual Studio tries to upgrade your worker roles in place when you deploy. If that fails, for some reason such as you changing service configuration between deployments, it just complains that there's something wrong with your role and that your instance is recycling.
In deployment options there is an option called 'If deployment can't be updated, do a full deployment' which will delete the existing deployment and deploy from scratch if the existing deployment can't be updated. I'm not sure why this isn't checked by default instead of 'fail mysteriously'.

Related

Azure Function: Old code still running after a deployment

Right now I again faced the issue that old code is used on an Azure Function App even after the zip deployment through KUDU returns success.
Of course, that is after some 30 mins that I expect the new code to get loaded, not immediately.
The issue is marked as closed.
What is considered to be the best practice in this case:
Programmatically force the Function App to restart, say, through Azure CLI or Powershell Az modules?
Or there is another way to mitigate the issue?
While restarting should fix it, my suggestion would be to enable "Run from package": https://learn.microsoft.com/en-us/azure/azure-functions/run-functions-from-deployment-package. That removes the chance of having old files running as the deployment is atomic.
You'd set an app setting of WEBSITE_RUN_FROM_PACKAGE to 1 and continue deploying the same way you are today. The site will be run directly from that package (wwwroot will appear as read-only in kudu) so there's no unzipping and copying, which may be causing the issue you're having.
Note: it looks like we're still tracking the issue here: https://github.com/Azure/azure-functions-host/issues/2636.
In my case the issue lay in the CI-CD pipelines, where an out-of-date artifact was being deployed - thus the successful deploy, but the old code.

Azure Functions V2 have to stop functions every time before publishing because dll is busy

So this is new on V2, when I publish with Visual Studio (probably with vsts publishing as well). It says the dll is busy, it didn't used to do that in V1.
I suppose it's fine to stop the functions (Or probably I can do slot deployment as well, although all mine are triggered and scheduled, so I don't really need slots).
So on the "stop" it will still do that 30 seconds of graceful stop that the functions do before shutting down / switching (Heard this on a podcast when I asked).
If I redeploy after the stop, I suppose it's ok this way. Things will wait on my triggers from azure queue and schedule.
Only thing is it's sort of a pain to have to press start and stop rather than just publish. I'm not sure if it's supposed to be doing this or not. Doesn't seem publish will ever work unless it's stopped, why not have it auto-stop the function?
Looks like you meet ERROR_FILE_IN_USE
.
You can configure the appOffline rule in the publishing profile (In Solution explorer> Properties>PublishProfiles>*.pubxml). Set the EnableMSDeployAppOffline to true as below.
<PropertyGroup>
...
<EnableMSDeployAppOffline>true</EnableMSDeployAppOffline>
...
</PropertyGroup>
This setting take the app offline so the file lock is released and your app will start automatically after deploy.
With the caveat that ZIP deployment is now preferred, the solution to this is to add an app setting MSDEPLOY_RENAME_LOCKED_FILES with value 1.

Azure-Deployment to stage ignores service configuration

I created a cloud service and tested it successfully locally. I added service configurations for stage and production. Here is a snippet of my staging-configuration:
and here my configuration-settings:
Then when I publish I set up the deployment as follows:
All this worked like 2 weeks ago. But now he deploys in VS and when I look into Azure Service Configure area it looks like this:
I played a little bit with the "Update development ..."-checkbox on the second screen but the result is the same.
So it ignores all the settings I made and just won't tranistion my configuration to the ine I named "CloudStage". My current Web PI tells me that I use Windows Azure SDK for .NET (VS 2013) 2.3. I don't get the point.
Edit
Some more things I observed:
No WADLogsTable and WADWindowsEventLogsTable is generated automatically in the staging storage.
I deactivated Remote Desktop because it was one of the changes I made to monitor the event log (which wasn't useful here)
I manually changed the connection strings in Azure Portal but it seems as if the worker is totally unaware of the storage (rebooted it with no success).
Edit
I recognized another thing. Here you can see a running deployment of my service:
See the warning-mark on the left? If I go to my Error list this is shown:
This warning is senseless since it tells me that I did everything the right way. My *.Local.csfg-files are pointing to the local storage. So?!?
This seems weird. Please check the in your ServiceConfiguration.CloudStage.cscfg to verify the expected values.
Have you tried updating any other property like Enabling Remote desktop? Does that get updated on your deployment? You should select the "Deployment Update" check box in the publish dialog. Now, when deploying to an existing Cloud Service, it should ask you if you want to replace it.
If you get the Object reference error every time you right click on project, there might be some issue with the Azure SDK set up.
I'm a little bit further now. What I did was:
Deleted all Services in Azure.
Deleted all Storage Accounts in Azure
Removed my Service-Project completely from solution (not the library containing the worker-logic).
Re-added storage-accounts in Azure.
Re-added services in Azure.
Re-added a project in the solution and added the worker-logic inside it.
Builded up all the publishing-stuff again.
Published it.
The first publish ended like the one described in my question. After I checked the "Update development..."-option in properties of my worker it finally took my transitions into the stage!
Now I recognized, that WADLogsTable was still empty. I hit the instance right in server-explorer and choosse "Update diagnostic settings...". There was an option "Transfer period" suddenly set to "None". This explained to me, why my table was empty and after I set it back to "1" my table is filling again!
Another funny thing beside: When I right-click my Cloud-project in the solution I get "Object reference not set to an instance...". When I just click it left and choose Build->Publish it works.
I just hope that I can help somebody with this. Lets see if it's stable now.
Edit: Yesterday it worked - today is still the same issue :-(.
When you get "Object reference not set to an instance.." for a CloudService project you usually have some kind of mismatch. It could be that a setting in the ServiceConfiguration is not defined in the ServiceDefinition. It could also be that there is a publish profile defined in the .ccproj file for the CloudService that doesn't exist. This might also be what is causing your problems with the different configurations.
So it turns out that the problem is completely on client-side. My Visual Studio (now with SDK 2.4) is doing something wrong. I set up a fresh installation with all the stuff needed :-( and there it works perfect. I'll try to determine if one of my extensions is causing the strange "Object reference not set..."-bug.
Repair-Installation of VS does not solve the problem btw.

Your role instances have recycled a number of times during an update or upgrade operation

I am trying to deploy a Cloud Service with 1 Web Role to Azure.
When I do so, I get this message:
Your role instances have recycled a number of times during an update or upgrade operation. This indicates that the new version of your service or the configuration settings you provided when configuring the service prevent the role instances from running. Verify your code does not throw unhandled exceptions and that your configuration settings are correct and then start another update or upgrade operation.
The project runs just fine locally, and I'm having a hard time figuring out how to start debugging this issue. Are there any common problems that cause this message or steps to figure out what is causing it?
See https://learn.microsoft.com/en-us/archive/blogs/kwill/windows-azure-paas-compute-diagnostics-data. This will walk through all of the diagnostic data available as well as how to troubleshoot the most common issues.
We also had this annoying problem and in our case:
We use local storage, but it wasn't defined in service definition (or Worker Role's properties)
Our worker role project has reference to a service project which has reference to data layer project. But, the worker role project doesn't have reference to the data layer project. As soon as we added reference to data layer project in worker role project, it deploys successfully.
Problem #1 can be easily noticed if you first run the project in your local machine. Exception will be thrown.
Problem #2, however, is more difficult, mainly because it runs just fine in local machine. After 5 days of trouble shooting, we finally found the problem. So, check all references and try to add sub-reference projects, those that are referenced by other references.
We had similar problem, and it was due to some DDLs failed to load. (due to different version from the one MS have deployed to the VM)
Try to set CopyLocal to "true" for all the References in the project, and re-deploy.
I would either remote desktop to the cloud instance and review the Windows Event Logs for exceptions or redeploy with IntelliTrace Enabled. If you choose the later, you can download the IntelliTrace logs from Visual Studio and debug
http://msdn.microsoft.com/en-us/library/windowsazure/ff683671.aspx
One way to find out the actual error is to click on the " 1 instance" at the top of Dashboard after trying to deploy your web role. It will tell you the status of the role instance. The status should include more information about the type of error which blocks your deployment.
It depends on what your case is. For me, the status claimed that I had an unhandled Security exception. After some investigation, it turned out that under my role's OnStart(), I tried to create a event source. However, Azure service doesn't have the permission to create an event source.
For more possible issues, check http://blogs.msdn.com/b/kwill/archive/2013/09/06/troubleshooting-scenario-3-role-stuck-in-busy.aspx
For me, the issue was with my SQL Azure DB firewall rules. My Azure SQL Database servers are not set to "Allow Access to Azure services", so I have to explicitly list IPs that are allowed.
I discovered this after wrapping my code in a try/catch that swallowed all exceptions, refactoring my OnStart() and RunAsync() methods, and setting all my references to Copy Local = True. None of that worked, then I saw that I had this line in my RunAsync() method:
log4net.Config.XmlConfigurator.Configure();
I am using the AdoNetAdapter for log4net and connecting to an Azure SQL DB for logging, so that led me to check the firewall rules.
For me, I had some differing version of nuget packages in my various projects. Once I consolidated everything to the same version(s), it worked fine.
With the release of Windows Azure SDK version 2.2 for Visual Studio 2012 and 2013, Now you can Remote Debug Cloud Resources within Visual Studio.
Once your cloud service is published and running live in the cloud, you can simply set a breakpoint in your local source code. This may help you in digging out what's going wrong!

Azure: Worker role looping through "recycling"

I'm currently working on an Azure project that works 100% locally with emulator resources. I'm now trying to deploy a worker role, but I'm running into an issue that I'm not sure how to troubleshoot.
Upon deploying the worker role in my Azure portal, the two instances continually loop through "recycling".
I can try to RDP into the role, but I only have about a minute to look around before the connection closes, I'm assuming due to the recycling.
After some searching it doesn't seem like this is a super common problem. Is there something trivial I'm overlooking that could be causing this issue? How would you go about troubleshooting this? Thank you for your time :)
In case of missing Reference you can troubleshoot this issue by:
Unzip your CSPKG file and then again unzip .CSSX file (just rename CSSX to zip) and match that everything references and static content is all there.. This way you can match what is on VM. Also in 2 minute windows when you RDP, try to look for Application event log for exception and get it because that would be the key to find the root cause.
IF you could see the exception in event log and look for the exception, you sure can find where it was generated. You can also use Intellitrace which might require you to redeploy the app.
Also there are ways were copying WinDBG and locking to the specific process you can debug it. I am not sure how much you would want to try but just copy the WinDBG to VM and use it would be enough (not sure how much experience you have with WinDBG though and how much time you would want to spent.)
Also been pestered by this role recycle issue numerous times. Here is the sequence of steps to debug persistent role recycles:
Debugging Azure Role Recycles
Enable Remote Access to your role - RDP login
Check eventvwr.msc (Windows Logs -> Application, App & Service Logs->Windows Azure)
Review the Azure text file logs across both C:\logs and c:\resources
Review custom logs in the Volume E: or F: for any custom role startup logging
Run AzureTools and attach to startup processes (download WinDBG, use Utils->Attach Debugger, select process - WaWorkerHost/WaIISHost, etc), use G to continue and watch debugger output for assemblies failing to load.
Installing Azure Debugging Tools via Powershell
PS> md c:\tools; Import-Module bitstransfer; Start-BitsTransfer http://dsazure.blob.core.windows.net/azuretools/AzureTools.exe c:\tools\AzureTools.exe; c:\tools\AzureTools.exe
If all items above fail - try using other tools in the AzureTools treasure trove - such as fusion logging, etc, this approach above will work!
WinDBG Sample Output - Failing to Locate Assembly (WaIISHost)
The most likely cause is that you have a missing assembly. One tactic to catch this is to wrap any startup processing in a master try/catch that manual logs the error to Azure storage.
If you added any referrences, check to make sure they're set to copylocal=true and that any external assets that were included in your service package were also set to be included.
From Avkash above:
Yes. this mean some issue in your Worker Role code is causing your Worker Role Host Process to crash.. If you look your fault stack you must see the function or the link from your code which generate this fault. IF you need help open a free Azure Support incident to Windows Azure Support team and they will help you.
Just a suggestion: Also Check the installable(if any)and any other references you use are 64bit.Azure VMs have 64bit OS. Once i was stuck up with this kind of problem due to 32/64 bit issues.
Are your worker roles exiting their work loop? A local recycle is very fast and you might not notice it, but spin-up time in the cloud can be long.
If the issue is caused by a startup batch file, I have stopped the loop by editing the batch file on the instance to include "exit /b 0" at the beginning. This will tell Azure that the startup was successful and you then have all the time you need to diagnose issues without the VM getting killed.

Resources