Azure GetMachineGoalState dying from HTTP 410 error

My Azure web role is constantly recycling.
The WaAppAgent.log file on my Azure web role contains a whole stream of these errors:
[00000008] [05/15/2012 00:10:20.90] GetMachineGoalState() failed with exception: Microsoft.ServiceModel.Web.WebProtocolException: Server Error: Gone (Gone) ---> System.Net.WebException: The remote server returned an error: (410) Gone.
at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
--- End of inner exception stack trace ---
at Microsoft.ServiceModel.Web.WebHttpChannelProxy`1.Invoke(IMessage msg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Microsoft.WindowsAzure.RoleContainer.Protocol.IControlSystem.GetMachineGoalState()
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.ControlSystem.GetGoalState().
[00000008] [05/15/2012 00:10:20.90] Caught exception in pre-initialization heartbeat thread, will continue heartbeats: System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.ContainerStateManager.InitialHeartbeatThread(Object waitEvent)
[00000005] [05/15/2012 00:10:23.24] Agent runtime initialized.
The Azure service status dashboard says everything is green on their side, but it always says that unless the entire platform is down in a smoldering heap, so I don't really trust it.
Is this my problem, or theirs?

I take it you are using a Full IIS based Windows Azure Web Role. I have seen this error with a VM Role, but with a Web Role it seems very strange. I wouldn't put much weight on these logs for now, because they may point you in the wrong direction.
If your Web Role status shows green (Ready) but your site is not available, the issue could be in the application itself, because the Ready state means the role host process (in this case WaIISHost.exe) is healthy. It would be rare for the role status to be green while the role host process is unhealthy. Based on what you have provided, the other possibility is a problem in the VM startup sequence, meaning your role never started; in that case, however, the portal should not show the Ready state.
Please log into your Azure VM using RDP and:
- Check first whether the WaIISHost.exe process is running. Keep an eye on it for 2 minutes to see whether it is crashing and restarting.
- Check the application event log, because it should show a pattern of exceptions that points to the root cause.
- Check the recent Azurebootstrapper and iisconfigurator logs located in the C:\logs folder for any specific issue during VM boot and IIS start respectively.
Finally, back up the logs (C:\logs and C:\Resources) and try rebooting your instance. If you still have the problem, contact the Windows Azure support team:
https://www.windowsazure.com/en-us/support/contact/

Related

The specified CGI application encountered an error and the server terminated the process when running on Azure Functions ASP

I ran into a problem after deploying my code to Azure Functions on an App Service Plan. My function is long running (about 4 minutes, because of business logic), and when I call it from Postman the response is a 502:
The specified CGI application encountered an error and the server terminated the process.
During this time my function is still running. I monitor it in the Function Apps portal and it returns a success response after 4 minutes, but the client gets the error and never receives the response.
Response Message & Status Code
Response Header
I read the link below, which suggests the limitation may come from the Azure App Service Plan, but how do I configure the App Service Plan to fix this error?
Some information:
- Azure Functions V2.
- App Service Plan S1 Standard, 2 instances.
The specified CGI application encountered an error and the server terminated the process
I was having a similar issue. After debugging, it turned out I had an unhandled exception inside a Parallel.ForEach loop. The parallel loop did not have any try/catch within it, and as a result a thrown exception bubbled up to the app domain, killing and restarting the web app, with this same error showing:
"The specified CGI application encountered an error and the server terminated the process."
You should check that all your asynchronous code paths catch and handle errors.
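As a minimal sketch of the point above (an illustration, not the answerer's actual code), the loop body can catch its own exceptions so that one bad item cannot escape the loop. The names `SafeParallel` and `ForEachCaptured` are hypothetical and introduced only for this example. Without the inner try/catch, exceptions thrown from a `Parallel.ForEach` body surface to the caller as an `AggregateException`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, named here for illustration only.
public static class SafeParallel
{
    // Runs 'work' for every item in parallel and returns the failures
    // instead of letting them propagate out of the loop.
    public static List<Exception> ForEachCaptured<T>(IEnumerable<T> items, Action<T> work)
    {
        // Thread-safe collection, because loop bodies run concurrently.
        var failures = new ConcurrentQueue<Exception>();

        Parallel.ForEach(items, item =>
        {
            try
            {
                work(item);
            }
            catch (Exception ex)
            {
                // Record the failure; the remaining items still run.
                failures.Enqueue(ex);
            }
        });

        return new List<Exception>(failures);
    }
}
```

The caller can then log or rethrow the returned exceptions at a point where they are handled deliberately, rather than discovering them through a process restart.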

Azure Service Fabric Activation Error 7148

I have a service fabric cluster which hosts numerous applications. One of the applications has a service type where the service is created, runs for a bit, and then is deleted. Everything works great, but the cluster virtually always has its state set to error because there will be a few of these in the "Unhealthy evaluations" section.
Error event: SourceId='System.Hosting', Property='CodePackageActivation:Code:EntryPoint'.
There was an error during CodePackage activation.The service host terminated with exit code:7148
I've wrapped both the program's main and RunAsync in exception handlers, but never see anything in analytics. Is there any way to look up what exit code 7148 means? Thanks.
7148 is a general error code that indicates that something failed in SF in the process of setting up or activating your service's host process. So that's the reason that you're not seeing any errors or exceptions - your code is never getting a chance to run.
Examples of things I've seen that led to 7148:
- The exe was not actually a Windows exe due to corruption
- The service's manifest had a reference to a cert or some other prerequisite, like an endpoint, that was incorrectly configured (like a port that was already in use or the wrong thumbprint for a cert)
- Something blew up inside Windows that caused the process creation to fail, like a failure to correctly configure host networking for a container
Most of the time when I see this I have to look at the Windows error logs to see what's really happening. The SF folks are also trying to capture more common causes of failures and report them as better health errors rather than relying on 7148.

Continuous Webjob Restarts automatically

I have a continuous web job which starts a long-running process on a background thread. I have enabled application logging to write all the logs into a storage table.
Sometimes the web job automatically restarts for no apparent reason and nothing is logged in the storage table. Below is the log information from the storage table
I assume whenever the web job stops it should write into the logs (first line in the below screen capture).
I looked at the memory and CPU consumption of the web app and it was always below 50% during the entire month.
I am using "Basic" pricing tier for the web app and set the "Always On" to true.
How can I find out why the web job is shutting down? Is there any other place where I can look for a more detailed log?
EDIT Below are the logs from the scm site, which still do not say why it stopped :(
EDIT 2 In the log I found some more information.
Could not send heartbeat. Access to the path 'C:\DWASFiles\Sites\MyWebApp\VirtualDirectory0\data\DaaS\Heartbeats\RD000D3A702E55' is denied.
at DaaSRunner.Program.Main(String[] args)
at DaaSRunner.Program.StartSessionRunner()
at DaaS.HeartBeats.HeartBeatController.<GetHeartBeats>d__8.MoveNext()
at DaaS.HeartBeats.HeartBeatController.GetNumberOfLiveInstances()
at DaaSRunner.Program.SendHeartBeat()
at System.Linq.Enumerable.Count[TSource](IEnumerable`1 source, Func`2 predicate)
at System.IO.File.Delete(String path)
at System.IO.File.InternalDelete(String path, Boolean checkHost)
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at DaaS.HeartBeats.HeartBeat.OpenHeartBeat(String heartBeatPath)
Any help?
Look at the WebJob's log file for system logs.
You can get to it by going to the WebJobs dashboard - https://{sitename}.scm.azurewebsites.net/azurejobs

Azure WebRole won't start after 1.7 upgrade

I've recently tried to upgrade my WebRole from Azure SDK v1.6 to v1.7. This appears to have worked OK. I can build and run the role in my devfabric just fine. When I try to deploy the upgraded project to the real cloud, the instances never start. They just sit in the "busy" state. Interestingly, they don't do the typical "recycle loop", they just sit at "busy" forever.
When I log into the instances with RDP, I see the following error in the event logs:
The application '/' belonging to site '1' has an invalid AppPoolId 'DefaultAppPool' set. Therefore, the application will be ignored.
Followed by:
Site 1 was disabled because the root application defined for the site is invalid. See the previous event log message for information about the root application is invalid.
Looking in IIS manager confirms that there is no AppPool called "DefaultAppPool". There also are none of the typical AppPools with GUIDs for names that Azure creates. Unsurprisingly, none of my sites exist either.
So how do I resolve this?
I had the same issue after upgrading to v1.7, but upon looking at the Windows Azure logs in the Azure VM I noticed the following exception:
An unhandled exception occurred. Type: System.ArgumentException Process ID: 2340
Process Name: DiagnosticsAgent
Thread ID: 1
AppDomain Unhandled Exception for role Backend_IN_0
Exception: Endpoint http://xxxx.blob.core.windows.net/ is not a secure connection.
So I changed the Diagnostics connection string to use https instead of http and voilà, that solved my problem.
Hope that works for you, I've been pulling my hair out for two days.

Azure Worker Role writing unexpected error to Trace log storage

We have a worker role running in the cloud which polls an Azure CloudQueue periodically retrieving messages that a web role has put on there for us. Currently the worker role and web role are housed in the same Cloud Service application and currently we are only running one instance.
As we are testing we have our logging switched on and so the contents of the messages and other useful information appear in our cloud storage which we view using Cerebrata Azure Diagnostics Manager. (Great product btw)
DiagnosticMonitorConfiguration diagConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();
diagConfig.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
It all appears to work remarkably well actually; however, occasionally we see a Verbose message in the trace log which simply has "Fail" as the message. The code it appears to be generated from is wrapped in a try/catch, so it is odd that we aren't seeing the message through those means.
It would appear that something is happening that is out of our code's control, perhaps the worker role is being restarted, or the cloud op system is detecting a major error that only it can deal with by restarting our worker role. It recovers and carries on so it is somewhat of a mystery to us what might be happening.
What we haven't ascertained yet is whether we are losing a message.
Any help would be gratefully appreciated.
Cheers
Kindo Malay
Without the stack trace it's hard to say too much, but with the logging set to verbose it's quite likely that you're seeing some internal logging from one of the dlls you're using.
For example, if you run an Azure Table query that causes certain kinds of errors, the error will be logged 3 times because the storage client library catches the error, traces it out and then retries.
If the error is not being caught by your try catch block, then it's likely nothing you need to worry about.
If deliverability of queue messages is important to you, you should ensure that you make use of the visibility timeout overload of CloudQueue.GetMessage and only delete the message when you've finished processing it. You may end up processing some messages twice, but at least you will process all of them.
If your role instance is getting restarted after running for a while, it's often because your process exited due to an unhandled exception.
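The visibility-timeout advice above can be sketched roughly as follows. This is an illustration under assumptions, not code from the original poster: `QueueWorker` and `ProcessNext` are hypothetical names, and the queue operations are passed in as delegates so the sketch compiles without the storage SDK. In a real worker the delegates would wrap `CloudQueue.GetMessage` (the overload taking a visibility timeout) and `CloudQueue.DeleteMessage`.

```csharp
using System;

// Hypothetical helper, named here for illustration only.
public static class QueueWorker
{
    // Returns true if a message was fetched and fully processed,
    // false if the queue was empty.
    public static bool ProcessNext<TMessage>(
        Func<TimeSpan, TMessage> getMessage,
        Action<TMessage> handle,
        Action<TMessage> deleteMessage,
        TimeSpan visibilityTimeout) where TMessage : class
    {
        // The queue hides the message from other consumers for
        // visibilityTimeout; it is not removed yet.
        TMessage message = getMessage(visibilityTimeout);
        if (message == null)
        {
            return false;
        }

        // If handle throws, the delete below never runs, so the message
        // becomes visible again after the timeout and can be retried.
        handle(message);

        // Delete only after processing has succeeded.
        deleteMessage(message);
        return true;
    }
}
```

Assuming `queue` is a `CloudQueue` and `Handle` is your own method, a call might look like `QueueWorker.ProcessNext(t => queue.GetMessage(t), m => Handle(m.AsString), m => queue.DeleteMessage(m), TimeSpan.FromMinutes(5))`. The timeout should be longer than the slowest expected processing time, and because a message can be delivered more than once, `Handle` should tolerate duplicates.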
