WaWorkerHost.exe crashes role: CallbackException - c#-4.0

When I run my WorkerRole C# application on Azure, after a while WaWorkerHost.exe crashes due to the following exception:
Application: WaWorkerHost.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Runtime.CallbackException
Stack:
at System.Runtime.Fx+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)
at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
I have an application that generates load against a web server. I don't care about the actual response, but I want to control the number of requests made per second.
Therefore I have a Timer that fires every second and generates a number of requests. I have tried the following options:
Parallel.For with WebRequests
For loop with async WebRequests
For loop with ThreadPool.QueueUserWorkItem(do webrequest)
When the number of requests increases (8+ req/sec), the exception occurs. It is the same exception for all three options. When I run the role in the local Development Fabric, all three options work just fine. If someone could give me some pointers on what might be going wrong, I would appreciate it. If you have other ideas for generating this type of load from Azure and C#, please share your thoughts.

The author answered the question in a comment on the original post, but for better visibility I'm reposting it here:
Turned out to be an IntelliTrace issue, see
http://social.msdn.microsoft.com/Forums/en-ZA/windowsazuretroubleshooting/thread/543da280-2e5c-4e1a-b416-9999c7a9b841:
...
After redeploying my solution with IntelliTrace disabled, the issues
were resolved, and my WorkerRole stayed healthy.

Related

Application insights: SQL Dependency result code "-2"

This is what I get in ~20 out of ~2M SQL dependency AI registrations.
Apparently this result code does not appear in sys.messages, since all message IDs in there are positive.
Also I can't seem to find any stack trace information on this error. It seems to be a timeout or general transient error (handled by Polly from my side), but either way it's registered under dependencies and not exceptions.
Does anybody know what this error is and where I can find more information regarding all possible SQL dependency errors I might get?
I submitted this issue to the Application Insights team and got this feedback:
-2 means Timeout expired. The timeout period elapsed prior to completion of the
operation or the server is not responding. (Microsoft SQL Server, Error: -2).
The Sql Exception number is here.
And if you have any concerns, please comment at that thread. Hope it helps.

Automatic reboot whenever there's an uncaught exception in a continuous WebJob

I'm currently creating a continuous WebJob that polls an API and then forwards messages to an Azure Service Bus. I've managed to get this to work just fine, but I have one problem: what if my app crashes for whatever reason? What if there's an uncaught exception, or something goes wrong, and the app stops running? How do I get it to run again?
I created a test app which sends a message to the Service Bus every second, then crashes on the 11th message due to an intentionally placed NullReferenceException. I did this to investigate the behaviour when the app crashes.
What happens is that the app runs just fine for the first 10 seconds (as expected). Messages are being sent, and everything looks good. Then after the 10th second, when the exception occurs, nothing happens. No log in Azure saying there was an exception, no reboot - nothing. It just stands there as "running", but messages are no longer being sent.
How do I deal with this? It's essential that the application is able to reboot if it fails. Are there any standard ways to do this? Best practices?
Any help would be appreciated :)
It is always better to handle most failure scenarios yourself than to rely on the hosting environment to react to failures.
My suggestion would be to wrap the work in your executable in try/catch blocks that cover the different failure scenarios and, instead of letting exceptions propagate, log them yourself or retry the operation if appropriate.
For example, suppose you receive junk data and processing fails. You can retry the operation, say, 3 times, and then finally push the item to a dead-letter store so that such junk inputs can be dealt with manually. Don't let the flow stop by throwing the exception; handle it yourself by logging a message that flags the need for manual intervention.
In a GUI or web application, an exception only interrupts one interaction: the flow is restarted by the next user click and the system responds. A background processor has no user to restart it, so it is best to avoid anything that blocks the control flow.
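A minimal sketch of that retry-then-park idea, not the asker's code. The names `RetryingProcessor`, `TryProcess`, `process` and `deadLetter` are hypothetical and only there for illustration; where a dead-lettered item ends up (a separate queue, a blob, a log entry) is left to the caller.

```csharp
using System;

public static class RetryingProcessor
{
    // Runs `process` on `item` up to `maxAttempts` times and returns true on success.
    // After the final failure the item is handed to `deadLetter` instead of rethrowing,
    // so the polling loop that called this method keeps running.
    public static bool TryProcess<T>(
        T item,
        Action<T> process,
        Action<T, Exception> deadLetter,
        int maxAttempts = 3)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException("maxAttempts");

        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                process(item);
                return true;
            }
            catch (Exception ex)
            {
                // Log and fall through to the next attempt.
                last = ex;
                Console.Error.WriteLine(
                    "Attempt {0}/{1} failed: {2}", attempt, maxAttempts, ex.Message);
            }
        }

        // All attempts failed: park the item for manual inspection.
        deadLetter(item, last);
        return false;
    }
}
```

The polling loop would call `TryProcess` once per message and simply move on when it returns false. A real job would probably add a delay between attempts and retry only on exceptions it considers transient.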
Hope this would help.

Ghost (NodeJS blog) on Azure: Periodic 500 error troubleshooting

Background / Issue
Having a strange issue running a Ghost blog on Azure. The site seems to run fine for a while, but every once in a while, I'll receive a 500 error with no further information. The next request always appears to succeed (in tests so far).
The error seems to happen after a period of inactivity. Since I'm currently just getting set up, I'm using an Azure "Free" instance, so I'm wondering if some sort of resource conservation is causing it behind the scenes (which would be alleviated when I upgrade).
Any idea what could be causing this issue? I'm sort of at a loss for where to start since the logs don't necessarily help me in this case. I'm new to NodeJS (and nodeJS on Azure) and since this is my first foray, any tips/tricks on where to look would be helpful as well.
Some specific questions:
When receiving an error like this, is there anywhere I can go to see any output, or is it pretty much guaranteed that Node actually didn't output something?
On Azure free instances, does some sort of resource conservation take place which might cause the app to be shut down (and thus for me to see these errors only after a period of inactivity)?
The Full Error
The full text of the error is below (I've turned debugging on for this reason):
iisnode encountered an error when processing the request.
HRESULT: 0x2
HTTP status: 500
HTTP reason: Internal Server Error
You are receiving this HTTP 200 response because system.webServer/iisnode/@devErrorsEnabled configuration setting is 'true'.
In addition to the log of stdout and stderr of the node.exe process, consider using debugging and ETW traces to further diagnose the problem.
The node.exe process has not written any information to stderr or iisnode was unable to capture this information. Frequent reason is that the iisnode module is unable to create a log file to capture stdout and stderr output from node.exe. Please check that the identity of the IIS application pool running the node.js application has read and write access permissions to the directory on the server where the node.js application is located. Alternatively you can disable logging by setting system.webServer/iisnode/@loggingEnabled element of web.config to 'false'.
I think it might be something in the Azure web config rather than Ghost itself, so look for logs on that side, because Ghost is not the one throwing that error. I found this question that might help you out:
How to debug Azure 500 internal server error
Good luck!

Azure Worker Role writing unexpected error to Trace log storage

We have a worker role running in the cloud which polls an Azure CloudQueue periodically retrieving messages that a web role has put on there for us. Currently the worker role and web role are housed in the same Cloud Service application and currently we are only running one instance.
As we are testing we have our logging switched on and so the contents of the messages and other useful information appear in our cloud storage which we view using Cerebrata Azure Diagnostics Manager. (Great product btw)
DiagnosticMonitorConfiguration diagConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();
diagConfig.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
It all appears to work remarkably well, actually. However, occasionally we see a Verbose message in the trace log which simply has "Fail" as the message. The code it appears to come from is wrapped in a try/catch, so it is odd that we aren't seeing the message through those means.
It would appear that something is happening that is out of our code's control, perhaps the worker role is being restarted, or the cloud op system is detecting a major error that only it can deal with by restarting our worker role. It recovers and carries on so it is somewhat of a mystery to us what might be happening.
What we haven't ascertained yet is whether we are losing a message.
Any help would be gratefully appreciated.
Cheers
Kindo Malay
Without the stack trace it's hard to say too much, but with the logging set to verbose it's quite likely that you're seeing some internal logging from one of the dlls you're using.
For example, if you run an Azure Table query that causes certain kinds of errors, the error will be logged 3 times because the storage client library catches the error, traces it out and then retries.
If the error is not being caught by your try catch block, then it's likely nothing you need to worry about.
If deliverability of queue messages is important to you, you should ensure that you make use of the visibility timeout overload of CloudQueue.GetMessage and only delete the message when you've finished processing it. You may end up processing some messages twice, but at least you will process all of them.
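A rough sketch of that "delete only after processing" pattern. To keep it self-contained it does not call the storage client library: `receive` and `delete` are hypothetical delegates standing in for CloudQueue.GetMessage (with a visibility timeout) and CloudQueue.DeleteMessage, and `QueueWorker` and `ProcessNext` are made-up names.

```csharp
using System;

public static class QueueWorker
{
    // Processes at most one message. Returns true only if a message was
    // handled successfully and then deleted.
    //   receive: hypothetical stand-in for a "get message" call that hides the
    //            message from other consumers for the given visibility timeout
    //            and returns null when the queue is empty.
    //   delete:  hypothetical stand-in for the "delete message" call.
    public static bool ProcessNext(
        Func<TimeSpan, string> receive,
        Action<string> delete,
        Action<string> handle,
        TimeSpan visibilityTimeout)
    {
        string message = receive(visibilityTimeout);
        if (message == null)
            return false; // nothing to do

        try
        {
            handle(message);
        }
        catch (Exception ex)
        {
            // Deliberately not deleted: once the visibility timeout expires the
            // message becomes visible again and will be retried.
            Console.Error.WriteLine("Processing failed, leaving message on queue: " + ex.Message);
            return false;
        }

        // Delete only after processing has finished.
        delete(message);
        return true;
    }
}
```

The visibility timeout should comfortably exceed the longest expected processing time, otherwise a slow but healthy worker will see its message handed to someone else. Because a message can be delivered more than once, the handler should be safe to run twice on the same input.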
If your role instance is getting restarted after running for a while, it's often because your process exited due to an unhandled exception.

IIS 6.0 Application Pool crash

I'm having a lot of trouble on a production server. Some routing causes the Application Pool to crash with event ID 1011:
Event Type: Warning
Event Source: W3SVC
Event Category: None
Event ID: 1011
Date: 1/21/2009
Time: 9:08:17 AM
User: N/A
Computer: xxxxxxxxxxxxx
Description:
A process serving application pool 'DefaultAppPool' suffered a fatal communication error with the World Wide Web Publishing Service. The process id was '3788'. The data field contains the error number.
8007006d
I spent a few very hard hours before I found the problem.
Thanks to Tess Ferrandez and her blog post, I found it.
Always double-check the multithreaded code in your ASP.NET application. When an unhandled exception occurs on a background thread, the application pool crashes, and it's damn hard to find out WHY.
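To illustrate the point about multithreaded code: since .NET 2.0, an exception that escapes a thread-pool or manually created thread terminates the whole process, which for ASP.NET means the worker process serving the application pool. A minimal sketch of guarding background work follows; `SafeBackgroundWork`, `Queue` and `onError` are hypothetical names, not anything from the original post.

```csharp
using System;
using System.Threading;

public static class SafeBackgroundWork
{
    // Queues `work` on the thread pool and makes sure no exception escapes
    // the worker thread. Failures are passed to `onError` (for example, a logger).
    public static void Queue(Action work, Action<Exception> onError)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // Without this catch the exception would be unhandled on a
                // pool thread and take the whole process down.
                onError(ex);
            }
        });
    }
}
```

The `onError` callback itself must not throw, otherwise the problem just moves one level down. Catching everything like this keeps the process alive but can hide real bugs, so the callback should at least log the full exception.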
Tess's blog was a little advanced for me. I had to search around for quite a bit before I found the right articles that helped me debug my dump files. This article will help others who want to debug their crashing asp.net application pools but don't know how to start.

Resources