Automatic reboot whenever there's an uncaught exception in a continuous WebJob - azure

I'm currently creating a continuous WebJob that polls an API and then forwards messages to an Azure Service Bus. I've managed to get this to work just fine, but I have one problem: what if my app crashes for whatever reason? What if there's an uncaught exception, or something else goes wrong, and the app stops running? How do I get it to run again?
I created a test app which sends a message to the Service Bus every second, and on the 11th message it crashes due to an intentionally placed NullReferenceException. I did this to investigate the behaviour if/when the app crashes.
What happens is that the app runs just fine for the first 10 seconds (as expected): messages are being sent, and everything looks good. Then, after the 10th second, when the exception occurs, nothing happens. No log in Azure saying there was an exception, no reboot - nothing. The WebJob just sits there showing as "running", but messages are no longer being sent.
How do I deal with this? It's essential that the application is able to reboot if it fails. Are there any standard ways to do this? Best practices?
Any help would be appreciated :)

It is always better to handle failure scenarios in the system yourself rather than rely on the hosting environment to react to them.
My suggestion would be to wrap the risky parts of your code in try/catch blocks to cover the different kinds of failure scenarios, and instead of letting the exceptions propagate, log them yourself or retry the operation if required.
For example, if you receive junk data to process and it fails, you can retry the operation a few times (say, three) and then finally push the item to a dead-letter store so that such junk inputs can be dealt with manually. Don't let the flow be stopped by a thrown exception; handle it yourself by logging a message that flags the need for manual intervention.
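As a rough sketch of that pattern (not your actual code; PollApiAsync and ForwardToServiceBusAsync are hypothetical placeholders for the real API poll and Service Bus send), a continuous WebJob loop could catch failures, retry a few times, and then flag the item instead of crashing:

using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingJob
{
    // Hypothetical placeholders for the real API poll and Service Bus send.
    static Task<string> PollApiAsync() => Task.FromResult("payload");
    static Task ForwardToServiceBusAsync(string message) => Task.CompletedTask;

    public static async Task RunAsync(CancellationToken token)
    {
        const int maxAttempts = 3;

        while (!token.IsCancellationRequested)
        {
            for (int attempt = 1; attempt <= maxAttempts; attempt++)
            {
                try
                {
                    string message = await PollApiAsync();
                    await ForwardToServiceBusAsync(message);
                    break; // success, stop retrying this item
                }
                catch (Exception ex)
                {
                    // Log and retry instead of letting the exception take the job down.
                    Console.WriteLine($"Attempt {attempt} failed: {ex.Message}");
                    if (attempt == maxAttempts)
                        Console.WriteLine("Giving up on this item; flag it for manual intervention.");
                }
            }

            await Task.Delay(TimeSpan.FromSeconds(1), token);
        }
    }
}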
In a GUI or web application, if there is an exception, the flow is re-initiated by a user click and the system responds. But since this is a background processor, it is ideal to avoid all such control-flow blockers.
Hope this helps.

Related

Rerun nodejs server script on exception

I have a fairly simple Node.js server script that handles incoming data and persists it to a database. The script runs indefinitely. Mostly this works great, but sometimes an error arises. Is there a way to listen for an exception and, if the script was terminated, run it again?
(I know the issues causing the exception must be fixed, and I've done that, but I want to be sure the script is always running, as I otherwise lose valuable data; the use case is processing IoT sensor data.)
It seems like you have an uncaught exception that causes your application to crash.
First of all, you should use try...catch statements to catch exceptions where they can appear within your range of responsibility. If you don't care about proper error handling and just want to make sure the application keeps running, it is as simple as the following. This way, errors won't bubble up and become uncaught exceptions that crash your application:
try {
  // logic prone to errors
} catch (error) {}
If your application can somehow get into an erroneous state that is unpredictable and hard to resolve at runtime, you can use a process manager such as PM2 to automatically restart your application when it crashes.

Manage concurrent request processing in Node.js

Different users are accessing different application routes that do heavy manipulation of data. At some point one of the requests failed due to an internal server error and my whole application crashed. The other requests then failed as well, because the process had crashed. Is there any solution to handle this situation?
If your program has thrown an uncaught error and crashed, then there's nothing you can really do other than start it back up again. You could use something like pm2 to automatically restart your Node process when it crashes; then at least future requests that come in should work (although you will lose any in-memory data from before the crash).
Another thing that I think would help you would be to move your backend onto a serverless architecture where each invocation of your code is independent of the others.
And of course try to fix the code so that it handles things gracefully and doesn't actually throw errors. :)

Azure Functions: Application freezes - without any error message

I have an Azure Functions application which once in a while "freezes" and stops processing messages and timed events.
When this happens I do not see anything in the logs (AppInsight), neither exceptions nor any kind of unfamiliar traces.
The application has following functions:
One processing messages from a Service Bus topic subscription (belonging to another application)
One processing from an internal storage queue
One timer based function triggered every half hour
Four HTTP endpoints
Our production app runs fine. This is due to an internal dashboard (on a big screen in the office) which polls one of the HTTP endpoints every 5 minutes, thereby keeping it alive.
Our test, stage and preproduction apps stop working after a while, no longer processing messages and timer events.
This question is more or less the same as my previous question, but without the error message that was the focus there. There are far fewer error messages now, as our deployment has been fixed.
A more detailed analysis can be found in the GitHub issue.
On a consumption plan, all triggers are registered in the host so that they can be handled, leading to my functions being called at the right time. This part of the host also handles scaling.
I had two bugs:
Wrong deployment. Use zip-based deployment as described in the docs.
Malformed host.json. Comments are not valid JSON, although they do work in most circumstances in Azure Functions - but not all.
The sites now work as expected, both in terms of availability and scalability.
Thanks to the people in the Azure Functions team (Ling Toh, Fabio Cavalcante, David Ebbo) for helping me out with this.

Something making NServiceBus lose messages

I have an NServiceBus configuration that is working great on developers machines and in my Development Environment.
However, when I move it to my Test Environment my messages just start getting tossed.
Here is the system:
An app gets a TCP message from a Mainframe system and sends it to an MSMQ queue (call it FromMainframe).
An application hosted in IIS has a "Handle" method for that MSMQ and processes the messages from the mainframe.
In my Test Environment, step two only halfway happens. The message is popped off the MSMQ, but not processed by my application.
Effectively my data is LOST! NServiceBus removes them from the Queue but I never get to process them. They are not even in the error queue!
These are the things I have tried in an attempt to figure out what is happening:
Check the Config files
Attach a remote debugger to the process to see what the Handle method is doing
The Handle method is never called (but when I attach to the Development Environment my breakpoint in my Handle method is hit and it all works flawlessly).
Redeploy my Dev version to the Test Environment and try step 2 again (just in case the versions were not exactly the same).
Check the Config files again
Check that the Error queue is not filling up
The error queue stays empty (I wish it would fill up, then my data would not be LOST).
Check for any other process that may be pulling stuff from my MSMQs
I turned off my IIS website and the messages in the FromMainframe queue started to back up.
When I turn it back on, the messages disappear fairly fast (but still not all at once) - too fast for them to have been processed by my Handle method.
Check Config files yet again.
Run the NServiceBusTools\MsmqUtils\Runner.exe \i
I ran it, rebooted, ran it again and again for good measure!
Check the Configs again (I must have missed SOMETHING right?)
Check the Development Environment Configs are not pointing to the Test Environment
I don't think it is possible to use another computer's MSMQ as your input queue, but it does not hurt to check.
Look for any catch blocks that could be silently killing my message.
One last check of the Config files.
Recreate my Test Environment on another machine (it worked flawlessly)
Run my stuff outside of IIS.
When I host outside of IIS (using NServiceBus.Host.exe) it all works fine. So it has to be an IIS thing, right?
Go crazy and hope that stack overflow can offer any kind of insight.
So I know enough about what happened to throw out an "Answer".
When I set up my NServiceBus self-hosting, I had a call that loaded the message handlers.
NServiceBus.Configure.With().LoadMessageHandlers()
(There are more configurations, but I omitted them for brevity)
When you call this, NServiceBus scans the assemblies for a class that implements IHandleMessages<T>.
So, somehow, on my Test Environment machine, the NServiceBus scan of the directory for a class that implements IHandleMessages was failing to find my class (even though the assembly was absolutely there).
Turns out that if NServiceBus does not find something that handles a message it will THROW IT AWAY!!!
This is a total design bug in my opinion. The whole idea of NServiceBus is to not lose your data, but in this case it does just that!
Now, once you know about this pitfall, there are several ways around it.
Expressly state what your handler(s) should be:
NServiceBus.Configure.With().LoadMessageHandlers<First<MyMessageType>>()
Even further protection is to add another handler that will handle "everything else". IMessage is the base for all message payloads, so if you put a handler on it, it will pick up everything.
If you order the IMessage handler to run after your specific handlers, it will handle everything that NServiceBus can't find a handler for. If you throw an exception in that Handle method, NServiceBus will move the message to the error queue (which is what I think should be the default behavior).
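For illustration, the shape of such a catch-all handler might look something like the following (the class name and log text are made up; whether you just log or throw, as described above, is up to you):

using System;
using NServiceBus;

// Catch-all handler: IMessage is the base of all message payloads, so registering a
// handler for it ensures the endpoint always has at least one handler for a message.
public class CatchAllMessageHandler : IHandleMessages<IMessage>
{
    public void Handle(IMessage message)
    {
        // Log the unexpected message; alternatively, throw here (per the note above)
        // so NServiceBus moves the message to the error queue instead of losing it.
        Console.WriteLine("Unhandled message type: " + message.GetType().FullName);
    }
}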

Azure Worker Role writing unexpected error to Trace log storage

We have a worker role running in the cloud which polls an Azure CloudQueue periodically retrieving messages that a web role has put on there for us. Currently the worker role and web role are housed in the same Cloud Service application and currently we are only running one instance.
As we are testing, we have our logging switched on, so the contents of the messages and other useful information appear in our cloud storage, which we view using Cerebrata Azure Diagnostics Manager. (Great product, btw.)
DiagnosticMonitorConfiguration diagConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();
diagConfig.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
It all appears to work remarkably well, actually; however, occasionally we see a Verbose message in the trace log which simply has "Fail" as the message. The code it appears to be generated from is wrapped in a try/catch, so it is odd that we aren't seeing the message through those means.
It would appear that something is happening that is out of our code's control - perhaps the worker role is being restarted, or the cloud operating system is detecting a major error that only it can deal with by restarting our worker role. It recovers and carries on, so it is somewhat of a mystery to us what might be happening.
What we haven't ascertained yet is whether we are losing a message.
Any help would be gratefully appreciated.
Cheers
Kindo Malay
Without the stack trace it's hard to say too much, but with the logging set to verbose it's quite likely that you're seeing some internal logging from one of the DLLs you're using.
For example, if you run an Azure Table query that causes certain kinds of errors, the error will be logged out 3 times because the storage client library is catching the error, tracing it out and then retrying.
If the error is not being caught by your try catch block, then it's likely nothing you need to worry about.
If deliverability of queue messages is important to you, you should ensure that you make use of the visibility timeout overload of CloudQueue.GetMessage and only delete the message when you've finished processing it. You may end up processing some messages twice, but at least you will process all of them.
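A minimal sketch of that pattern, assuming queue is the CloudQueue being polled and ProcessMessage is your own processing step (the five-minute timeout is just an example):

// Take the message but keep it invisible to other consumers while it is processed.
CloudQueueMessage message = queue.GetMessage(TimeSpan.FromMinutes(5));
if (message != null)
{
    try
    {
        ProcessMessage(message);       // hypothetical processing step
        queue.DeleteMessage(message);  // delete only once processing has succeeded
    }
    catch (Exception)
    {
        // Don't delete: when the visibility timeout expires the message reappears
        // on the queue and will be picked up again, so it isn't lost.
    }
}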
If your role instance is getting restarted after running for a while, it's often because your process exited due to an unhandled exception.
