I have an NServiceBus configuration that works great on developers' machines and in my Development Environment.
However, when I move it to my Test Environment my messages just start getting tossed.
Here is the system:
An app gets a TCP message from a Mainframe system and sends it to an MSMQ (call it FromMainframe).
An application hosted in IIS has a "Handle" method for that MSMQ and processes the messages from the mainframe.
In my Test Environment, step two only half happens. The message is popped off the MSMQ, but not processed by my application.
Effectively my data is LOST! NServiceBus removes them from the Queue but I never get to process them. They are not even in the error queue!
These are the things I have tried in an attempt to figure out what is happening:
Check the Config files
Attach a remote debugger to the process to see what the Handle method is doing
The Handle method is never called (but when I attach to the Development Environment my breakpoint in my Handle method is hit and it all works flawlessly).
Redeploy my Dev version to the Test Environment and try step 2 again (just in case the versions were not exactly the same.)
Check the Config files again
Check that the Error queue is not filling up
The error queue stays empty (I wish it would fill up, then my data would not be LOST).
Check for any other process that may be pulling stuff from my MSMQs
I turned off my IIS website and the messages in the FromMainframe queue start to back up.
When I turn it back on, the messages disappear fairly fast (but still not all at once). The speed that they disappear is too fast for them to be processed by my Handle method.
Check Config files yet again.
Run the NServiceBusTools\MsmqUtils\Runner.exe \i
I ran it, rebooted, ran it again and again for good measure!
Check the Configs again (I must have missed SOMETHING right?)
Check the Development Environment Configs are not pointing to the Test Environment
I don't think it is possible to use another computer's MSMQ as your input queue, but it does not hurt to check.
Look for any catch blocks that could be silently killing my message.
One last check of the Config files.
Recreate my Test Environment on another machine (it worked flawlessly)
Run my stuff outside of IIS.
When I host outside of IIS (using NServiceBus.Host.exe) it all works fine. So it has to be an IIS thing right?
Go crazy and hope that stack overflow can offer any kind of insight.
So I know enough about what happened to throw out an "Answer".
When I set up my NServiceBus self-hosting I had a call that loaded the message handlers.
NServiceBus.Configure.With().LoadMessageHandlers()
(There are more configurations, but I omitted them for brevity)
When you call this, NServiceBus scans the assemblies for classes that implement IHandleMessages<T>.
So, somehow, on my Test Environment machine, the NServiceBus scan of the directory for a class that implements IHandleMessages was failing to find my class (even though the assembly was absolutely there).
Turns out that if NServiceBus does not find something that handles a message it will THROW IT AWAY!!!
This is a total design bug in my opinion. The whole idea of NServiceBus is to not lose your data, but in this case it does just that!
Now, once you know about this pitfall, there are several ways around it.
Expressly state what your handler(s) should be:
NServiceBus.Configure.With().LoadMessageHandlers<First<MyMessageType>>()
Even further protection is to add another handler that will handle "everything else". IMessage is the base for all message payloads, so if you put a handler on it, it will pick up everything.
If you set the IMessage handler to run after your other handlers, then it will handle everything that NServiceBus can't find a more specific handler for. If you throw an exception in that Handle method, that will cause NServiceBus to move the message to the error queue. (Which is what I think should be the default behavior.)
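As a rough sketch (the class name and exception are mine, and this uses the older synchronous IHandleMessages<T> signature, so check what your NServiceBus version expects), such a catch-all handler could look like this:

using System;
using NServiceBus;

public class UnexpectedMessageHandler : IHandleMessages<IMessage>
{
    public void Handle(IMessage message)
    {
        // Throwing here makes NServiceBus retry and eventually move the
        // message to the error queue instead of silently discarding it.
        throw new InvalidOperationException(
            "No specific handler was found for " + message.GetType().FullName);
    }
}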
Context
In an ASP.NET Core application I would like to execute an operation which takes, say, 5 seconds (like sending an email). I do know async/await and its purpose in ASP.NET Core; however, I do not want to wait for the end of the operation. Instead I would like to return to the client immediately.
Issue
So it is kind of Fire and Forget, either homebrew or Hangfire's BackgroundJob.Enqueue<IEmailSender>(x => x.Send("hangfire@example.com"));
Suppose I have some more complex method with an injected ILogger and other stuff, and I would like to Fire and Forget that method. The method contains error handling and logging. (Note: this is not necessarily about Hangfire; the issue is agnostic to how the background worker is implemented.) My problem is that the method will run completely out of context: probably nothing will work inside it, there is no HttpContext (I mean HttpContextAccessor will give null, etc.), so no User, no Session, etc.
Question
How do I correctly solve this particular email-sending problem? No one wants to wait 5 seconds for the response, and at the same time no one wants to fire off an email without even logging it if the send operation fails...
How do I correctly solve this particular email-sending problem?
This is a specific instance of the "run a background job from my web app" problem.
there is no universal solution
There is - or at least there is a universal pattern; it's just that many developers try to avoid it because it's not easy.
I describe it pretty fully in my blog post series on the basic distributed architecture. I think one important thing to acknowledge is that since your background work (sending an email) is done outside of an HTTP request, it really should be done outside of your web app process. Once you accept that, the rest of the solution falls into place:
You need a durable storage queue for the work. Hangfire uses your database; I tend to prefer cloud queues like Azure Storage Queues.
This means you'll need to copy all the data over that you will need, since it needs to be serialized into that queue. The same restriction applies to Hangfire, it's just not obvious because Hangfire runs in the same web application process.
You need a background process to execute your work queue. I tend to prefer Azure Functions, but another common approach is to run an ASP.NET Core Worker Service as a Win32 service or Linux daemon. Hangfire has its own ad-hoc in-process thread. Running an ASP.NET Core hosted service in-process would also work, though that has some of the same drawbacks as Hangfire since it also runs in the web application process.
Finally, your work queue processor application has its own service injection, and you can code it to create a dependency scope per work queue item if desired.
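A minimal sketch of that shape, assuming Azure Storage Queues and hypothetical SendEmailRequest, IEmailSender, and "email-requests" names (none of these come from the question): the web app serializes the work item into a durable queue and returns immediately, and a separate Worker Service drains the queue with its own dependency scope per item.

using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Copy everything the background job will need; it gets serialized into the queue.
public class SendEmailRequest
{
    public string To { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

public interface IEmailSender
{
    Task SendAsync(string to, string subject, string body);
}

// In the web app: enqueue the work and return to the client right away.
[ApiController]
[Route("api/email")]
public class EmailController : ControllerBase
{
    private readonly QueueClient _queue; // registered in DI for the "email-requests" queue

    public EmailController(QueueClient queue) => _queue = queue;

    [HttpPost]
    public async Task<IActionResult> Send(SendEmailRequest request)
    {
        await _queue.SendMessageAsync(JsonSerializer.Serialize(request));
        return Accepted();
    }
}

// In a separate Worker Service process: drain the queue, one DI scope per item.
public class EmailQueueWorker : BackgroundService
{
    private readonly QueueClient _queue;
    private readonly IServiceScopeFactory _scopeFactory;

    public EmailQueueWorker(QueueClient queue, IServiceScopeFactory scopeFactory)
    {
        _queue = queue;
        _scopeFactory = scopeFactory;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            QueueMessage[] messages = await _queue.ReceiveMessagesAsync(maxMessages: 10, cancellationToken: stoppingToken);
            foreach (var msg in messages)
            {
                var request = JsonSerializer.Deserialize<SendEmailRequest>(msg.Body.ToString());

                // One dependency scope per work queue item, as described above.
                using var scope = _scopeFactory.CreateScope();
                var sender = scope.ServiceProvider.GetRequiredService<IEmailSender>();
                await sender.SendAsync(request.To, request.Subject, request.Body);

                await _queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt, stoppingToken);
            }

            if (messages.Length == 0)
                await Task.Delay(TimeSpan.FromSeconds(1), stoppingToken);
        }
    }
}

An Azure Function with a queue trigger, as mentioned above, would replace the worker class with much less code, at the cost of tying you to the Functions runtime.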
IMO, this is a normal threshold that's reached as your web application "grows up". It's more complex than a simple web app: now you have a web app, a durable queue, and a background processor. So your deployment becomes more complex, you need to think about things like versioning your worker queue schema so you can upgrade without downtime (something Hangfire can't handle well), etc. And some devs really balk at this because it's more complex when "all" they want to do is send an email without waiting for it, but the fact is that this is the necessary step upwards when a baby web app becomes distributed.
I've been dealing with Node for quite a while now, and something keeps annoying me in my production environment: debugging!
So I thought about a system that would be as following:
An error occurs at a certain level, or an uncaught one is thrown.
Log a super long stack trace related to the request, containing all function calls AND variable values since the request happened.
Send that to a service or a simple log file (monitored) that would inform me that an error happened but with a clear idea of the context.
I don't know how to do something like that, or whether there is existing stuff out there doing that job.
My strategy for now: log a long stack trace when an error occurs and crash the worker, which will be restarted by the cluster parent (which is only responsible for redirecting HTTP requests and monitoring children).
Thanks!
I'm currently creating a continuous WebJob that will do polling to an API, and then forward messages to an Azure Service Bus. I've managed to get this to work just fine, but I have one problem: what if my app crashes for whatever reason? What if there's an uncaught exception, or something goes wrong, and the app stops running? How do I get it to run again?
I created a test app, which will send a message every second to the Service Bus, then on the 11th message it will crash due to an intentionally placed NullReferenceException. I did this in order to investigate the behaviour whenever/if the app crashes.
What happens is that the app runs just fine for the first 10 seconds (as expected). Messages are being sent, and everything looks good. Then after the 10th second, when the exception occurs, nothing happens. No log in Azure saying there was an exception, no reboot - nothing. It just stands there as "running", but messages are no longer being sent.
How do I deal with this? It's essential that the application is able to reboot if it fails. Are there any standard ways to do this? Best practices?
Any help would be appreciated :)
It is always good to handle most of the failure scenarios in the system ourselves rather than letting the hosting environment react to the failures.
My suggestion would be to have a check in the code for exceptions, like a try/catch block in your executable script, to catch the different kinds of failure scenarios, and instead of throwing the exceptions, log them yourself or take a retry operation if required.
For example, when you get junk data to process and it fails, you can try the operation again (for example, 3 times) and then finally push a log to a dead-letter account so such junk inputs can be taken care of manually. Don't let the flow be stopped by throwing the exception; instead handle it yourself by logging a message that needs manual intervention.
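A rough sketch of that idea (ProcessMessageAsync, the dead-letter log file, and the attempt count of 3 are all placeholders for your own logic):

using System;
using System.IO;
using System.Threading.Tasks;

public static class ResilientProcessor
{
    private const int MaxAttempts = 3;

    public static async Task ProcessWithRetryAsync(string rawMessage)
    {
        for (var attempt = 1; attempt <= MaxAttempts; attempt++)
        {
            try
            {
                await ProcessMessageAsync(rawMessage);
                return; // success, nothing more to do
            }
            catch (Exception ex)
            {
                // Log and swallow so the WebJob keeps running instead of crashing.
                Console.WriteLine($"Attempt {attempt}/{MaxAttempts} failed: {ex}");
            }
        }

        // Every retry failed: park the message for manual inspection rather than throwing.
        await File.AppendAllTextAsync("deadletter.log", rawMessage + Environment.NewLine);
        Console.WriteLine("Message written to dead-letter log after repeated failures.");
    }

    private static Task ProcessMessageAsync(string rawMessage)
    {
        // ... forward the message to the Service Bus, etc.
        return Task.CompletedTask;
    }
}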
In any GUI or web application, if there is an exception then the flow is re-initiated by a user click and the system will respond. But here, as it is a background processor, it is ideal to avoid all such control-flow blockers.
Hope this would help.
I recently created an error manager to take logged errors from clients on our network and put them into an MSMQ for processing. I have a separate Windows Service running on the server to pick items off the queue and push them into a database.
When I wrote it and tested it everything worked great; however, I neglected to consider that at deploy time, having 100 clients all sending to a public queue might not be performant in the best case, and in the worst case there could be all kinds of collisions, it seems to me.
My thought right now is to front the MSMQ with a WCF service and make everyone go through that. The logic being that at that point I could employ some locking, etc. If I went with a service I think I could employ a private queue instead of a public one, which would be tons faster, as well.
What I'm not sure is, am I overthinking it? MSMQ is pretty robust and the methods I think are thread-safe. Should I just leave it alone and see what happens? If I do put in the service, how much management would I need to have in place?
I recently created an error manager to take logged errors from clients on our network and put them into an MSMQ for processing
I assume you're using System.Messaging for this? If so there is nothing at all wrong with your approach.
having 100 clients all sending to a public queue might not be performant
MSMQ was designed from the bottom up to handle high load. Depending on the size of the individual messages and the storage threshold of the machine, a queue can hold tens of thousands of messages without any noticeable performance impact.
Because a "send" in MSMQ involves the queue manager on each machine writing messages locally before transmission (in a store and forward messaging pattern), there is almost no chance of "collisions" or any other forms of contention happening; if the sender is unable to transmit the message it simply "sends" it to a temporary local queue and then the actual transmission happens in the background and is mediated by the fault tolerant and very reliable msmq protocol.
My thought right now is to front the MSMQ with a WCF service and make everyone go through that
This would be a valid choice if you were starting from nothing. As another poster has stated, WCF does hide you from some of the msmq-voodoo by removing the necessity to use System.Messaging. However, you've already written the code so I see little benefit exposing a netMsmqBinding endpoint.
If I went with a service I think I could employ a private queue instead of a public one
As far as I understand it from your description, there's nothing to stop you using a private queue in your current scenario. In fact I'd recommend always using private queues as they're much simpler.
If I do put in the service, how much management would I need to have in place?
You will have more management overhead with a wcf service. Because you're wrapping each end of a send-receive with the WCF stack, there is more code to spin up and therefore potentially fail. WCF stack exceptions are famously difficult to troubleshoot without full service logging enabled.
EDIT - in response to comments
I think for a private queue you have to actually be writing FROM the machine the queue sits on, which would not work in a networked environment
Untrue. MSMQ supports transactional reads from and writes to any private queue, regardless of whether the queue is local or remote.
This is because any time a message is sent from one machine to another in msmq, regardless of the queue address, the following happens:
Queue manager on sending machine writes the message to a temporary local "outbound" queue.
Queue manager on sending machine contacts queue manager on receiving machine and transmits the message.
Queue manager on receiving machine places the message into the destination queue.
If you are using transactions, the above steps will comprise 3 distinct transactions.
Something to remember: the safest paradigm in exchanging messages between queues on different machines is send remote, read local.
So this means when you send a message, you're instructing msmq to send to a remote queue address. However, when someone sends something to you, they must do the same. So you end up reading only from local queues, and sending only to remote queues.
This way you get the most reliable messaging setup, because when reading, a local queue will always be available.
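A small sketch of that paradigm using System.Messaging (the queue paths and string payload are placeholders, and both queues are assumed to be transactional):

using System;
using System.Messaging;

class MsmqExample
{
    // Placeholder paths: send to a remote private queue, read from a local one.
    const string RemoteSendPath = @"FormatName:DIRECT=OS:otherserver\private$\ErrorReports";
    const string LocalReceivePath = @".\private$\ErrorReports";

    static void SendRemote(string body)
    {
        using (var queue = new MessageQueue(RemoteSendPath))
        {
            // Transactional send: the local queue manager stores the message
            // and forwards it to the remote machine in the background.
            queue.Send(body, MessageQueueTransactionType.Single);
        }
    }

    static void ReceiveLocal()
    {
        using (var queue = new MessageQueue(LocalReceivePath))
        using (var tx = new MessageQueueTransaction())
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            tx.Begin();
            // Read from the local queue inside the transaction; if processing
            // throws before Commit, the message goes back onto the queue.
            var message = queue.Receive(TimeSpan.FromSeconds(30), tx);
            Console.WriteLine((string)message.Body);
            tx.Commit();
        }
    }
}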
Try it! I've been using msmq for cross machine communication for nearly 10 years and I've never used a public queue. I don't even know what they're for!
I would expose a WCF "IsOneWay" method.
And then host your WCF in IIS.
The IsOneWay will wire up to MSMQ.
This way...you have the robustness of IIS hosting. You can expose any endpoint you want.
But eventually the request makes it to MSMQ.
One of the reasons is the ease of using MSMQ with WCF. Having written and used MSMQ "pre-WCF", I found the code (pulling messages off the queue and error handling) to be difficult and problematic. That alone would push me to WCF hosting.
And as you mention, the security around a local-queue is much easier to deal with.
Bottom line, let WCF handle the msmq-voodoo for you.
Simple example below.
[ServiceContract]
public interface IMyControllerController
{
    [OperationContract(IsOneWay = true)]
    void SubmitRequest(MyObject obj);
}
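For completeness, a hedged sketch of what the client side of that one-way contract could look like (the net.msmq address, security mode, and queue name are placeholders, MyObject is the type from the contract above, and in practice much of this wiring usually lives in config):

using System.ServiceModel;

class MsmqClientExample
{
    static void Main()
    {
        // Assumes a transactional private queue named ErrorReports on the local machine.
        var binding = new NetMsmqBinding(NetMsmqSecurityMode.None);
        var address = new EndpointAddress("net.msmq://localhost/private/ErrorReports");

        var factory = new ChannelFactory<IMyControllerController>(binding, address);
        IMyControllerController proxy = factory.CreateChannel();

        // One-way call: WCF drops the message onto the MSMQ queue and returns immediately.
        proxy.SubmitRequest(new MyObject());

        ((IClientChannel)proxy).Close();
        factory.Close();
    }
}

The last link below walks through hosting the receiving side of this in IIS.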
http://msdn.microsoft.com/en-us/library/ms733035%28v=vs.110%29.aspx
http://msdn.microsoft.com/en-us/library/system.servicemodel.operationcontractattribute.isoneway%28v=vs.110%29.aspx
What happens in WCF to methods with IsOneWay=true at application termination
http://blogs.msdn.com/b/tomholl/archive/2008/07/12/msmq-wcf-and-iis-getting-them-to-play-nice-part-1.aspx
In the official NodeJS documentation there is a code example where the process tries to exit gracefully when there was an exception in a domain (it closes connections, waits for some time for other requests, and then exits).
But why not just send the 500 error and continue to work?
In my application I want to throw some expected Errors (like FrontEndUserError) when user input is not valid, and catch these exceptions somewhere in middleware to send a pretty error message to the client. With domains it is very easy to implement, but are there any pitfalls around this?
app.use (err, req, res, next) ->
  if err instanceof FrontEndUserError
    res.send {error: true, message: err.message}
  else
    log err.trace
    res.send 500
From domain module official documentation:
By the very nature of how throw works in JavaScript, there is almost never any way to safely "pick up where you left off", without leaking references, or creating some other sort of undefined brittle state.
The safest way to respond to a thrown error is to shut down the process ...
To me that means when something throws an error in your NodeJS application, then you're done, unfortunately. If you do care about how your application works and the result is important to you, then your best bet is to kill the process and start it again. However, in those last milliseconds, you can be nicer to other clients, let them finish their work, say sorry to new clients, log a couple of things if you want, and then kill the process and start it again.
That's exactly what's happening in the NodeJS domain module's documentation example.
Let's look at your web application/server as a state machine.
Unless your application is very small, it is very unlikely that you happen to know every state that your machine can possibly be in.
When you get an error, you have two choices:
1) Examine the error and decide what to do, or
2) ignore it.
In the first case, you gracefully change from one state to another. In the second case, you don't have any clue what state your machine is in, since you didn't bother seeing what the error was. Essentially, your machine's state is now 'undefined'.
It is for this reason that NodeJS recommends killing the process if an error propagates all the way to the event loop. Then again, this level of absolutism may be overkill for pet projects and small apps, so your solution is quite fine too.
But imagine if you were writing banking software; someday you get an error you've never seen before, your app simply ignores it and sends a 500, but each time someone is losing $100k. Here I would want to make sure no error ever reaches the event loop, and if it does, kill the process with a detailed stack trace for later analysis.