In my nlog configuration, I've set
<targets async="true">
with the understanding that all logging now happens asynchronously to my application workflow. (and I have noticed a performance improvement, especially on the Email target).
This has me thinking about log sequence though. I understand that with async, one has no guarantee of the order in which the OS will execute the async work. So if, in my web app, multiple requests come in to the same method, each logging their occurrence to NLog, does this really mean that the sequence in which the events appear in my log target will not necessarily be the sequence in which the log method was called by the various requests?
If so, is this just a consequence of async that one has to live with? Or is there something I can do to keep have my logs reflect the correct sequence?
Unfortunately this is something you have to live with. If it is important to maintain the sequence you'll have to run it synchronously.
But if it is possible for you to manually maintain a sequence number in the log message, it could be a solution.
I know this is old and I'm just ramping up on NLog but if you see a performance increase for the email client, you may want to just assert ASYNC for the email target?
NLog will not perform reordering of LogEvent sequence, by activating <targets async="true">. It just activates an internal queue, that provides better handling of bursts and enables batch-writing.
If a single thread writes 1000 LogEvents then they will NOT become out-of-order, because of async-handling.
If having 10 threads each writing 1000 LogEvents, then their logging will mix together. But the LogEvents of an individual thread will be in the CORRECT order.
But be aware that <targets async="true"> use the overflowAction=Discard as default. See also: https://github.com/nlog/NLog/wiki/AsyncWrapper-target#async-attribute-will-discard-by-default
For more details about performance. See also: https://github.com/NLog/NLog/wiki/performance
Related
I have a web service that use Rebus as Service Bus.
Rebus is configured as explained in this post.
The web service is load balanced with a two servers cluster.
These services are for a production environment and each production machine sends commands to save the produced quantities and/or to update its state.
In the BL I've modelled an Aggregate Root for each machine and it executes the commands emitted by the real machine. To preserve the correct status, the Aggregate needs to receive the commands in the same sequence as they were emitted, and, since there is no concurrency for that machine, that is the same order they are saved on the bus.
E.G.: the machine XX sends a command of 'add new piece done' and then the command 'Set stop for maintenance'. Executing these commands in a sequence you should have Aggregate XX in state 'Stop', but, with multiple server/worker roles, you could have that both commands are executed at the same time on the same version of Aggregate. This means that, depending on who saves the aggregate first, I can have Aggregate XX with state 'Stop' or 'Producing pieces' ... that is not the same thing.
I've introduced a Service Bus to add scale out as the number of machine scales and resilience (if a server fails I have only slowdown in processing commands).
Actually I'm using the name of the aggregate like a "topic" or "destinationAddress" with the IAdvancedApi, so the name of the aggregate is saved into the recipient of the transport. Then I've created a custom Transport class that:
1. does not remove the messages in progress but sets them in state
InProgress.
2. to retrive the messages selects only those that are in a recipient that have no one InProgress.
I'm wandering: is this the best way to guarantee that the bus executes the commands for aggregate in the same sequence as they arrived?
The solution would be have some kind of locking of your aggregate root, which needs to happen at the data store level.
E.g. by using optimistic locking (probably implemented with some kind of revision number or something like that), you would be sure that you would never accidentally overwrite another node's edits.
This would allow for your aggregate to either
a) accept the changes in either order (which is generally preferable – makes your system more tolerant), or
b) reject an invalid change
If the aggregate rejects the change, this could be implemented by throwing an exception. And then, in the Rebus handler that catches this exception, you can e.g. await bus.Defer(TimeSpan.FromSeconds(5), theMessage) which will cause it to be delivered again in five seconds.
You should never rely on message order in a service bus / queuing / messaging environment.
When you do find yourself in this position you may need to re-think your design. Firstly, a service bus is most certainly not an event store and attempting to use it like one is going to lead to pain and suffering :) --- not that you are attempting this but I thought I'd throw it in there.
As for your design, in order to manage this kind of state you may want to look at a process manager. If you are not generating those commands then even this will not help.
However, given your scenario it seems as though the calls are sequential but perhaps it is just your example. In any event, as mookid8000 said, you either want to:
discard invalid changes (with the appropriate feedback),
allow any order of messages as long as they are valid,
ignore out-of-sequence messages till later.
Hope that helps...
"exactly the same sequence as they were saved on the bus"
Just... why?
Would you rely on your HTTP server logs to know which command actually reached an aggregate first? No because it is totally unreliable, just like it is with at-least-one delivery guarantees and it's also irrelevant.
It is your event store and/or normal persistence state that should be the source of truth when it comes to knowing the sequence of events. The order of commands shouldn't really matter.
Assuming optimistic concurrency, if the aggregate is not allowed to transition from A to C then it should guard this invariant and when a TransitionToStateC command will hit it in the A state it will simply get rejected.
If on the other hand, A->C->B transitions are valid and that is the order received by your aggregate well that is what happened from the domain perspective. It really shouldn't matter which command was published first on the bus, just like it doesn't matter which user executed the command first from the UI.
"In my scenario the calls for a specific aggregate are absolutely
sequential and I must guarantee that are executed in the same order"
Why are you executing them asynchronously and potentially concurrently by publishing on a bus then? What you are basically saying is that calls are sequential and cannot be processed concurrently. That means everything should be synchronous because there is no potential benefit from parallelism.
Why:
executeAsync(command1)
executeAsync(command2)
executeAsync(command3)
When you want:
execute(command1)
execute(command2)
execute(command3)
You should have a single command message and the handler of this message executes multiple commands against the aggregate. Then again, in this case I'd just create a single operation on the aggregate that performs all the transitions.
I'm working on what's basically a highly-available distributed message-passing system. The system receives messages from someplace over HTTP or TCP, perform various transformations on it, and then sends it to one or more destinations (also using TCP/HTTP).
The system has a requirement that all messages sent to a given destination are in-order, because some messages build on the content of previous ones. This limits us to processing the messages sequentially, which takes about 750ms per message. So if someone sends us, for example, one message every 250ms, we're forced to queue the messages behind each other. This eventually introduces intolerable delay in message processing under high load, as each message may have to wait for hundreds of other messages to be processed before it gets its turn.
In order to solve this problem, I want to be able to parallelize our message processing without breaking the requirement that we send them in-order.
We can easily scale our processing horizontally. The missing piece is a way to ensure that, even if messages are processed out-of-order, they are "resequenced" and sent to the destinations in the order in which they were received. I'm trying to find the best way to achieve that.
Apache Camel has a thing called a Resequencer that does this, and it includes a nice diagram (which I don't have enough rep to embed directly). This is exactly what I want: something that takes out-of-order messages and puts them in-order.
But, I don't want it to be written in Java, and I need the solution to be highly available (i.e. resistant to typical system failures like crashes or system restarts) which I don't think Apache Camel offers.
Our application is written in Node.js, with Redis and Postgresql for data persistence. We use the Kue library for our message queues. Although Kue offers priority queueing, the featureset is too limited for the use-case described above, so I think we need an alternative technology to work in tandem with Kue to resequence our messages.
I was trying to research this topic online, and I can't find as much information as I expected. It seems like the type of distributed architecture pattern that would have articles and implementations galore, but I don't see that many. Searching for things like "message resequencing", "out of order processing", "parallelizing message processing", etc. turn up solutions that mostly just relax the "in-order" requirements based on partitions or topics or whatnot. Alternatively, they talk about parallelization on a single machine. I need a solution that:
Can handle processing on multiple messages simultaneously in any order.
Will always send messages in the order in which they arrived in the system, no matter what order they were processed in.
Is usable from Node.js
Can operate in a HA environment (i.e. multiple instances of it running on the same message queue at once w/o inconsistencies.)
Our current plan, which makes sense to me but which I cannot find described anywhere online, is to use Redis to maintain sets of in-progress and ready-to-send messages, sorted by their arrival time. Roughly, it works like this:
When a message is received, that message is put on the in-progress set.
When message processing is finished, that message is put on the ready-to-send set.
Whenever there's the same message at the front of both the in-progress and ready-to-send sets, that message can be sent and it will be in order.
I would write a small Node library that implements this behavior with a priority-queue-esque API using atomic Redis transactions. But this is just something I came up with myself, so I am wondering: Are there other technologies (ideally using the Node/Redis stack we're already on) that are out there for solving the problem of resequencing out-of-order messages? Or is there some other term for this problem that I can use as a keyword for research? Thanks for your help!
This is a common problem, so there are surely many solutions available. This is also quite a simple problem, and a good learning opportunity in the field of distributed systems. I would suggest writing your own.
You're going to have a few problems building this, namely
2: Exactly-once delivery
1: Guaranteed order of messages
2: Exactly-once delivery
You've found number 1, and you're solving this by resequencing them in redis, which is an ok solution. The other one, however, is not solved.
It looks like your architecture is not geared towards fault tolerance, so currently, if a server craches, you restart it and continue with your life. This works fine when processing all requests sequentially, because then you know exactly when you crashed, based on what the last successfully completed request was.
What you need is either a strategy for finding out what requests you actually completed, and which ones failed, or a well-written apology letter to send to your customers when something crashes.
If Redis is not sharded, it is strongly consistent. It will fail and possibly lose all data if that single node crashes, but you will not have any problems with out-of-order data, or data popping in and out of existance. A single Redis node can thus hold the guarantee that if a message is inserted into the to-process-set, and then into the done-set, no node will see the message in the done-set without it also being in the to-process-set.
How I would do it
Using redis seems like too much fuzz, assuming that the messages are not huge, and that losing them is ok if a process crashes, and that running them more than once, or even multiple copies of a single request at the same time is not a problem.
I would recommend setting up a supervisor server that takes incoming requests, dispatches each to a randomly chosen slave, stores the responses and puts them back in order again before sending them on. You said you expected the processing to take 750ms. If a slave hasn't responded within say 2 seconds, dispatch it again to another node randomly within 0-1 seconds. The first one responding is the one we're going to use. Beware of duplicate responses.
If the retry request also fails, double the maximum wait time. After 5 failures or so, each waiting up to twice (or any multiple greater than one) as long as the previous one, we probably have a permanent error, so we should probably ask for human intervention. This algorithm is called exponential backoff, and prevents a sudden spike in requests from taking down the entire cluster. Not using a random interval, and retrying after n seconds would probably cause a DOS-attack every n seconds until the cluster dies, if it ever gets a big enough load spike.
There are many ways this could fail, so make sure this system is not the only place data is stored. However, this will probably work 99+% of the time, it's probably at least as good as your current system, and you can implement it in a few hundred lines of code. Just make sure your supervisor is using asynchronous requests so that you can handle retries and timeouts. Javascript is by nature single-threaded, so this is slightly trickier than normal, but I'm confident you can do it.
we have an ETL scenario where we use the resequencer.
Messages arrive to the flow with a sequence number that the resequencer uses it to send messages in order, but sometimes messages are discarded previously (because of data validation) and do not arrive to the resequencer. This produces holes in the sequence and resequencer stops sending messages using the default release strategy. To avoid this, we developed a new SequenceTimeoutReleaseStrategy that is a mix between default strategy and TimeoutCountSequenceSizeReleaseStrategy from SI. When a message arrives, it checks the timeout and release it if necesary.
All this worked well unless for the last messages that arrive before the timeout and have holes. This messages aren't release by the strategy. We could use a reaper but the secuence may have more than one hole in the sequence so when the resequencer release them it will stop in the first sequence break and remove the group losing the rest of the messages. So, the question is: is there a way to use the resequencer where there can be holes in the sequence?
One solution we have and want to avoid is having a scheduled tasks that removes the messages directly from the message store, but this could be a problem with concurrency and so on, so we prefer other solutions.
Any help is appreciated here
Regards
Guzman
There are two components involved; the release strategy says "something" can be released; the actual decision as to what is released is performed by the MessageGroupProcessor. In this case, a ResequencingMessageGroupProcessor.
You would need to customize that class to "skip" the hole(s).
You can't wire in a customized MGP using the <reseequencer/> namespace, you would have to wire up using <bean/> s - a ResequencingMessageHandler and a ConsumerEndpointFactoryBean.
Or use a BeanFactoryPostProcessor to change the constructor argument to your custom class.
I'm using the each() method of the async lib and experiencing some very odd (and inconsistent) errors that appear to be File handle errors when I attempt to log to file from within the child processes.
The array that I'm handing to this method frequently has hundreds of items and I'm curious if Node is having trouble running out of available file handles as it tries to log to file from within all these simultaneous processes. The problem goes away when I comment out my log calls, so it's definitely related to this somehow, but I'm having a tough time tracking down why.
All the logging is trying to go into a single file... I'm entirely unclear on how that works given that each write (presumably) blocks, which makes me wonder how all these simultaneous processes are able to run independently if they're all sitting around waiting on the file to become available to write to.
Assuming that this IS the source of my troubles, what's the right way to log from a process such as Asnyc.each() which runs N number of processes at once?
I think you should have some adjustable limit to how many concurrent/outstanding write calls you are going to do. No, none of them will block, but I think async.eachLimit or async.queue will give you the flexibility to set the limit low and be sure things behave and then gradually increase it to find out what resource constraints you eventually bump up against.
What are the best practices when using node.js for a queue processing application?
My main concern there would be that Node processes can handle thousands of items at once, but that a rogue unhandled error in any of them could bring down the whole process.
I'd be looking for a queue/driver combination that allowed a two-phase commit (wrong terminology I think?), i.e:
Get the next appropriate item from the queue (which then blocks that item from being consumed elsewhere)
Once each item is handed over to the downstream service/database/filesystem you can then tell the queue that the item has been processed
I'd also want repeatably unique identifiers so that you can reliably detect if an item comes down the pipe twice. In a theoretical system it might not happen, but in a practical environment the capability to deal with it will make your life easier.
check out http://learnboost.github.com/kue/ i have used it for a couple of pet projects and works quite good, you can look at their source and check what practices they have take care of