generate event after all messages have been processed

Is there a built-in way in Spring Integration to generate an event when all messages have been processed in a context? For example, I have a file picked up by a FileReadingMessageSource; we split all rows in the file, and the ask is to send an alert when all rows have been processed, or to connect some other dependent processing. Basically, the ability to wire in a post-processing component.

The FileSplitter can emit start-of-file and end-of-file markers. You can route them accordingly and turn the end marker message into the event you need. There is also an event-publishing channel adapter for that.
See docs for more info:
https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#file-splitter
https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#messaging-routing-chapter
https://docs.spring.io/spring-integration/docs/current/reference/html/event.html#applicationevent
As a general note: there is no built-in feature to determine the end of a message flow. It is just an infinite stream, and for the most part messages are not aware of each other and don’t affect each other. Only a particular task with a specific business message can carry such an indicator. So it helps that you mentioned the file splitting: other use cases might not come with an end marker message.
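To make the marker idea concrete, here is a minimal, library-free sketch in plain Python. It is not Spring Integration code: `Marker`, `split_with_markers` and `route` are hypothetical names that only mimic the shape of a splitter that brackets the rows with start and end markers, and a router that turns the end marker into an event. It also assumes rows are handled one at a time on a single thread; with parallel downstream processing, seeing the end marker would not by itself prove that every row has finished.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Union


@dataclass(frozen=True)
class Marker:
    """Hypothetical stand-in for a start/end-of-file marker message."""
    kind: str        # "START" or "END"
    source: str      # name of the file being split
    line_count: int  # lines emitted before this marker (0 for START)


def split_with_markers(source: str, lines: Iterable[str]) -> Iterator[Union[Marker, str]]:
    """Yield a START marker, then every line, then an END marker."""
    yield Marker("START", source, 0)
    count = 0
    for line in lines:
        count += 1
        yield line
    yield Marker("END", source, count)


def route(messages: Iterable[Union[Marker, str]],
          handle_line: Callable[[str], None],
          on_done: Callable[[Marker], None]) -> None:
    """Send rows to the row handler and turn the END marker into an event."""
    for message in messages:
        if isinstance(message, Marker):
            if message.kind == "END":
                on_done(message)
        else:
            handle_line(message)


processed = []
events = []
route(split_with_markers("rows.csv", ["a,1", "b,2", "c,3"]),
      processed.append, events.append)
```

After the call, `processed` holds the three rows and `events` holds a single END marker, which is the point where an alert or a dependent step could be wired in.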

Related

Control Azure Service Bus Queue Message Reception

We have a distributed architecture and there is a native system which needs to be called. The challenge is that this system is not scalable and cannot take many requests at the same time. We have implemented Service Bus queues, where a message handler listens to the queue and calls the native system. The current problem is that whenever a message is posted to the queue, the message handler processes the request immediately. However, we want to process only two requests at a time: pick two, process them, and then move on to the next two. Does a Service Bus queue provide a built-in option to control this, or can we only do it with custom logic?
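// MessageHandlerOptions requires an exception handler; this one is a placeholder.
var options = new MessageHandlerOptions(args => Task.CompletedTask)
{
    MaxConcurrentCalls = 1,
    AutoComplete = false
};
client.RegisterMessageHandler(
    async (message, cancellationToken) =>
    {
        try
        {
            // Handler to process the message, then settle it explicitly
            await client.CompleteAsync(message.SystemProperties.LockToken);
        }
        catch
        {
            // Release the lock so the message can be delivered again
            await client.AbandonAsync(message.SystemProperties.LockToken);
        }
    }, options);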

The Message Handler API is designed for concurrency. If you'd like to process up to two messages at any given point in time, then the Handler API with a maximum concurrency of two is your answer. If you need to process a batch of exactly two messages at a time, this API is not what you need. Rather, fall back to building your own message pump using the lower-level API outlined in the answer provided by Mikolaj.
Careful with re-locking messages though. It's not a guaranteed operation: it's a client-side operation, and if there's a communication or network failure, the broker will release the lock and the message will be processed again, possibly by another competing consumer if you scale out. That is why scaling out in your scenario is probably going to be a challenge.
An additional point about the lower-level MessageReceiver API when receiving more than a single message: ReceiveAsync(n) does not guarantee that n messages will be retrieved. If you absolutely have to have n messages, you'll need to loop until you have n and no fewer.
And the last point, about the management client and getting a queue message count: I strongly suggest not doing that. The management client is not intended for frequent use at run time. Rather, it's meant for occasional calls, as these calls are very slow. Given you might end up with a single processing endpoint constrained to only two messages at a time (not even per second), these calls will add to the overall processing time.
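The "loop until you have n" advice can be sketched independently of the Service Bus SDK. The plain-Python example below assumes a hypothetical `receive(max_count)` callable that, like ReceiveAsync(n), may return fewer messages than requested; `receive_exactly` and `fake_receive` are illustrative names, not part of any Azure library. In a real pump the lock on the first message keeps ticking while you wait for the second, so the timeout should stay well below the lock duration.

```python
import time
from typing import Callable, List


def receive_exactly(receive: Callable[[int], List[str]], n: int,
                    timeout_s: float, poll_interval_s: float = 0.05) -> List[str]:
    """Call `receive` until n messages are collected or the timeout expires."""
    batch: List[str] = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < n and time.monotonic() < deadline:
        got = receive(n - len(batch))  # may return fewer than requested, or none
        batch.extend(got)
        if not got:
            time.sleep(poll_interval_s)  # avoid a busy loop on an empty queue
    return batch


# Hypothetical in-memory receiver that hands out at most one message per call.
queue = ["m1", "m2", "m3"]


def fake_receive(max_count: int) -> List[str]:
    return [queue.pop(0)] if queue and max_count > 0 else []


pair = receive_exactly(fake_receive, 2, timeout_s=1.0)
```

The caller still has to decide what to do when the timeout expires with fewer than n messages in hand: process the partial batch, or abandon the messages and try again.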

Off the top of my head, I don't think anything like that is supported out of the box, so your best bet is to do it yourself.
I would suggest you look at the ReceiveAsync() method, which allows you to receive a specific number of messages. (NOTE: I don't think it guarantees that if you ask for 2 messages it will always return two. For instance, if there's just one message in the queue then it will probably return that one, even though you asked for two.)
You could potentially use ReceiveAsync() in combination with PeekAsync(), where you can also provide the number of messages you want to peek. If the number of peeked messages is 2, then you can call ReceiveAsync() with a better chance of getting the desired two messages.
Another way would be to look at the ManagementClient and its GetQueueRuntimeInfoAsync() method, which gives you information about the number of messages in the queue. With that info you could then call the ReceiveAsync() method mentioned earlier.
However, be aware that if you have multiple receivers listening to the same queue, then there's no guarantee that any of the above will work, as there's no way to determine whether these messages were received by another process or not.
It might be that you need a more sophisticated way of handling this: receive one message, keep it alive (renew the lock, etc.) until you get another message, and then process them together.
I don't think I helped too much, but maybe at least it will give you some ideas.

Message Collapsing

I'm trying to determine if there's a way for Azure Service Bus to provide message collapsing. Specifically I'm after something like:
First event into a queue gets picked up straight away
All other events that are queued within the next N seconds, and match some criteria (e.g. matching message ids), have the schedule enqueue set to a value so they fire at the end of the N seconds. If a "waiting" message already exists it should be deleted.
After the N seconds have expired, the newest scheduled message appears and is picked up.
Basically I need a way to get a good time-to-first-event, but provide protection from over-processing events from chatty sources.
Does anyone have a pattern they've used to get something close to these semantics?
Update 1
The messages involved aren't true duplicates, rather they're the current state of an entity that is used for some processing (e.g. a message that's generated each time a file is updated). The result of the processing of an early message is fully replaced by that of later messages (e.g. the result is the size of the file). So we still need to guarantee we process the most recent message, but it's a waste to process all M within N seconds.

It sounds like you're talking about Duplicate Detection, especially with regard to matching MessageIds. If you want to evaluate some other attribute in the message for duplicate detection, maybe it's worth taking a step back and asking: why are my publishers sending so many duplicate messages? If it's unavoidable, maybe you can segregate your chatty consumers into a separate consumer group and manually handle the duplicate check, then re-enqueue (just thinking out loud).
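For the manual route, the semantics described in the question (first event handled immediately, later events within N seconds collapsed to the newest one) can be sketched as a small in-memory state machine. This is plain Python, not a Service Bus feature: `Collapser`, `offer` and `due` are hypothetical names, time is passed in explicitly, and a real implementation would need durable state and a scheduler to call `due`.

```python
from typing import Any, Dict, List, Tuple


class Collapser:
    """Process the first event per key at once; hold only the newest follow-up."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._window_end: Dict[str, float] = {}  # key -> time the window closes
        self._pending: Dict[str, Any] = {}       # key -> newest held payload

    def offer(self, key: str, payload: Any, now: float) -> List[Any]:
        """Return the payloads to process right now (zero or one)."""
        window_closed = now >= self._window_end.get(key, float("-inf"))
        if window_closed and key not in self._pending:
            self._window_end[key] = now + self.window_s
            return [payload]
        self._pending[key] = payload  # replaces any older waiting payload
        return []

    def due(self, now: float) -> List[Tuple[str, Any]]:
        """Release the newest held payload for every key whose window expired."""
        ready = [k for k in self._pending if now >= self._window_end[k]]
        released = []
        for key in ready:
            released.append((key, self._pending.pop(key)))
            self._window_end[key] = now + self.window_s  # start a fresh window
        return released


collapser = Collapser(window_s=10.0)
first = collapser.offer("file-1", {"size": 1}, now=0.0)
second = collapser.offer("file-1", {"size": 2}, now=3.0)
third = collapser.offer("file-1", {"size": 3}, now=7.0)
too_early = collapser.due(now=9.0)
released = collapser.due(now=10.0)
```

Here the first update is processed at once, the update at t=3 is overwritten by the one at t=7, and only the newest state is released when the window closes.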

Give each message a custom delay (rabbitmq)?

I need to delay each message I produce with a specific time.
As far as I know, the rabbitmq-delayed-message-exchange plugin allows me to do exactly that; however, I was warned that it doesn't scale properly, and scaling is a definite requirement. (Have there been any updates lately that fix the scaling problems?)
So, the alternative was to use TTL and a DLQ. With this approach though, you set the time when creating the exchange instead of the actual message which means I wouldn't be able to set different times for different messages.
Did I miss something?
My use case: basically, I will be receiving specific "appointments" from clients, which I must store and send back to the client at a specific time supplied in the appointment object. I want to achieve this by specifying a delay on each message, so that my consumers don't have to implement waiting logic.

Why don't you use a per-queue message TTL, with a separate queue for each TTL you want to set, and publish the messages through a direct exchange with a routing key that corresponds to the specific TTL?
Then, with the same dead letter exchange configured for all those queues, the messages end up in the "final" queue for your consumers after the desired delay.
Of course, it wouldn't be great if there were too many possible values for the delay.
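As a rough illustration of that layout, the sketch below builds the declaration arguments for one delay queue per TTL. The argument keys `x-message-ttl`, `x-dead-letter-exchange` and `x-dead-letter-routing-key` are standard RabbitMQ queue arguments; everything else (the `delay.<ms>` naming scheme, the function names, the exchange and routing key values) is an assumption made up for the example. The delay queues must have no consumers, so that messages sit there until they expire and get dead-lettered.

```python
from typing import Dict, List


def delay_queue_topology(delays_ms: List[int], dead_letter_exchange: str,
                         final_routing_key: str) -> Dict[str, Dict[str, object]]:
    """Map each delay queue name to the arguments it should be declared with."""
    topology: Dict[str, Dict[str, object]] = {}
    for delay in sorted(set(delays_ms)):
        topology[f"delay.{delay}"] = {
            "x-message-ttl": delay,                          # milliseconds
            "x-dead-letter-exchange": dead_letter_exchange,  # where expired messages go
            "x-dead-letter-routing-key": final_routing_key,  # routes to the final queue
        }
    return topology


def routing_key_for(delay_ms: int, topology: Dict[str, Dict[str, object]]) -> str:
    """Pick the routing key (here equal to the queue name) for a supported delay."""
    name = f"delay.{delay_ms}"
    if name not in topology:
        raise ValueError(f"no delay queue configured for {delay_ms} ms")
    return name


topology = delay_queue_topology([60_000, 5_000, 60_000], "appointments.dlx", "appointments.due")
key = routing_key_for(5_000, topology)
```

Each entry in `topology` would be passed as the arguments of a queue declaration, with the queue bound to the direct exchange under its own name. Arbitrary per-appointment times do not fit this scheme directly, which is the limitation the answer points out.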

Event Sourcing with Side-Effects

I'm building a service using the familiar event sourcing pattern:
1. A request is received.
2. The aggregate's history is loaded.
3. The aggregate is rebuilt (from its history).
4. New events are prepared and the aggregate is updated in response to the incoming request from Step 1.
5. These events are written to the log, and are made available (published) to any subscribers.
In my case, Step 5 is accomplished in two parts. The events are written to the event log. A background process reads from the event log and publishes all events starting from an offset.
In some cases, I need to publish side effects in addition to events related to the aggregate. As far as the system is concerned, these are events too because they are consumed by and affect the state of other services. However, they don't affect the history of the aggregate in this service and are not needed to rebuild it.
How should I handle these in the code?
Option 1-
Don't write side-effecting events to the event log. Publish these in the main process prior to Step 5.
Option 2-
Write everything to the event log and ignore side-effecting events when the history is loaded. (These aren't part of the history!)
Option 3-
Write side-effecting events to a dummy aggregate so they are published, but never loaded.
Option 4-
?
In the first option, there may be trouble if there is a concurrency violation. If the write fails in Step 5, the side effect cannot be easily rolled back. The second option writes events that are not part of the aggregate's history. When loading in Step 2, these side-effecting events would have to be ignored. The third option feels like a hack.
Which of these seems right to you?

Name events correctly
Events are "things that happened". So if you are able to name the events that only trigger side effects in a "X happened" fashion, they become a natural part of the event history.
In my experience, this is always possible, because side-effects don't happen out of thin air. Sometimes the name becomes a bit artificial, but it is still better to name events that way than to call them e.g. "send email to that client event".
In terms of your list of alternatives, this would be option 2.
Example
Instead of calling an event "send status email to customer event", call it "status email triggered event". Of course, if there is a better name for the actual trigger, use that one :-)

Option 4 - Have some other service subscribe to the events and produce the side effects, and any additional events related to them.
Events should be fine-grained.

Option 1- Don't write side-effecting events to the event log. Publish
these in the main process prior to Step 5.
What if you later need this part of the history by building a new bounded context?
Option 2- Write everything to the event log and ignore side-effecting
events when the history is loaded. (These aren't part of the history!)
How to ignore the effect of something which does not have any effect? :D
Option 3- Write side-effecting events to a dummy aggregate so they are
published, but never loaded.
Why do you need consistency boundary around something which you will never change?
What you are talking about is the most common form of domain events, which you use to communicate with other bounded contexts. Of course you need to save them.
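The remark about option 2 ("how to ignore the effect of something which does not have any effect?") can be illustrated with a small sketch: if the aggregate only has handlers for the events that change its state, replaying a side-effect event is a no-op and no explicit filtering is needed. The `Order` aggregate, the event names and the `(name, data)` tuple shape below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Event = Tuple[str, Dict[str, object]]  # (event name, data); hypothetical shape


@dataclass
class Order:
    status: str = "new"
    total: int = 0

    def apply(self, event: Event) -> None:
        """Dispatch to a handler if one exists; otherwise leave state untouched."""
        name, data = event
        handler = getattr(self, f"_on_{name}", None)
        if handler is not None:
            handler(data)

    def _on_item_added(self, data: Dict[str, object]) -> None:
        self.total += int(data["price"])

    def _on_order_paid(self, data: Dict[str, object]) -> None:
        self.status = "paid"


def rebuild(history: List[Event]) -> Order:
    """Replay the full log, side-effect events included."""
    order = Order()
    for event in history:
        order.apply(event)
    return order


history: List[Event] = [
    ("item_added", {"price": 30}),
    ("status_email_triggered", {"to": "customer@example.com"}),  # no handler
    ("item_added", {"price": 12}),
    ("order_paid", {}),
]
order = rebuild(history)
```

The `status_email_triggered` event stays in the log and is published like any other event, but the aggregate's state after replay depends only on the events it has handlers for.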

Resequencer with holes in the sequence

We have an ETL scenario where we use the resequencer.
Messages arrive at the flow with a sequence number that the resequencer uses to send messages in order, but sometimes messages are discarded earlier (because of data validation) and never reach the resequencer. This produces holes in the sequence, and the resequencer stops sending messages when using the default release strategy. To avoid this, we developed a new SequenceTimeoutReleaseStrategy that is a mix between the default strategy and TimeoutCountSequenceSizeReleaseStrategy from SI. When a message arrives, it checks the timeout and releases the message if necessary.
All this worked well except for the last messages, which arrive before the timeout and have holes. These messages aren't released by the strategy. We could use a reaper, but the sequence may have more than one hole, so when the resequencer releases the messages it will stop at the first sequence break and remove the group, losing the rest of the messages. So, the question is: is there a way to use the resequencer where there can be holes in the sequence?
One solution we have and want to avoid is a scheduled task that removes the messages directly from the message store, but this could cause concurrency problems and so on, so we prefer other solutions.
Any help is appreciated here
Regards
Guzman

There are two components involved; the release strategy says "something" can be released; the actual decision as to what is released is performed by the MessageGroupProcessor. In this case, a ResequencingMessageGroupProcessor.
You would need to customize that class to "skip" the hole(s).
You can't wire in a customized MGP using the <resequencer/> namespace; you would have to wire it up using <bean/>s: a ResequencingMessageHandler and a ConsumerEndpointFactoryBean.
Or use a BeanFactoryPostProcessor to change the constructor argument to your custom class.
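To show what "skipping the holes" would mean for the processor, here is a language-neutral sketch in plain Python rather than an actual ResequencingMessageGroupProcessor subclass. `release_contiguous` mirrors the behaviour described in the question (stop at the first sequence break), while `release_skipping_holes` is a hypothetical alternative that emits everything newer than the last released number in order. Messages are modelled as `(sequence_number, payload)` tuples purely for illustration.

```python
from typing import Iterable, List, Tuple

Message = Tuple[int, str]  # (sequence number, payload); hypothetical shape


def release_contiguous(group: Iterable[Message], last_released: int) -> List[Message]:
    """Release in order, stopping at the first gap in the sequence."""
    released: List[Message] = []
    expected = last_released + 1
    for message in sorted(group, key=lambda m: m[0]):
        if message[0] != expected:
            break
        released.append(message)
        expected += 1
    return released


def release_skipping_holes(group: Iterable[Message],
                           last_released: int) -> Tuple[List[Message], int]:
    """Release everything newer than last_released in order, ignoring gaps."""
    ordered = sorted((m for m in group if m[0] > last_released), key=lambda m: m[0])
    if not ordered:
        return [], last_released
    return ordered, ordered[-1][0]


group = [(5, "e"), (2, "b"), (1, "a"), (7, "g")]
strict = release_contiguous(group, last_released=0)
relaxed, new_last = release_skipping_holes(group, last_released=0)
```

With sequence numbers 3, 4 and 6 missing, the strict version releases only the first two messages, while the relaxed one releases all four in order. Whether that is safe depends on the release strategy only firing once the missing messages can no longer arrive, which is what the timeout in the question is for.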
