Transaction boundaries without using pollers - spring-integration

Our project has the following flow pattern:
<input-flow> | <routing-flow> | <output-flow>
Where the pipes symbolize the transaction boundaries, and all flows are multi-threaded using TaskExecutors. In the input-flow, the transaction is started by the message-driven-channel-adapter, but in the routing-flow and output-flow it is currently started by a poller, which adds latency.
To avoid the poller latency, I would like to create the transaction boundaries using ExecutorChannels, but the ExecutorChannel does not start a transaction for the flow.
Are there other possibilities to achieve this?

You can avoid the latency by reducing the polling interval (even to 0) and increasing the receive timeout (at the expense of tying up a scheduler thread to wait for messages).
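For example, a minimal sketch of that poller configuration in the Java DSL (the 30-second receive timeout is an arbitrary value of mine):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.PollerSpec;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.scheduling.PollerMetadata;

@Configuration
public class PollerConfig {

    // Default poller: no delay between polls, but each receive() blocks for up
    // to 30 seconds, so a scheduler thread waits for messages instead of
    // sleeping between polls.
    @Bean(name = PollerMetadata.DEFAULT_POLLER)
    public PollerSpec defaultPoller() {
        return Pollers.fixedDelay(0).receiveTimeout(30_000);
    }
}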
For an executor channel, you can insert a transactional gateway in the flow (see this answer for an example), or use AOP to start the transaction on a direct channel send() somewhere downstream of the executor.
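A rough sketch of the transactional-gateway approach in the Java DSL (the flow, bean, and channel names are my own assumptions, not from the question):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class RoutingFlowConfig {

    // Hand off to an executor thread, then enter the rest of the flow through
    // a gateway whose invocation is wrapped in transaction advice: the
    // transaction starts on the executor thread, with no poller involved.
    @Bean
    public IntegrationFlow routingFlow(TaskExecutor exec, PlatformTransactionManager txManager) {
        return f -> f
                .channel(c -> c.executor(exec))
                .gateway(txFlow(), e -> e.transactional(txManager)
                        .replyTimeout(0L)); // one-way flow, no reply expected
    }

    @Bean
    public IntegrationFlow txFlow() {
        // everything in this sub-flow runs inside the transaction
        return f -> f.handle(message -> {
            // transactional work, e.g. JDBC/JPA writes
        });
    }
}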

Related

Thread Sleep in the Kafka Listener

I am trying to pause/resume the Kafka container. Using the following code snippet to do so:
kafkaListenerEndpointRegistry.getListenerContainer("MAIN").pause();
When I call pause, I also need to do a Thread.sleep so that the remaining messages in the batch are not processed. For every message in the batch, I am calling another API which has a rate limit; to stay under this limit, I need to pause processing between messages.
If the main thread sleeps, will it stop the listener from sending the heartbeat? Does it also stop the heartbeat thread in the background?
The documentation says: "When a container is paused, it continues to poll() the consumer, avoiding a rebalance if group management is being used, but it does not retrieve any records."
But I am pausing the container and making the thread sleep. How will this impact the flow?
You must never sleep the consumer thread; doing so risks a rebalance.
Instead, reduce the max.poll.records so the pause will take effect more quickly (the consumer won't actually pause until the records received by the previous poll are processed).
You can throw an exception after pausing the consumer, but you will need to resume the container somehow.
I opened a new issue to improve this behavior: https://github.com/spring-projects/spring-kafka/issues/2280
If you are subject to rate limits, consider using KafkaTemplate.receive() methods, on a schedule, or a polled Spring Integration adapter, instead of using a message-driven approach.
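To illustrate the pause/resume pattern without sleeping the consumer thread, here is a sketch (the container id "MAIN" is from the question; the topic, the one-second rate window, and callRateLimitedApi() are my assumptions):

import java.time.Instant;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.scheduling.TaskScheduler;
import org.springframework.stereotype.Component;

@Component
public class RateLimitedListener {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    @Autowired
    private TaskScheduler scheduler;

    @KafkaListener(id = "MAIN", topics = "some-topic")
    public void listen(String message) {
        callRateLimitedApi(message); // hypothetical rate-limited downstream call
        MessageListenerContainer container = registry.getListenerContainer("MAIN");
        container.pause(); // takes effect once the current poll's records are done
        // resume from a scheduler thread; the consumer thread keeps polling
        // (paused), so heartbeats continue and no rebalance is triggered
        scheduler.schedule(container::resume, Instant.now().plusSeconds(1));
    }

    private void callRateLimitedApi(String message) {
        // ... call the rate-limited API here
    }
}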

Timeout error when creating ServiceBusMessageBatch in Azure.Messaging.ServiceBus

I have the following code where I start getting an error during long-running tests on the same Service Bus Client.
ServiceBusMessageBatch batch = this._serviceBusSender.CreateMessageBatchAsync().GetAwaiter().GetResult();
The error is,
Azure.Messaging.ServiceBus.ServiceBusException: 'The operation did not complete within the allocated time 00:01:00 for object request42. (ServiceTimeout)'
Why is this statement throwing this error? Is the creation of a batch object such a heavy operation that it can even time out? If this is the case, should I switch to the overload that accepts a List of ServiceBusMessage instead of this batch mode?
My understanding is that this way of batch creation can protect me from creating a batch that the queue may not allow. I am finding it difficult to understand why it times out after 1 minute.
In order for a batch to be able to enforce limits on the size, it has to establish an AMQP link to the entity that you'll be sending to and read the maximum allowable message size from the service. This results in a network operation that, in this case, timed out. This overhead is performed only in the case that there is not an existing AMQP link already established - typically on the first call that requires a network operation.
What jumps out at me from your code is the use of GetAwaiter().GetResult() to perform sync-over-async. This is really not a good idea and is very likely to cause contention in the thread pool that prevents continuations from being scheduled in a timely manner. Because network operations in Service Bus are asynchronous - including establishing the AMQP link - delays in scheduling continuations would certainly increase the chance of timeouts.
I'd strongly advise refactoring your sync-over-async code paths and shifting to an asynchronous approach. In those scenarios where it's not possible to go full async, limiting sync-over-async to the outermost layer of your code would be the next best thing.
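The question uses the .NET SDK, but the async-first shape is the same across languages. Purely as a cross-language illustration, here is a sketch with the Java Service Bus SDK (the queue name and connection-string handling are my assumptions):

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusSenderAsyncClient;

public class BatchSendSketch {

    public static void main(String[] args) {
        ServiceBusSenderAsyncClient sender = new ServiceBusClientBuilder()
                .connectionString(System.getenv("SERVICEBUS_CONNECTION_STRING"))
                .sender()
                .queueName("my-queue") // assumed queue name
                .buildAsyncClient();

        // Create the batch and send it without blocking worker threads; the
        // AMQP link setup that caused the timeout happens inside the async
        // pipeline instead of behind a blocked caller.
        sender.createMessageBatch()
                .flatMap(batch -> {
                    batch.tryAddMessage(new ServiceBusMessage("payload"));
                    return sender.sendMessages(batch);
                })
                .block(); // if you must block, do it only at the outermost layer
    }
}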

EventProcessorClient - AmqpRetryOptions behaviour

Here is our current scenario: listen to all the partitions on a given event hub, logically process each message based on its content, and re-process the same event (up to a configured number of times, say 3) if the initial processing fails internally.
Following the recommended guidelines for higher throughput, we are planning to use EventProcessorClient (from the Azure SDK) for consuming the events from Azure Event Hubs. Using the EventProcessorClientBuilder, it is initialized based on the docs.
EventProcessorClient eventProcessorClient = new EventProcessorClientBuilder()
        .consumerGroup("consumer-group")
        .checkpointStore(new BlobCheckpointStore(blobContainerAsyncClient))
        .processEvent(eventContext -> {
            // process eventData and throw an error if the internal logic fails
        })
        .processError(errorContext -> {
            System.out.printf("Error occurred in partition processor for partition %s: %s%n",
                    errorContext.getPartitionContext().getPartitionId(),
                    errorContext.getThrowable());
        })
        .connectionString(connectionString)
        .retry(new AmqpRetryOptions()
                .setMaxRetries(3)
                .setMode(AmqpRetryMode.FIXED)
                .setDelay(Duration.ofSeconds(120)))
        .buildEventProcessorClient();

eventProcessorClient.start();
However, the retry logic doesn't kick in at all. Checking the documentation further, I wonder if this is only applicable to an explicit EventHubAsyncClient instance. Any suggestions or inputs on what is required to achieve this retry capability?
Thanks in advance.
"re-process (until configured no. of times - say 3) the same event if the initial processing fails internally."
That is not how retries work with the processor. The AmqpRetryOptions control how many times the client will retry service operations (that is, operations that use the AMQP protocol), such as reading from the partition.
Once the processor delivers events, your application owns responsibility for the code that processes them - that includes error handling and retries.
The reason for this is that the EventProcessorClient does not have sufficient understanding of your application code and scenarios to determine the correct action to take in the face of an exception. For example, it has no way to know if processing is stateful and has been corrupted or is safe to retry.
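In other words, the retry loop has to live in your handler. A sketch under those assumptions (MAX_ATTEMPTS and handle() are illustrative names, not SDK APIs):

import com.azure.messaging.eventhubs.EventData;
import com.azure.messaging.eventhubs.models.EventContext;

public class RetryingProcessor {

    private static final int MAX_ATTEMPTS = 3; // the "configured no. of times"

    // Wire it in with .processEvent(RetryingProcessor::processWithRetry)
    static void processWithRetry(EventContext eventContext) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                handle(eventContext.getEventData()); // assumed business logic
                eventContext.updateCheckpoint();     // checkpoint only on success
                return;
            } catch (Exception ex) {
                if (attempt == MAX_ATTEMPTS) {
                    // exhausted: log and/or route the event to a dead-letter store
                    System.err.println("Giving up after " + attempt + " attempts: " + ex);
                }
            }
        }
    }

    private static void handle(EventData data) {
        // ... your processing, which may throw
    }
}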

Replaying Messages in Order

I am implementing a consumer which processes messages from a queue where the order of messages is of importance. I would like to implement a mechanism using NodeJS where:
the consumer function consumes messages m1, m2, ..., mN from the queue
it performs an IO-intensive operation to process each message: m -> m'
it stores the result m' in a Redis cache
it acknowledges the queue after each message is processed in step 2
In a different function, I am listening to the messages from the cache:
it sends the processed messages m' to an external system
if the external system accepts the processed message, that message is deleted from the cache
if the external system rejects the processed message, it stops sending messages, discards the unsent processed messages in the cache, and resets the offset to the last accepted message in the queue. For example, if m12' was the last message accepted by the system and I have acknowledged m23 from the queue, then I have to discard m13' to m23' and reset the offset so that the consumer can read and start processing from m13 again.
Few assumptions:
The processing of m to m' is intensive, and I am processing the messages optimistically, knowing that most of the time there won't be a failure
With the current assumptions and goals, is there any way I can achieve this with RabbitMQ or an Azure equivalent? My client doesn't prefer Kafka or any Azure equivalent of Kafka (Azure Event Hubs).
In scenarios where the messages will always be generated in sequence, a simple queue is probably all you need.
Azure Queues are pretty simple to get into, but the general mode of operation for queues is to remove the messages as they are processed successfully.
If you can avoid the scenario where you must "roll back" or re-process from an earlier time, that is, if you can avoid the orchestration aspect, then this would be a much simpler option.
It's the "go back and replay" part that you will struggle with. If you can implement two queues in a sequential pattern, where successfully processing a message from one queue pushes it into the next queue, then you never need to go back, because the secondary consumer can never process ahead of the primary.
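A sketch of that chained-queue idea (shown in Java with the Azure Storage Queue SDK purely for illustration, although the question targets NodeJS; the queue names and transform are placeholders):

import com.azure.storage.queue.QueueClient;
import com.azure.storage.queue.QueueClientBuilder;
import com.azure.storage.queue.models.QueueMessageItem;

public class ChainedQueues {

    public static void main(String[] args) {
        String cs = System.getenv("STORAGE_CONNECTION_STRING");
        QueueClient primary = new QueueClientBuilder()
                .connectionString(cs).queueName("raw").buildClient();
        QueueClient secondary = new QueueClientBuilder()
                .connectionString(cs).queueName("processed").buildClient();

        QueueMessageItem msg = primary.receiveMessage();
        if (msg != null) {
            String processed = transform(msg.getBody().toString()); // assumed m -> m'
            secondary.sendMessage(processed); // publish m' first...
            // ...then remove m, so the secondary consumer can never be ahead
            primary.deleteMessage(msg.getMessageId(), msg.getPopReceipt());
        }
    }

    private static String transform(String m) {
        return m; // placeholder for the IO-intensive processing
    }
}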
With Azure Event Hubs it is much easier to reset the offset for processing, because the messages stay in the bucket regardless of their read state (in fact, any given message does not have such a state) and the consumer maintains the offset pointer itself. It also supports multiple consumer groups, each of which gets its own independent view of every message.
You can also upgrade your plan to retain the data for up to 7 days without blowing the budget.
There are two problems with large-scale telemetry ingestion services like Azure Event Hubs for your use case:
The order of receipt is less reliable for messages that arrive extremely close together. The hub is designed to receive many messages from many sources concurrently, so its internal architecture cares a lot less about preserving the precise order. It records the precise receipt timestamp on each message, but it does not guarantee that the overall sequence of records will exactly match what you would get by sorting on that timestamp. (It's a subtle but important distinction.)
Event Hubs (and many client processing code examples) are designed for at-least-once delivery across multiple concurrent consuming threads. Consumers are encouraged to be asynchronous, and the service will try to ensure that failed processing attempts are retried by the next available thread.
So you could use Event Hubs, but you would have to bypass or disable many of its features, which is generally a strong sign that it is not the correct fit for your purpose. If you want to explore it anyway, you would want to limit the concurrency aspects (a sketch follows this list):
minimise the partition count
You probably want one partition for each message producer, or at least for each sequential set; maintaining sequence is simpler inside a single partition
make sure your message sender (producer) only sends to a specific partition
Each producer MUST use a unique partition key
create a consumer group for each of your consumers
process messages one at a time, not in batches
process with a single thread
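Putting those constraints together, a minimal sketch (in the Java SDK only as an illustration, since the question targets NodeJS; the hub name, partition id, and offset handling are my assumptions):

import java.time.Duration;

import com.azure.messaging.eventhubs.EventData;
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubConsumerClient;
import com.azure.messaging.eventhubs.models.EventPosition;
import com.azure.messaging.eventhubs.models.PartitionEvent;

public class SingleThreadedReplay {

    public static void main(String[] args) {
        EventHubConsumerClient consumer = new EventHubClientBuilder()
                .connectionString(System.getenv("EVENTHUB_CONNECTION_STRING"), "my-hub")
                .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
                .buildConsumerClient();

        long lastAcceptedOffset = 0L; // persisted elsewhere; reset it here to replay
        // One partition, one thread, one event at a time: order is preserved
        // and "going back" is just re-reading from an earlier offset.
        for (PartitionEvent pe : consumer.receiveFromPartition(
                "0", 1, EventPosition.fromOffset(lastAcceptedOffset), Duration.ofSeconds(30))) {
            process(pe.getData()); // assumed business logic
            lastAcceptedOffset = pe.getData().getOffset();
        }
        consumer.close();
    }

    private static void process(EventData data) {
        // ... handle the event
    }
}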
I have a lot of experience in designing MS Azure based solutions for Industrial IoT (telemetry from PLCs) and Agricultural IoT (Raspberry Pi) device implementations. In almost all cases we think that the order of messaging is important, but unless you are maintaining real-time two-way command and control, you can usually get away with an optimistic approach where each message and any derivatives are, or were, correct at the time of transmission.
If there is the remote possibility that a device can be offline for any period of time, then dealing with the stale data flushing through the system when the device comes back online can really play havoc with sequential logic programming.
Take a step back to analyse your solution. Event Hubs does offer a convenient way to roll back processing to a previous offset, as long as that record is still in the bucket, but can you re-design your logic flow so that you do not have to re-process old data?
What is the requirement that drives this sequence? If it is so important to maintain the sequence, then you should probably process the data with a single consumer that does everything, or look at chaining the queues in a sequential manner.

NodeJS with Redis message queue - How to set multiple consumers (threads)

I have a Node.js project that exposes a simple REST API for an external web application. This webhook must cope with a large number of requests per second and return 200 OK to the caller very quickly. To achieve that, I am investigating a simple Redis queue, so that each request is enqueued and handled asynchronously later on (by a consumer).
The Redis simple message queue seems like an easy way to achieve this task: https://github.com/smrchy/rsmq
1) Is rsmq.receiveMessage() { ....... } a blocking method? If this handler is slow, will it impact my server's performance?
2) If the answer to question 1 is yes: is it recommended to extract the consumption of the messages into an external microservice (a dedicated consumer)? What are the best practices for creating multi-threaded consumers in such an environment?
You can use the pub/sub feature provided by Redis: https://redis.io/topics/pubsub
You can publish to various channels without any knowledge of the subscribers, and subscribers can subscribe to the channels they wish.
1) No, it won't block the event loop; however, you will only start processing a second message once you call the "next" method, i.e., you will process one message at a time. To overcome this, you can start multiple workers in parallel. Take a look here: https://stackoverflow.com/a/45984677/7201847
2) That's an architectural decision that depends on the load you have to support and the hardware capacity you have. I would recommend at least two Node.js processes: one for adding the messages to the queue and another one to actually process them, with the option to start additional worker processes if needed, depending on the results of your performance tests.
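The worker-pool pattern itself is language-agnostic. Purely as an illustration (in Java, with Jedis standing in for rsmq; the list name and worker count are arbitrary), a dedicated consumer process might look like:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import redis.clients.jedis.Jedis;

public class DedicatedConsumer {

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // tune to your load
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                // each worker holds its own connection and blocks on the list
                try (Jedis jedis = new Jedis("localhost", 6379)) {
                    while (!Thread.currentThread().isInterrupted()) {
                        List<String> item = jedis.brpop(0, "webhook-queue"); // [key, value]
                        handle(item.get(1)); // assumed processing logic
                    }
                }
            });
        }
    }

    private static void handle(String payload) {
        // ... process the dequeued request
    }
}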
