How to consume multiple messages from Pub/Sub? This seems like a simple problem that should have a simple solution, but currently I can't find an easy way to consume batches of records from Pub/Sub with spring-cloud-gcp-pubsub.
I'm using spring-cloud-gcp-pubsub to consume messages from Pub/Sub and process them in a Spring Boot application. My current setup is very simple: I have a PubSubInboundChannelAdapter and a ServiceActivator that consumes the records. After some research I found the Spring Integration Aggregators, but they didn't seem like a good fit because it isn't easy to propagate the acknowledgment downstream. Is there anything I'm missing? How can I consume batches of messages?
The PubSubInboundChannelAdapter is based on a subscription to the topic. So it deals with a stream of messages: the PubSubInboundChannelAdapter reacts to each of them, converting it to a Spring Message and sending it downstream to the configured channel.
There is really no way to get a batch of messages during subscription.
Also keep in mind that there is nothing like an offset in GCP Pub/Sub. You indeed have to acknowledge every single message you consume from Pub/Sub.
There is, however, a way to pull a batch of messages at once, using PubSubMessageSource. The messageSource.setMaxFetchSize(5); does the trick, but this PubSubMessageSource still produces every message individually, so you are able to (n)ack them independently.
You can, of course, leverage the feature PubSubMessageSource uses: PubSubSubscriberOperations.pullAndConvert(). See its JavaDocs for more info:
/**
 * Pull a number of messages from a Google Cloud Pub/Sub subscription and convert them to Spring messages with
 * the desired payload type.
 * @param subscription the subscription name
 * @param maxMessages the maximum number of pulled messages
 * @param returnImmediately returns immediately even if subscription doesn't contain enough
 * messages to satisfy {@code maxMessages}
 * @param payloadType the type to which the payload of the Pub/Sub messages should be converted
 * @param <T> the type of the payload
 * @return the list of received acknowledgeable messages
 * @since 1.1
 */
<T> List<ConvertedAcknowledgeablePubsubMessage<T>> pullAndConvert(String subscription, Integer maxMessages,
        Boolean returnImmediately, Class<T> payloadType);
So this one looks like what you are looking for, because you indeed are going to have a list of messages, and each of them is a wrapper with (n)ack callbacks.
This API could be used in a custom @InboundChannelAdapter MessageSource or Supplier @Bean implementation.
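For example, a minimal sketch of such a polled MessageSource (the channel name, subscription name and fetch size are assumptions for the example; PubSubTemplate is injected here as the PubSubSubscriberOperations implementation):
@Bean
@InboundChannelAdapter(channel = "pubsubBatchChannel", poller = @Poller(fixedDelay = "5000"))
public MessageSource<List<ConvertedAcknowledgeablePubsubMessage<String>>> pubsubBatchSource(PubSubTemplate pubSubTemplate) {
    return () -> {
        // Pull up to 10 messages in one request; each element keeps its own ack()/nack() callbacks.
        List<ConvertedAcknowledgeablePubsubMessage<String>> batch =
                pubSubTemplate.pullAndConvert("testSubscription", 10, true, String.class);
        if (batch.isEmpty()) {
            return null; // nothing to emit on this poll
        }
        return MessageBuilder.withPayload(batch).build();
    };
}
The whole batch then travels downstream as a single List payload, and the consumer decides when to ack or nack each element.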
But still: I don't see the benefit of whole-batch processing, since every message can be ack'ed individually without affecting all the others.
Try using below:
@Bean
@InboundChannelAdapter(channel = "pubsubInputChannel", poller = @Poller(fixedDelay = "5000", maxMessagesPerPoll = "3"))
public MessageSource<Object> pubsubAdapter(PubSubTemplate pubSubTemplate) {
    PubSubMessageSource messageSource = new PubSubMessageSource(pubSubTemplate, "testSubscription");
    messageSource.setAckMode(AckMode.MANUAL);
    return messageSource;
}
The maxMessagesPerPoll property determines how many messages are polled on each poll.
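With AckMode.MANUAL, the acknowledgment callback travels downstream in the message headers, so the consumer acks each message itself; a minimal sketch (the channel name matches the adapter above, the rest is illustrative):
@ServiceActivator(inputChannel = "pubsubInputChannel")
public void messageReceiver(String payload,
        @Header(GcpPubSubHeaders.ORIGINAL_MESSAGE) BasicAcknowledgeablePubsubMessage message) {
    // process the payload, then acknowledge this particular Pub/Sub message
    message.ack();
}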
Related
We are using JdbcOperations.queryForStream() to fetch 30k+ rows from the database, following the suggestions in "Spring Integration Jdbc OutboundGateway returning 1 record ONLY even with MaxRows(0)"; however, split() and aggregate() on the stream is not working. We need the aggregation to work so we know when all the stream records have been consumed and can perform a final operation.
The splitter doesn't know the size of an Iterator, Stream or Flux request message payload: https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#iterators. Therefore the sequenceSize header is 0 and the default aggregator cannot do its job, simply because there is no sequenceSize to compare with. You must provide a custom releaseStrategy or rely on a groupTimeout to perform that final operation.
Another trick could be done with JDBC: before calling queryForStream() you can ask for the count of records and set that value into some header of the reply message before the splitter. Such a header can then be used in a custom releaseStrategy.
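For instance, a rough sketch of such a releaseStrategy, assuming the count was stored in a hypothetical "expectedCount" header before the splitter:
@Bean
public ReleaseStrategy expectedCountReleaseStrategy() {
    return group -> {
        // Release the group once its size reaches the expected record count.
        Integer expected = group.getOne().getHeaders().get("expectedCount", Integer.class);
        return expected != null && group.size() >= expected;
    };
}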
See more info about aggregator features in docs: https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#aggregator
We are using the Spring Integration 4.2.3 Aggregator component with group-timeout defined, and we expect the group to be timed out after the given timeout value while messages are still being added to the group and the release-size criteria is not met.
But we are seeing different results: when we put a heavy load on the service, the aggregator waits for all messages to be added to the group rather than expiring the group when the timeout is reached.
Is there any way to override the aggregator's behaviour so that it times out the group based on the first message rather than the last one?
Well, actually you can do what you need even now, using the same group-timeout-expression. But you have to consult the #root object of the evaluation context, which is exactly what you need: the MessageGroup. With that you can call one of these for your purpose:
/**
 * @return the timestamp (milliseconds since epoch) associated with the creation of this group
 */
long getTimestamp();

/**
 * @return the timestamp (milliseconds since epoch) associated with the time this group was last updated
 */
long getLastModified();
Therefore an expression for your original request might look like:
group-timeout-expression="timestamp + 10000 - T(System).currentTimeMillis()"
And we get an adjusted timeout which is applied to the scheduled task with a value like new Date(System.currentTimeMillis() + groupTimeout).
No; the timeout is currently based on the arrival of the last message only.
If you use a MessageGroupStoreReaper instead, the time is based on the group creation by default, but that can be changed by setting the group store's timeoutOnIdle to true.
If your group is not timing out at all, perhaps the thread pool in the default taskScheduler is exhausted - it only has 10 threads by default.
You can increase the pool size or inject a dedicated scheduler into the aggregator.
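For example, with Java config a minimal sketch of a larger "taskScheduler" bean that Spring Integration will use instead of its default one (the pool size here is an arbitrary example):
@Bean
public ThreadPoolTaskScheduler taskScheduler() {
    ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
    scheduler.setPoolSize(50); // the default Spring Integration scheduler has only 10 threads
    scheduler.setThreadNamePrefix("int-scheduler-");
    return scheduler;
}
With XML configuration, an equivalent <task:scheduler id="taskScheduler" pool-size="50"/> definition does the same.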
We debugged the issue with your group-timeout-expression (timestamp + 20000 - T(System).currentTimeMillis()) and found that the expression evaluates to a negative value once messages keep flowing in, causing the group to never get released.
The code block where the issue lies is in AbstractCorrelatingMessageHandler.java.
Once we removed the "groupTimeout >= 0" condition, the group gets expired by the else block, and the code now behaves as we expected.
Could you let me know why you are not forcing the group to be timed out when the expression reaches a negative value?
I have a batch process where we receive a START message in a queue and an END message in the same queue. After the START message, we receive thousands of messages on three other queues, which we filter, enrich, aggregate and finally transform to JSON (we can call this pipeline MAIN_PIPE).
After that START message we have an adapter that reads from the database the total number of elements, delivered in a single message (we can call this pipeline COUNTER_PIPE).
And after the END message, once we have processed ALL the messages, we have to send a request to an external service.
So, we need to count all processed messages (converted to JSON) in MAIN_PIPE and compare that to the number from COUNTER_PIPE.
How can I compare that?
Would you mind describing how you read from those 3 queues as well? It isn't clear to me where the correlation between START and all those batch messages comes from. If that is a regular message-driven channel adapter, there is a case where we may start receiving those messages while there is still no START and no info about the count in the DB.
Anyway, I'd make it like this:
The START and END messages, as well as all messages in that batch, must have the same correlationKey to let an Aggregator form a batch in the end.
Since the group in this case is based on the count anyway, you have no choice but to send even the messages discarded by the filter to the aggregator. That might be just a simple error stub, so they can be distinguished properly in the aggregator's release function.
The releaseStrategy of the aggregator must iterate over the group to find the message with the count and compare it with the group size + 2 (the START & END messages), for example like the sketch below.
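A rough sketch of such a release function, assuming the COUNTER_PIPE message carries the total in a hypothetical "totalCount" header:
@Bean
public ReleaseStrategy batchCompleteReleaseStrategy() {
    return group -> group.getMessages().stream()
            // find the COUNTER_PIPE message carrying the expected total
            .map(m -> m.getHeaders().get("totalCount", Integer.class))
            .filter(Objects::nonNull)
            .findFirst()
            // +2 accounts for the START and END messages in the same group
            .map(count -> group.size() >= count + 2)
            .orElse(false);
}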
Does it make sense to you?
Based on the documentation of spring-kafka, I am using the annotation-based @KafkaListener to configure my consumer.
What I see is that:
Unless I specify the offset as zero, upon start the Kafka consumer picks up future messages and not the existing ones. (I understand this is an expected result because I am not specifying the offset I want.)
I see an option in the documentation to specify a topic + partition combination along with an offset of zero, but if I do this I have to explicitly specify which topic I want my consumer to listen to.
Using approach 2 above, this is how my consumer looks now:
@KafkaListener(id = "{group.id}",
        topicPartitions = {
                @TopicPartition(topic = "${kafka.topic.name}",
                        partitionOffsets = @PartitionOffset(partition = "0", initialOffset = "0"))
        },
        containerFactory = "kafkaListenerContainerFactory")
public void listen(@Payload String payload,
        Acknowledgment ack) throws InterruptedException, IOException {
    logger.debug("This is what we received in the Kafka Consumer = " + payload);
    idService.process(payload);
    ack.acknowledge();
}
While I understand that there is an option to specify a "topicPattern" wildcard or a "topics" list as part of the annotation configuration, I don't see a place where I can provide the offset value to start from zero for the topics / topic patterns listed. Is there a way to do a combination of both? Please advise.
When using topics and topicPatterns (rather than explicitly declaring the partitions), Kafka decides which consumer instance will get which partitions.
Kafka will allocate the partitions and the initial offset will be the last committed for that group id. You cannot currently change that offset but we are considering adding a seek function.
If you always want to start at the first available offset, use a unique group id (e.g. UUID.randomUUID().toString()) and set
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Since Kafka will have no existing offset for that group id it will use that property to determine where to start.
You can also use MANUAL ack mode and never ack, which will effectively do the same thing.
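For example, a minimal sketch of a consumer factory configured that way (the bootstrap servers value, deserializers and bean name are assumptions for the example):
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // a fresh group id on every start means there is no committed offset to resume from
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
}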
I'm using the EventProcessorHost to get messages off an Event Hub. Is there an easy way to change the maximum number of messages that are pulled off at a time? Right now the default is 10, and I know that when using a normal EventReceiver it is relatively easy to change the default, but I couldn't find any documentation for the EventProcessor case.
I want the maximum number of messages passed in when ProcessEventsAsync is called to be less than 10.
You can do it by providing EventProcessorOptions, with the MaxBatchSize property modified, when registering the EventProcessor (https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.eventprocessoroptions.maxbatchsize.aspx).
For example:
var eventProcessorHost = new EventProcessorHost(...);
await eventProcessorHost.RegisterEventProcessorAsync<MyEventProcessor>(new EventProcessorOptions{MaxBatchSize = 5});