Spring Integration - 2 MessageHandlers handle half of messages - spring-integration

I am using spring integration framework, with a Transformer
inputChannel -> kafka consumer
outputChannel -> database jdbc writer
#Bean
public DirectChannel inboundChannel() {
return new DirectChannel();
}
#Bean
public DirectChannel outboundChannel() {
return new DirectChannel();
}
#Bean
#Transformer(inputChannel="inboundChannel", outputChannel="outboundChannel")
public JsonToObjectTransformer jsonToObjectTransformer() {
return new JsonToObjectTransformer(Item.class);
}
#Bean
#ServiceActivator(inputChannel = "outboundChannel")
public MessageHandler jdbcmessageHandler() {
JdbcMessageHandler jdbcMessageHandler = new ...
return ...;
}
#Bean
#ServiceActivator(inputChannel = "inboundChannel")
public MessageHandler kafkahandler() {
return new ...;
}
in both handlers I override
public void handleMessage(Message<?> message)
The problem: if in kafka there are total N messages,
then each handleMessage() is invoked exactly n/2 times!
I assumed that each handler will be invoked n times, because each handler linked to different channel and there are n messages in total.
What am I missing?
(if I disable the kafak handler, the second handler gets all n messages)
Update:
I need to subscriber to get all the messages from the same channel (kafka handler will do something with the raw data, and jdbc handler will push the transformed
data)

First of all your inboundChannel and outboundChannel are out of use: you nowhere (at least in the question) specify their names.
The names like input and output are taken by the framework and used to create new MessageChannel beans, which are used in other places.
Now see what you have:
#Transformer(inputChannel="input"
#ServiceActivator(inputChannel = "input")
Both of them are subscribers to the same input channel and since it is created automatically by the framework as a DirectChannel. This channel is based on a round-robin LoadBalancingStrategy, therefore you see n/2 in your Kafka since its service activator deals only with every second message sent to that input channel.
Please, see more info in docs: https://docs.spring.io/spring-integration/reference/html/core.html#channel-configuration-directchannel

Related

Spring Integration aws Kinesis , message aggregator, Release Strategy

this is a follow-up question to Spring Integration AWS RabbitMQ Kinesis
I have the following configuration. I am noticing that when I send a message to the input channel named kinesisSendChannel for the first time, the aggregator and release strategy is getting invoked and messages are sent to Kinesis Streams. I put debug breakpoints at different places and could verify this behavior. But when I again publish messages to the same input channel the release strategy and the outbound processor are not getting invoked and messages are not sent to the Kinesis. I am not sure why the aggregator flow is getting invoked only the first time and not for subsequent messages. For testing purpose , the TimeoutCountSequenceSizeReleaseStrategy is set with count as 1 & time as 60 seconds. There is no specific MessageStore used. Could you help identify the issue?
#Bean(name = "kinesisSendChannel")
public MessageChannel kinesisSendChannel() {
return MessageChannels.direct().get();
}
#Bean(name = "resultChannel")
public MessageChannel resultChannel() {
return MessageChannels.direct().get();
}
#Bean
#ServiceActivator(inputChannel = "kinesisSendChannel")
public MessageHandler aggregator(TestMessageProcessor messageProcessor,
MessageChannel resultChannel,
TimeoutCountSequenceSizeReleaseStrategy timeoutCountSequenceSizeReleaseStrategy) {
AggregatingMessageHandler handler = new AggregatingMessageHandler(messageProcessor);
handler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("headers['foo']"));
handler.setReleaseStrategy(timeoutCountSequenceSizeReleaseStrategy);
handler.setOutputProcessor(messageProcessor);
handler.setOutputChannel(resultChannel);
return handler;
}
#Bean
#ServiceActivator(inputChannel = "resultChannel")
public MessageHandler kinesisMessageHandler1(#Qualifier("successChannel") MessageChannel successChannel,
#Qualifier("errorChannel") MessageChannel errorChannel, final AmazonKinesisAsync amazonKinesis) {
KinesisMessageHandler kinesisMessageHandler = new KinesisMessageHandler(amazonKinesis);
kinesisMessageHandler.setSync(true);
kinesisMessageHandler.setOutputChannel(successChannel);
kinesisMessageHandler.setFailureChannel(errorChannel);
return kinesisMessageHandler;
}
public class TestMessageProcessor extends AbstractAggregatingMessageGroupProcessor {
#Override
protected Object aggregatePayloads(MessageGroup group, Map<String, Object> defaultHeaders) {
final PutRecordsRequest putRecordsRequest = new PutRecordsRequest().withStreamName("test-stream");
final List<PutRecordsRequestEntry> putRecordsRequestEntry = group.getMessages().stream()
.map(message -> (PutRecordsRequestEntry) message.getPayload()).collect(Collectors.toList());
putRecordsRequest.withRecords(putRecordsRequestEntry);
return putRecordsRequestEntry;
}
}
I believe the problem is here handler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("headers['foo']"));. All your messages come with the same foo header. So, all of them form the same message group. As long as you release group and don’t remove it, all the new messages are going to be discarded.
Please, revise aggregator documentation to make yourself familiar with all the possible behavior : https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#aggregator

Spring Intgergation aws - KinesisMessageHandler Direct Channel

My Message handler for publishing messages to the kinesis stream is as follows
public MessageHandler kinesisMessageHandler(final AmazonKinesisAsync amazonKinesis,
#Qualifier("successChannel") MessageChannel successChannel,
#Qualifier("errorChannel") MessageChannel errorChannel) {
KinesisMessageHandler kinesisMessageHandler = new KinesisMessageHandler(amazonKinesis);
kinesisMessageHandler.setSync(false);
kinesisMessageHandler.setOutputChannel(successChannel);
kinesisMessageHandler.setFailureChannel(errorChannel);
return kinesisMessageHandler;
}
#Bean(name = "errorChannel")
public MessageChannel errorChannel() {
return MessageChannels.direct().get();
}
#Bean(name = "successChannel")
public MessageChannel successChannel() {
return MessageChannels.direct().get();
}
The setSync flag is set as false so that the messages are getting processed asynchronously.Also, I have created separate IntegrationFlow to receive and process Kinesis response from the success & error channel.
public IntegrationFlow successMessageIntegrationFlow(MessageChannel successChannel,
MessageChannel inboundKinesisMessageChannel,
MessageReceiverServiceActivator kinesisMessageReceiverServiceActivator) {
return IntegrationFlows.from(successChannel).channel(inboundKinesisMessageChannel)
.handle(kinesisMessageReceiverServiceActivator, "receiveMessage").get();
}
#Bean
public IntegrationFlow errorMessageIntegrationFlow(MessageChannel errorChannel,
MessageChannel inboundKinesisErrorChannel,
MessageReceiverServiceActivator kinesisErrorReceiverServiceActivator
) {
return IntegrationFlows.from(errorChannel).channel(inboundKinesisErrorChannel)
.handle(kinesisErrorReceiverServiceActivator, "receiveMessage").get();
}
I wanted to know if you see any issues in using Direct Channel to receive success & error responses from Kinesis and processing it using an IntegrationFlow. As far as I know, with Direct Channel a producer is a blocker during send until the consumer finishes its work and returns management to the producer caller back. Is it a correct assumption that here the producer is executed in a different set of thread pools by the AmazonKinesisAsyncClient and the producer will not wait for the IntegrationFlow to process the messages? Let me know If I need to implement it differently
Your assumption about blocking is correct: the control does not come back to the producing thread. So, if have a limited number of threads in that Kinesis client, you need to be sure that you free them as soon as possible. You might consider to have those callbacks in the queue channel instead. They are asynchronous anyway, but won’t hold Kinesis client if that.
You still have a flaw in your flows: .channel(inboundKinesisMessageChannel) . That means the same channel in the middle if two different flows . And if it is a direct one , then you end up with round robin distribution. I would just remove it altogether .

Spring Integration resequencer does not release the last group of messages

I have the following configuration:
#Bean
public IntegrationFlow messageFlow(JdbcMessageStore groupMessageStore, TransactionSynchronizationFactory syncFactory, TaskExecutor te, ThreadPoolTaskScheduler ts, RealTimeProcessor processor) {
return IntegrationFlows
.from("inputChannel")
.handle(processor, "handleInputMessage", consumer -> consumer
.taskScheduler(ts)
.poller(poller -> poller
.fixedDelay(pollerFixedDelay)
.receiveTimeout(pollerReceiveTimeout)
.maxMessagesPerPoll(pollerMaxMessagesPerPoll)
.taskExecutor(te)
.transactional()
.transactionSynchronizationFactory(syncFactory)))
.resequence(s -> s.messageStore(groupMessageStore)
.releaseStrategy(new TimeoutCountSequenceSizeReleaseStrategy(50, 30000)))
.channel("sendingChannel")
.handle(processor, "sendMessage")
.get();
}
If I send a single batch of e.g. 100 messages to the inputChannel it works as expected until there are no messages in the inputChannel. After the inputChannel becomes empty it also stops processing for messages that were waiting for sequencing. As a result there are always a couple of messages left in the groupMessageStore even after the set release timeout.
I'm guessing it's because the poller is configured only for the inputChannel and if there are no messages in there it will never get to the sequencer (so will never call canRelease on the release strategy).
But if I try adding a separate poller for the resequencer I get the following error A poller should not be specified for endpoint since channel x is a SubscribableChannel (not pollable).
Is there a different way to configure it so that the last group of messages is always released?
The release strategy is passive and needs something to trigger it to be called.
Add .groupTimeout(...) to release the partial sequence after the specified time elapses.
EDIT
#SpringBootApplication
public class So67993972Application {
private static final Logger log = LoggerFactory.getLogger(So67993972Application.class);
public static void main(String[] args) {
SpringApplication.run(So67993972Application.class, args);
}
#Bean
IntegrationFlow flow(MessageGroupStore mgs) {
return IntegrationFlows.from(MessageChannels.direct("input"))
.resequence(e -> e.messageStore(mgs)
.groupTimeout(5_000)
.sendPartialResultOnExpiry(true)
.releaseStrategy(new TimeoutCountSequenceSizeReleaseStrategy(50, 2000)))
.channel(MessageChannels.queue("output"))
.get();
}
#Bean
MessageGroupStore mgs() {
return new SimpleMessageStore();
}
#Bean
public ApplicationRunner runner(MessageChannel input, QueueChannel output, MessageGroupStore mgs) {
return args -> {
MessagingTemplate template = new MessagingTemplate(input);
log.info("Sending");
template.send(MessageBuilder.withPayload("foo")
.setHeader(IntegrationMessageHeaderAccessor.CORRELATION_ID, "bar")
.setHeader(IntegrationMessageHeaderAccessor.SEQUENCE_NUMBER, 2)
.setHeader(IntegrationMessageHeaderAccessor.SEQUENCE_SIZE, 2)
.build());
log.info(output.receive(10_000).toString());
Thread.sleep(1000);
log.info(mgs.getMessagesForGroup("bar").toString());
};
}
}

Correct way to call commit offsets

I am using Spring integration framework, where input channel is kafka and output is jdbc.
I want to manually commit kafka offsets, only after jdbcMessageHandler successfully processed each kafka message.
#Bean
#ServiceActivator(inputChannel = "outChannel")
public MessageHandler jdbcMessageHandler() {
JdbcMessageHandler jdbcMessageHandler = new JdbcMessageHandler(getDataSource(), getSql());
jdbcMessageHandler.setPreparedStatementSetter((ps, message) -> {
Item item = ((Item) message.getPayload());
ps.setString(1, item.getName());
Acknowledgment ack = (Acknowledgment) message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT);
ack.acknowledge();
}
return jdbcMessageHandler;
}
#Bean
public ConsumerFactory<String, String> consumerFactory() {
...
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
...
return new DefaultKafkaConsumerFactory<>(props);
}
#Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
...
return factory;
}
I tried , as can be seen above:
Acknowledgment ack = (Acknowledgment) message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT);
ack.acknowledge();
But it yields unwanted effect:
each kafka message can have n - Items, therefore transformer returns a List of items , so ack.acknowledge() (which is commit) will be called for each item, n times!
And I want to call commit once only , after all items of the message are handled.
Update
After applying the recommended from answer.
I set on
ConcurrentKafkaListenerContainerFactory
.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(1000L, 99L)));
I also have
#ServiceActivator(inputChannel = "errorChannel")
public void onError(ErrorMessage message) {
}
What happens: in JdbcMessageHandler error occurs,
onError is triggered once. no retries, kafka offset committed.
I need to prevent committing offsets.
Update 2
The flow:
1)
kafkainput -> PublishSubscribeChannel , attached to KafkaMessageDrivenChannelAdapter -> KafkaListenerContainerFactory -> KafkaMessageListenerContainer
also attempted to set listener.setErrorHandler(...
2)
subscribers:
#Transformer(inputChannel = "kafkainput", outputChannel = "aggregator")
#ServiceActivator(inputChannel = "kafkainput")
aggregator -> PublishSubscribeChannel
subscriber:
`#ServiceActivator(inputChannel = "aggregator")`
public FactoryBean<MessageHandler> aggregatorFactoryBean(..
AggregatorFactoryBean aggregatorFactoryBean =
aggregatorFactoryBean.setOutputChannel(outputChannel);
outputChannel -> DirectChannel
subscriber:
#ServiceActivator(inputChannel = "outputChannel")
public MessageHandler jdbcMessageHandler() {
The error occurs in jdbc.
Then only onError(..) is triggered
Update 3
Did a lot of changes, got rid of aggregator and instead using:
= new KafkaMessageDrivenChannelAdapter<>(container, KafkaMessageDrivenChannelAdapter.ListenerMode.batch)
kafkaMessageDrivenChannelAdapter.setBatchMessageConverter(new BatchMessagingMessageConverter(converter()));
kafkaMessageDrivenChannelAdapter.setErrorChannelName("error");
kafkaMessageDrivenChannelAdapter.setOutputChannelName("splitter");
set in KafkaListenerContainerFactory
factory.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(500,10000)));
set in `KafkaMessageListenerContainer'
.setAckMode(ContainerProperties.AckMode.BATCH);
and I have splitter:
#Splitter(inputChannel = "splitter", outputChannel = "outputChannel")
In kafka I put corrupted message , so error occurs in splitter (I am throwing MessagingException, then `onError' is triggered once only, and kafka offsets are commited!
#ServiceActivator(inputChannel = "error")
public void onError(ErrorMessage message) {
}
why its not retrying number of times that it was configured and why does it commits offsets right away?
You need to consider to make yourself familiar with Publish-Subscribe pattern. For example a PublishSubscribeChannel can have a several subscribers to handle the same message. So, along side with your transformer to produce a list of items, you also can have a service activator which would call that ack.acknowledge() only once, when all the items in the batch are processed by JDBC channel adapter. But, of course, the input channel of your transformer must be that PublishSubscribeChannel. You also may consider to specify an order option explicitly for your subscribers to be sure that they are called in a proper order.
Another way is a RecipientListRouter.
See docs for more info:
https://docs.spring.io/spring-integration/reference/html/core.html#channel-implementations-publishsubscribechannel
https://docs.spring.io/spring-integration/reference/html/message-routing.html#router-implementations-recipientlistrouter
and of course an #Order annotation JavaDocs.

Thread safety in executor channel

I have a message producer which produces around 15 messages/second
The consumer is a spring integration project which consumes from the Message Queue and does a lot of processing. I have used the Executor channel to process messages in parallel and then the flow passes through some common handler class.
Please find below the snippet of code -
baseEventFlow() - We receive the message from the EMS queue and send it to a router
router() - Based on the id of the message" a particular ExecutorChannel instance is configured with a singled-threaded Executor. Every ExecutorChannel is going to be its dedicated executor with only single thread.
skwDefaultChannel(), gjsucaDefaultChannel(), rpaDefaultChannel() - All the ExecutorChannel beans are marked with the #BridgeTo for the same channel which starts that common flow.
uaEventFlow() - Here each message will get processed
#Bean
public IntegrationFlow baseEventFlow() {
return IntegrationFlows
.from(Jms.messageDrivenChannelAdapter(Jms.container(this.emsConnectionFactory, this.emsQueue).get()))
.wireTap(FLTAWARE_WIRE_TAP_CHNL)
.route(router()).get();
}
public AbstractMessageRouter router() {
return new AbstractMessageRouter() {
#Override
protected Collection<MessageChannel> determineTargetChannels(Message<?> message) {
if (message.getPayload().toString().contains("\"id\":\"RPA")) {
return Collections.singletonList(skwDefaultChannel());
}else if (message.getPayload().toString().contains("\"id\":\"ASH")) {
return Collections.singletonList(rpaDefaultChannel());
} else if (message.getPayload().toString().contains("\"id\":\"GJS")
|| message.getPayload().toString().contains("\"id\":\"UCA")) {
return Collections.singletonList(gjsucaDefaultChannel());
} else {
return Collections.singletonList(new NullChannel());
}
}
};
}
#Bean
#BridgeTo("uaDefaultChannel")
public MessageChannel skwDefaultChannel() {
return MessageChannels.executor(SKW_DEFAULT_CHANNEL_NAME, Executors.newFixedThreadPool(1)).get();
}
#Bean
#BridgeTo("uaDefaultChannel")
public MessageChannel gjsucaDefaultChannel() {
return MessageChannels.executor(GJS_UCA_DEFAULT_CHANNEL_NAME, Executors.newFixedThreadPool(1)).get();
}
#Bean
#BridgeTo("uaDefaultChannel")
public MessageChannel rpaDefaultChannel() {
return MessageChannels.executor(RPA_DEFAULT_CHANNEL_NAME, Executors.newFixedThreadPool(1)).get();
}
#Bean
public IntegrationFlow uaEventFlow() {
return IntegrationFlows.from("uaDefaultChannel")
.wireTap(UA_WIRE_TAP_CHNL)
.transform(eventHandler, "parseEvent")
.handle(uaImpl, "process").get();
}
My concern is in the uaEVentFlow() the common transform and handler method are not thread safe and it may cause issue. How can we ensure that we inject a new transformer and handler at every message invocation?
Should I change the scope of the transformer and handler bean as prototype?
Instead of bridging to a common flow, you should move the .transform() and .handle() to each of the upstream flows and add
#Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
to their #Bean definitions so each gets its own instance.
But, it's generally better to make your code thread-safe.

Resources