I am using the Spring Integration framework, where the input channel is Kafka and the output is JDBC.
I want to commit Kafka offsets manually, and only after jdbcMessageHandler has successfully processed each Kafka message.
@Bean
@ServiceActivator(inputChannel = "outChannel")
public MessageHandler jdbcMessageHandler() {
    JdbcMessageHandler jdbcMessageHandler = new JdbcMessageHandler(getDataSource(), getSql());
    jdbcMessageHandler.setPreparedStatementSetter((ps, message) -> {
        Item item = (Item) message.getPayload();
        ps.setString(1, item.getName());
        // Commits the Kafka offset; this setter runs once per Item written
        Acknowledgment ack = (Acknowledgment) message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT);
        ack.acknowledge();
    });
    return jdbcMessageHandler;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    ...
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    ...
    return new DefaultKafkaConsumerFactory<>(props);
}
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    ...
    return factory;
}
As can be seen above, I tried:
Acknowledgment ack = (Acknowledgment) message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT);
ack.acknowledge();
But it has an unwanted effect:
each Kafka message can contain n items, so the transformer returns a List of items, and ack.acknowledge() (which is the commit) is called once per item, i.e. n times!
I want to commit only once, after all the items of the message have been handled.
Update
After applying the approach recommended in the answer:
I set this on the ConcurrentKafkaListenerContainerFactory:
factory.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(1000L, 99L)));
I also have:
@ServiceActivator(inputChannel = "errorChannel")
public void onError(ErrorMessage message) {
}
What happens: an error occurs in JdbcMessageHandler, onError is triggered once, there are no retries, and the Kafka offset is committed.
I need to prevent the offset from being committed.
Update 2
The flow:
1)
kafkainput -> PublishSubscribeChannel, attached to KafkaMessageDrivenChannelAdapter -> KafkaListenerContainerFactory -> KafkaMessageListenerContainer
I also attempted to set listener.setErrorHandler(...
2)
subscribers:
@Transformer(inputChannel = "kafkainput", outputChannel = "aggregator")
@ServiceActivator(inputChannel = "kafkainput")
aggregator -> PublishSubscribeChannel
subscriber:
@ServiceActivator(inputChannel = "aggregator")
public FactoryBean<MessageHandler> aggregatorFactoryBean(..
AggregatorFactoryBean aggregatorFactoryBean =
aggregatorFactoryBean.setOutputChannel(outputChannel);
outputChannel -> DirectChannel
subscriber:
@ServiceActivator(inputChannel = "outputChannel")
public MessageHandler jdbcMessageHandler() {
The error occurs in the JDBC handler.
Then only onError(..) is triggered.
Update 3
I made a lot of changes, got rid of the aggregator, and am now using:
= new KafkaMessageDrivenChannelAdapter<>(container, KafkaMessageDrivenChannelAdapter.ListenerMode.batch)
kafkaMessageDrivenChannelAdapter.setBatchMessageConverter(new BatchMessagingMessageConverter(converter()));
kafkaMessageDrivenChannelAdapter.setErrorChannelName("error");
kafkaMessageDrivenChannelAdapter.setOutputChannelName("splitter");
In the KafkaListenerContainerFactory I set:
factory.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(500, 10000)));
In the KafkaMessageListenerContainer I set:
.setAckMode(ContainerProperties.AckMode.BATCH);
And I have a splitter:
@Splitter(inputChannel = "splitter", outputChannel = "outputChannel")
I put a corrupted message in Kafka, so an error occurs in the splitter (I am throwing a MessagingException). Then onError is triggered only once, and the Kafka offsets are committed!
@ServiceActivator(inputChannel = "error")
public void onError(ErrorMessage message) {
}
Why is it not retrying the configured number of times, and why does it commit the offsets right away?
Consider making yourself familiar with the Publish-Subscribe pattern. For example, a PublishSubscribeChannel can have several subscribers handling the same message. So, alongside your transformer that produces a list of items, you can also have a service activator that calls ack.acknowledge() only once, after all the items in the batch have been processed by the JDBC channel adapter. Of course, the input channel of your transformer must be that PublishSubscribeChannel. You may also consider specifying an order option explicitly for your subscribers, to be sure they are called in the proper order.
Another way is a RecipientListRouter.
See docs for more info:
https://docs.spring.io/spring-integration/reference/html/core.html#channel-implementations-publishsubscribechannel
https://docs.spring.io/spring-integration/reference/html/message-routing.html#router-implementations-recipientlistrouter
and, of course, the @Order annotation Javadocs.
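To make the ordering argument concrete, here is a plain-Java simulation. It uses no Spring classes; KafkaRecord, OrderedPubSubChannel and AckOnceDemo are hypothetical stand-ins. It models a PublishSubscribeChannel without an executor: subscribers run in order on the sender's thread, and an exception from one stops the later ones. The first subscriber plays the transformer plus the JDBC writes for every item, and the second plays the service activator that calls acknowledge() once per Kafka record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for one Kafka record whose payload holds several items.
final class KafkaRecord {
    final List<String> items;
    int ackCount = 0; // number of acknowledge() calls, i.e. offset commits

    KafkaRecord(List<String> items) {
        this.items = items;
    }

    void acknowledge() {
        ackCount++;
    }
}

// Simulates a PublishSubscribeChannel without an executor: subscribers are called
// one after another on the sender's thread, in subscription order, and an exception
// from one subscriber propagates to the sender so later subscribers never run.
final class OrderedPubSubChannel<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    void send(T message) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}

final class AckOnceDemo {
    // 'jdbcWrite' is a hypothetical per-item writer standing in for the JDBC adapter.
    static OrderedPubSubChannel<KafkaRecord> wire(Consumer<String> jdbcWrite) {
        OrderedPubSubChannel<KafkaRecord> channel = new OrderedPubSubChannel<>();
        // First subscriber: the "transformer" path, which writes every item of the record.
        channel.subscribe(rec -> rec.items.forEach(jdbcWrite));
        // Second subscriber: the service activator that commits once per record.
        channel.subscribe(KafkaRecord::acknowledge);
        return channel;
    }
}
```

In this sketch the acknowledgment is skipped whenever a write throws, because the exception propagates to the sender before the second subscriber is reached.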
This is a follow-up question to Spring Integration AWS RabbitMQ Kinesis.
I have the following configuration. I am noticing that when I send a message to the input channel named kinesisSendChannel for the first time, the aggregator and release strategy are invoked and messages are sent to Kinesis Streams. I put debug breakpoints at different places and could verify this behavior. But when I publish messages to the same input channel again, the release strategy and the outbound processor are not invoked and messages are not sent to Kinesis. I am not sure why the aggregator flow is invoked only the first time and not for subsequent messages. For testing purposes, the TimeoutCountSequenceSizeReleaseStrategy is set with a count of 1 and a time of 60 seconds. No specific MessageStore is used. Could you help identify the issue?
@Bean(name = "kinesisSendChannel")
public MessageChannel kinesisSendChannel() {
    return MessageChannels.direct().get();
}
@Bean(name = "resultChannel")
public MessageChannel resultChannel() {
    return MessageChannels.direct().get();
}
@Bean
@ServiceActivator(inputChannel = "kinesisSendChannel")
public MessageHandler aggregator(TestMessageProcessor messageProcessor,
        MessageChannel resultChannel,
        TimeoutCountSequenceSizeReleaseStrategy timeoutCountSequenceSizeReleaseStrategy) {
    AggregatingMessageHandler handler = new AggregatingMessageHandler(messageProcessor);
    handler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("headers['foo']"));
    handler.setReleaseStrategy(timeoutCountSequenceSizeReleaseStrategy);
    handler.setOutputProcessor(messageProcessor);
    handler.setOutputChannel(resultChannel);
    return handler;
}
@Bean
@ServiceActivator(inputChannel = "resultChannel")
public MessageHandler kinesisMessageHandler1(@Qualifier("successChannel") MessageChannel successChannel,
        @Qualifier("errorChannel") MessageChannel errorChannel, final AmazonKinesisAsync amazonKinesis) {
    KinesisMessageHandler kinesisMessageHandler = new KinesisMessageHandler(amazonKinesis);
    kinesisMessageHandler.setSync(true);
    kinesisMessageHandler.setOutputChannel(successChannel);
    kinesisMessageHandler.setFailureChannel(errorChannel);
    return kinesisMessageHandler;
}
public class TestMessageProcessor extends AbstractAggregatingMessageGroupProcessor {
    @Override
    protected Object aggregatePayloads(MessageGroup group, Map<String, Object> defaultHeaders) {
        final PutRecordsRequest putRecordsRequest = new PutRecordsRequest().withStreamName("test-stream");
        final List<PutRecordsRequestEntry> putRecordsRequestEntry = group.getMessages().stream()
                .map(message -> (PutRecordsRequestEntry) message.getPayload()).collect(Collectors.toList());
        putRecordsRequest.withRecords(putRecordsRequestEntry);
        // Return the assembled request (returning the entry list left putRecordsRequest unused)
        return putRecordsRequest;
    }
}
I believe the problem is here: handler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("headers['foo']"));. All your messages come with the same foo header, so all of them form the same message group. As long as you release the group and don't remove it, all new messages for it are discarded.
Please review the aggregator documentation to become familiar with all the possible behaviors: https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#aggregator
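The discard behavior can be reproduced with a small plain-Java model. This is a hypothetical simplification, not the Spring Integration API: groups are keyed by the correlation value, a group is released when it reaches the count, and a group that is released but kept as "complete" swallows later messages with the same key. Removing the group on release, which is what the aggregator's expireGroupsUponCompletion option is for, lets the next message start a fresh group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of an aggregator with a count-based release.
final class SimpleAggregator {
    private final int releaseCount;
    private final boolean expireGroupsUponCompletion;
    private final Map<String, List<String>> groups = new HashMap<>();
    private final Set<String> completedGroups = new HashSet<>();
    final List<List<String>> released = new ArrayList<>();
    final List<String> discarded = new ArrayList<>();

    SimpleAggregator(int releaseCount, boolean expireGroupsUponCompletion) {
        this.releaseCount = releaseCount;
        this.expireGroupsUponCompletion = expireGroupsUponCompletion;
    }

    void handle(String correlationKey, String payload) {
        // A completed group that is still in the store discards late arrivals.
        if (completedGroups.contains(correlationKey)) {
            discarded.add(payload);
            return;
        }
        List<String> group = groups.computeIfAbsent(correlationKey, k -> new ArrayList<>());
        group.add(payload);
        if (group.size() >= releaseCount) {
            released.add(new ArrayList<>(group));
            groups.remove(correlationKey);
            if (!expireGroupsUponCompletion) {
                // The group stays marked as complete, so the same key cannot form a new group.
                completedGroups.add(correlationKey);
            }
        }
    }
}
```

With a release count of 1 and every message carrying the same foo value, only the first message is released unless completed groups are removed.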
My message handler for publishing messages to the Kinesis stream is as follows:
public MessageHandler kinesisMessageHandler(final AmazonKinesisAsync amazonKinesis,
        @Qualifier("successChannel") MessageChannel successChannel,
        @Qualifier("errorChannel") MessageChannel errorChannel) {
    KinesisMessageHandler kinesisMessageHandler = new KinesisMessageHandler(amazonKinesis);
    kinesisMessageHandler.setSync(false);
    kinesisMessageHandler.setOutputChannel(successChannel);
    kinesisMessageHandler.setFailureChannel(errorChannel);
    return kinesisMessageHandler;
}
@Bean(name = "errorChannel")
public MessageChannel errorChannel() {
    return MessageChannels.direct().get();
}
@Bean(name = "successChannel")
public MessageChannel successChannel() {
    return MessageChannels.direct().get();
}
The setSync flag is set to false so that the messages are processed asynchronously. Also, I have created separate IntegrationFlows to receive and process the Kinesis responses from the success and error channels.
public IntegrationFlow successMessageIntegrationFlow(MessageChannel successChannel,
        MessageChannel inboundKinesisMessageChannel,
        MessageReceiverServiceActivator kinesisMessageReceiverServiceActivator) {
    return IntegrationFlows.from(successChannel).channel(inboundKinesisMessageChannel)
            .handle(kinesisMessageReceiverServiceActivator, "receiveMessage").get();
}
@Bean
public IntegrationFlow errorMessageIntegrationFlow(MessageChannel errorChannel,
        MessageChannel inboundKinesisErrorChannel,
        MessageReceiverServiceActivator kinesisErrorReceiverServiceActivator) {
    return IntegrationFlows.from(errorChannel).channel(inboundKinesisErrorChannel)
            .handle(kinesisErrorReceiverServiceActivator, "receiveMessage").get();
}
I wanted to know whether you see any issues with using a DirectChannel to receive the success and error responses from Kinesis and processing them with an IntegrationFlow. As far as I know, with a DirectChannel the producer blocks during send until the consumer finishes its work and returns control to the producer's caller. Is it correct to assume that here the producer is executed in a different thread pool by the AmazonKinesisAsyncClient, and that the producer will not wait for the IntegrationFlow to process the messages? Let me know if I need to implement it differently.
Your assumption about blocking is correct: the control does not come back to the producing thread until the subscribed flow has finished. So, if you have a limited number of threads in that Kinesis client, you need to be sure that you free them as soon as possible. You might consider sending those callbacks to a QueueChannel instead. They are asynchronous anyway, but then they won't hold the Kinesis client threads.
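The thread-holding difference can be illustrated with plain java.util.concurrent. This is a simulation, not the Kinesis client or Spring's channels, and HandoffDemo is a hypothetical name: a direct hand-off runs the subscriber on the sending (callback) thread itself, so that thread is busy for the subscriber's whole duration, while a queue hand-off only enqueues the message and a separate consumer thread does the work, which is the role a QueueChannel plays.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

final class HandoffDemo {
    // Direct hand-off: the subscriber runs on the sending ("callback") thread itself,
    // so that thread is occupied until the subscriber returns.
    // Returns the name of the thread that ran the subscriber.
    static String directHandoff(String message) {
        AtomicReference<String> handledOn = new AtomicReference<>();
        Consumer<String> subscriber = m -> handledOn.set(Thread.currentThread().getName());
        subscriber.accept(message);
        return handledOn.get();
    }

    // Queue hand-off: the sender only enqueues; a separate consumer thread runs the subscriber.
    // Returns the name of the thread that ran the subscriber.
    static String queueHandoff(String message) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicReference<String> handledOn = new AtomicReference<>();
        Thread consumer = new Thread(() -> {
            try {
                queue.take(); // wait for the message, then "process" it on this thread
                handledOn.set(Thread.currentThread().getName());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "queue-consumer");
        consumer.start();
        try {
            queue.put(message); // returns as soon as the message is enqueued
            consumer.join();    // only so the demo can report which thread did the work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handledOn.get();
    }
}
```

The join() exists only to report the result; in a real queue-based design the sending thread would return right after the put.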
You still have a flaw in your flows: .channel(inboundKinesisMessageChannel). That puts the same channel in the middle of two different flows. And if it is a direct one, then you end up with round-robin distribution. I would just remove it altogether.
I have the following configuration:
@Bean
public IntegrationFlow messageFlow(JdbcMessageStore groupMessageStore, TransactionSynchronizationFactory syncFactory, TaskExecutor te, ThreadPoolTaskScheduler ts, RealTimeProcessor processor) {
return IntegrationFlows
.from("inputChannel")
.handle(processor, "handleInputMessage", consumer -> consumer
.taskScheduler(ts)
.poller(poller -> poller
.fixedDelay(pollerFixedDelay)
.receiveTimeout(pollerReceiveTimeout)
.maxMessagesPerPoll(pollerMaxMessagesPerPoll)
.taskExecutor(te)
.transactional()
.transactionSynchronizationFactory(syncFactory)))
.resequence(s -> s.messageStore(groupMessageStore)
.releaseStrategy(new TimeoutCountSequenceSizeReleaseStrategy(50, 30000)))
.channel("sendingChannel")
.handle(processor, "sendMessage")
.get();
}
If I send a single batch of, e.g., 100 messages to the inputChannel, it works as expected until there are no messages left in the inputChannel. After the inputChannel becomes empty, it also stops processing the messages that were waiting to be resequenced. As a result, there are always a couple of messages left in the groupMessageStore, even after the configured release timeout.
I'm guessing it's because the poller is configured only for the inputChannel, and if there are no messages in there, it never gets to the resequencer (so it never calls canRelease on the release strategy).
But if I try adding a separate poller for the resequencer, I get the following error: "A poller should not be specified for endpoint since channel x is a SubscribableChannel (not pollable)."
Is there a different way to configure it so that the last group of messages is always released?
The release strategy is passive and needs something to trigger it to be called.
Add .groupTimeout(...) to release the partial sequence after the specified time elapses.
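Why the tail of a batch gets stuck can be shown with a logical-clock model. It is a hypothetical simplification, not Spring's resequencer. The release check runs only when a message arrives, whereas a group timeout behaves like a scheduled check that fires once the time since the last arrival has elapsed, whether or not more messages come. On timeout the model releases the whole partial group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one message group: a count-based release check that is
// evaluated only on arrival, plus an optional group timeout driven by tick().
final class PassiveReleaseModel {
    private final int releaseCount;
    private final long groupTimeoutMillis; // <= 0 means "no group timeout configured"
    private final List<String> pending = new ArrayList<>();
    private long lastArrivalMillis;
    final List<String> released = new ArrayList<>();

    PassiveReleaseModel(int releaseCount, long groupTimeoutMillis) {
        this.releaseCount = releaseCount;
        this.groupTimeoutMillis = groupTimeoutMillis;
    }

    // The release strategy is consulted here, and only here.
    void arrive(String payload, long nowMillis) {
        pending.add(payload);
        lastArrivalMillis = nowMillis;
        if (pending.size() >= releaseCount) {
            releasePending();
        }
    }

    // Stands in for the scheduled task that a group timeout registers.
    void tick(long nowMillis) {
        if (groupTimeoutMillis > 0 && !pending.isEmpty()
                && nowMillis - lastArrivalMillis >= groupTimeoutMillis) {
            releasePending(); // partial group released on expiry
        }
    }

    int stuck() {
        return pending.size();
    }

    private void releasePending() {
        released.addAll(pending);
        pending.clear();
    }
}
```

With a release count of 50 and 53 messages, three messages stay pending forever without a group timeout, but are released once the timeout elapses after the last arrival.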
EDIT
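import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.IntegrationMessageHeaderAccessor;
import org.springframework.integration.aggregator.TimeoutCountSequenceSizeReleaseStrategy;
import org.springframework.integration.channel.QueueChannel;
import org.springframework.integration.core.MessagingTemplate;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.MessageChannels;
import org.springframework.integration.store.MessageGroupStore;
import org.springframework.integration.store.SimpleMessageStore;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.messaging.MessageChannel;

@SpringBootApplication
public class So67993972Application {

    private static final Logger log = LoggerFactory.getLogger(So67993972Application.class);

    public static void main(String[] args) {
        SpringApplication.run(So67993972Application.class, args);
    }

    @Bean
    IntegrationFlow flow(MessageGroupStore mgs) {
        return IntegrationFlows.from(MessageChannels.direct("input"))
                .resequence(e -> e.messageStore(mgs)
                        // Schedule a forced completion of the group after 5 seconds
                        .groupTimeout(5_000)
                        // ...and send the partial sequence on instead of discarding it
                        .sendPartialResultOnExpiry(true)
                        .releaseStrategy(new TimeoutCountSequenceSizeReleaseStrategy(50, 2000)))
                .channel(MessageChannels.queue("output"))
                .get();
    }

    @Bean
    MessageGroupStore mgs() {
        return new SimpleMessageStore();
    }

    @Bean
    public ApplicationRunner runner(MessageChannel input, QueueChannel output, MessageGroupStore mgs) {
        return args -> {
            MessagingTemplate template = new MessagingTemplate(input);
            log.info("Sending");
            // Only message 2 of a 2-message sequence is sent, so the group never completes on its own
            template.send(MessageBuilder.withPayload("foo")
                    .setHeader(IntegrationMessageHeaderAccessor.CORRELATION_ID, "bar")
                    .setHeader(IntegrationMessageHeaderAccessor.SEQUENCE_NUMBER, 2)
                    .setHeader(IntegrationMessageHeaderAccessor.SEQUENCE_SIZE, 2)
                    .build());
            // Waits up to 10 seconds for the partial sequence released by the group timeout
            log.info(output.receive(10_000).toString());
            Thread.sleep(1000);
            // Shows what is left in the store for the group
            log.info(mgs.getMessagesForGroup("bar").toString());
        };
    }
}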
I am using the Spring Integration framework with a Transformer:
inputChannel -> Kafka consumer
outputChannel -> database JDBC writer
@Bean
public DirectChannel inboundChannel() {
    return new DirectChannel();
}
@Bean
public DirectChannel outboundChannel() {
    return new DirectChannel();
}
@Bean
@Transformer(inputChannel="inboundChannel", outputChannel="outboundChannel")
public JsonToObjectTransformer jsonToObjectTransformer() {
    return new JsonToObjectTransformer(Item.class);
}
@Bean
@ServiceActivator(inputChannel = "outboundChannel")
public MessageHandler jdbcmessageHandler() {
    JdbcMessageHandler jdbcMessageHandler = new ...
    return ...;
}
@Bean
@ServiceActivator(inputChannel = "inboundChannel")
public MessageHandler kafkahandler() {
    return new ...;
}
In both handlers I override
public void handleMessage(Message<?> message)
The problem: if there are N messages in Kafka in total,
then each handleMessage() is invoked exactly N/2 times!
I assumed that each handler would be invoked N times, because each handler is linked to a different channel and there are N messages in total.
What am I missing?
(If I disable the Kafka handler, the second handler gets all N messages.)
Update:
I need both subscribers to get all the messages from the same channel (the Kafka handler will do something with the raw data, and the JDBC handler will push the transformed data).
First of all, your inboundChannel and outboundChannel are not used: you don't reference their names anywhere (at least in the question).
Names like input and output are picked up by the framework and used to create new MessageChannel beans, which are then used in the other places.
Now see what you have:
@Transformer(inputChannel="input"
@ServiceActivator(inputChannel = "input")
Both of them subscribe to the same input channel, which the framework creates automatically as a DirectChannel. This channel is based on a round-robin LoadBalancingStrategy, so you see N/2 in your Kafka handler: its service activator deals only with every second message sent to that input channel.
See the docs for more info: https://docs.spring.io/spring-integration/reference/html/core.html#channel-configuration-directchannel
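The two dispatch styles can be contrasted in plain Java. DispatchDemo is a hypothetical class, not Spring's dispatcher code, and DirectChannel's failover to the next subscriber on error is not modelled. Round-robin unicast is how a DirectChannel with the default load balancer spreads messages over its subscribers; broadcast is what a PublishSubscribeChannel does, and is what the update asks for.

```java
import java.util.List;
import java.util.function.Consumer;

final class DispatchDemo {
    // Round-robin unicast: each message goes to exactly one subscriber, in rotation.
    static void roundRobin(List<Consumer<String>> subscribers, List<String> messages) {
        int next = 0;
        for (String message : messages) {
            subscribers.get(next).accept(message);
            next = (next + 1) % subscribers.size();
        }
    }

    // Broadcast: every subscriber receives every message.
    static void broadcast(List<Consumer<String>> subscribers, List<String> messages) {
        for (String message : messages) {
            for (Consumer<String> subscriber : subscribers) {
                subscriber.accept(message);
            }
        }
    }
}
```

With 10 messages and two subscribers, round-robin gives each subscriber 5 (the N/2 from the question), while broadcast gives each subscriber all 10.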
I have the following:
[inbound channel adapter] -> ... -> foo -> [outbound channel adapter] -> bar
How can I write my spring-integration app so that foo can carry an extra object that is not part of the message the [outbound channel adapter] consumes, such that bar receives it?
My app basically receives messages from AWS SQS (using spring-integration-aws), does some filtering / transformations, then publishes a message to Apache Kafka (using spring-integration-kafka), and if and only if that succeeds, deletes the original message off the SQS queue.
For that reason, when I receive the SQS message, I want to hold onto the receipt handle / acknowledgement object, transform the rest of the message into the Kafka message to be published, and then if that succeeds, make use of that receipt handle / acknowledgement object to dequeue the original message.
So, say I'm using this example code from the spring-integration-kafka docs:
@Bean
@ServiceActivator(inputChannel = "toKafka", outputChannel = "result")
public MessageHandler handler() throws Exception {
    KafkaProducerMessageHandler<String, String> handler =
            new KafkaProducerMessageHandler<>(kafkaTemplate());
    handler.setTopicExpression(new LiteralExpression("someTopic"));
    handler.setMessageKeyExpression(new LiteralExpression("someKey"));
    handler.setFailureChannel(failures());
    return handler;
}
@Bean
public KafkaTemplate<String, String> kafkaTemplate() {
    return new KafkaTemplate<>(producerFactory());
}
@Bean
public ProducerFactory<String, String> producerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, this.brokerAddress);
    // set more properties
    return new DefaultKafkaProducerFactory<>(props);
}
With the above, if I have a message message and some extra, unrelated info extra, what do I send to the toKafka channel such that handler will consume message, and if that was successful, the result channel will receive extra?
Outbound channel adapters don't produce output - they are one-way only and end the flow.
You can make toKafka a PublishSubscribeChannel and add a second service activator; by default, the second will only be called if the first is successful.
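That guarantee can be simulated in plain Java. Msg and PubSubAckDemo are hypothetical names, not the spring-integration-kafka or spring-integration-aws API, and the "receiptHandle" header name is an assumption for illustration. With no executor, the channel calls its subscribers one after another on the sending thread; because failures are not ignored, an exception from the first stops the second. The first subscriber stands in for the Kafka publish and the second for deleting the SQS message. Since both receive the same message, the extra object travelling in its headers is still available to the second one.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical message: a payload for Kafka plus headers carrying the extra object.
final class Msg {
    final String payload;
    final Map<String, ?> headers;

    Msg(String payload, Map<String, ?> headers) {
        this.payload = payload;
        this.headers = headers;
    }
}

final class PubSubAckDemo {
    // Sequential broadcast with failures not ignored: subscribers run in order on the
    // caller's thread, and an exception stops the later subscribers.
    static void send(Msg msg, List<Consumer<Msg>> subscribers) {
        for (Consumer<Msg> subscriber : subscribers) {
            subscriber.accept(msg);
        }
    }

    // Wires "publish to Kafka" first and "delete from SQS" second.
    static void process(Msg msg, Consumer<String> kafkaPublish, List<Object> deletedHandles) {
        Consumer<Msg> publisher = m -> kafkaPublish.accept(m.payload);
        // The same message reaches this subscriber, so the receipt handle is still in its headers.
        Consumer<Msg> deleter = m -> deletedHandles.add(m.headers.get("receiptHandle"));
        send(msg, List.of(publisher, deleter));
    }
}
```

If the publish throws, the exception reaches the sender and the delete step never runs, so the SQS message is left on the queue.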