I have multiple polling flows using the same flow logic, but varying the channel (a database column, id_channel, separates rows by dependency).
On the JdbcPollingChannelAdapter, setMaxRows is fixed to 1. To my understanding, each round trip to the database will fetch one row.
If I have 5 polling flows and 10 threads, how will each polling flow "compete" with the others? Does setting Pollers.maxMessagesPerPoll make any difference to the concurrency, given that JdbcPollingChannelAdapter.setMaxRows is always 1?
My application.properties (custom datasource) has:
spring.task.scheduling.pool.size=10
spring.pgsql.hikari.maximum-pool-size=10
Flow logic:
private MessageSource<Object> buildJdbcMessageSource(final int channel) {
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(dataSource, FETCH_QUERY);
    adapter.setMaxRows(1);           // each SELECT returns at most one row
    adapter.setUpdatePerRow(true);   // run the update statement once per fetched row
    adapter.setSelectSqlParameterSource(new MapSqlParameterSource(Map.of("idCanal", channel)));
    adapter.setRowMapper((RowMapper<IntControle>) (rs, i)
            -> new IntControle(rs.getLong(1), rs.getInt(2), rs.getString(3)));
    adapter.setUpdateSql(UPDATE_QUERY);
    return adapter;
}
private IntegrationFlow buildIntegrationFlow(final int channel, final long rate, final int maxMessages) {
    return IntegrationFlows.from(buildJdbcMessageSource(channel),
            c -> c.poller(Pollers.fixedDelay(rate)
                    .transactional(transactionInterceptor())
                    .maxMessagesPerPoll(maxMessages)))
            .split()
            .enrichHeaders(h -> h.header(MessageHeaders.ERROR_CHANNEL, ERROR_CHANNEL))
            .channel(SybaseFlowConfiguration.SYBASE_SINK)
            .get();
}
public IntegrationFlow pollingFlowChannel1() {
    return buildIntegrationFlow(1, properties.getChan1RateMs(), properties.getChan1MaxMessages());
}

public IntegrationFlow pollingFlowChannel2() {
    return buildIntegrationFlow(2, properties.getChan2RateMs(), properties.getChan2MaxMessages());
}
...
There is some explanation in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/jdbc.html#jdbc-max-rows-versus-max-messages-per-poll.
Please let us know if that does not meet your expectations.
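To make the distinction concrete, here is a sketch with illustrative numbers (my reading of that docs section, not part of the original answer): maxRows limits how many rows a single SELECT returns, while maxMessagesPerPoll tells the poller how many consecutive receive() calls to make in one polling cycle. Each polling cycle runs as one task on the shared task scheduler, so with the properties above the 5 flows compete for the 10 scheduler threads and the 10 pooled connections, one thread per cycle.

// Illustrative only (numbers are examples, not recommendations): with
// maxRows(1) on the adapter, one polling cycle performs up to 10 receive()
// calls, i.e. up to 10 single-row SELECTs, sequentially on one scheduler
// thread; the cycle ends early as soon as a receive() finds no row.
@Bean
public IntegrationFlow illustrativeFlow() {
    return IntegrationFlows.from(buildJdbcMessageSource(1),
            c -> c.poller(Pollers.fixedDelay(1000)   // next cycle 1s after the previous one finishes
                    .transactional(transactionInterceptor())
                    .maxMessagesPerPoll(10)))        // receive() calls per cycle
            .channel(SybaseFlowConfiguration.SYBASE_SINK)
            .get();
}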
I am trying to GroupBy a list of GeoJSON Features based on a shared ID, in order to aggregate a single field of these Features, using split/aggregate, like so:
@Bean
IntegrationFlow myFlow() {
    return IntegrationFlows.from(MY_DIRECT_CHANNEL)
            .handle(Http.outboundGateway(myRestUrl)
                    .httpMethod(HttpMethod.GET)
                    .expectedResponseType(FeatureCollection.class)
                    .mappedResponseHeaders(""))
            .split(FeatureCollection.class, FeatureCollection::getFeatures)
            .aggregate(aggregator -> aggregator
                    .outputProcessor(agg -> {
                        final List<String> collected = agg
                                .getMessages()
                                .stream()
                                .map(m -> ((Number) ((Feature) m.getPayload()).getProperties().get("my_field")).intValue() + "")
                                .collect(Collectors.toList());
                        return MyPojo.builder()
                                .myId(((Number) agg.getGroupId()).longValue())
                                .myListString(String.join(",", collected))
                                .build();
                    })
                    .correlationStrategy(m -> ((Feature) m.getPayload()).getProperties().get("shared_id"))
                    // .sendPartialResultOnExpiry(true)
                    // .groupTimeout(10000) // there's got to be a better way ...
                    // .expireGroupsUponTimeout(false)
            )
            .handle(Jpa.updatingGateway(myEntityManagerFactory).namedQuery(MyPojo.QUERY_UPDATE),
                    spec -> spec.transactional(myTransactionManager))
            .nullChannel();
}
Unless I un-comment those 3 lines, the aggregator never releases the groups and the database never receives any updates. If I set groupTimeout to less than 5 seconds, I miss partial results.
I expected the releaseStrategy to be SimpleSequenceSizeReleaseStrategy by default, which I expected would automatically release all the groups after all of the (split) Features had been processed (there are only 129 Features in total from the REST service message). Manually setting this as the releaseStrategy doesn't help.
What is the proper way to release the groups once all 129 messages have been processed?
I got it to work using a transformer instead of split/aggregate. (The root issue: the custom correlationStrategy regroups the 129 split messages by shared_id, so no group ever reaches the sequenceSize of 129 set by the splitter, and a sequence-size release strategy can never fire.)
@Bean
IntegrationFlow myFlow(MyTransformer myTransformer) {
    return IntegrationFlows.from(MY_DIRECT_CHANNEL)
            .handle(Http.outboundGateway(myRestUrl)
                    .httpMethod(HttpMethod.GET)
                    .expectedResponseType(FeatureCollection.class)
                    .mappedResponseHeaders(""))
            .transform(myTransformer)
            .split()
            .handle(Jpa.updatingGateway(myEntityManagerFactory).namedQuery(MyEntity.QUERY_UPDATE),
                    spec -> spec.transactional(myTransactionManager))
            .nullChannel();
}
And the signature of the transformer is:
@Component
public class MyTransformer implements GenericTransformer<FeatureCollection, List<MyEntity>> {

    @Override
    public List<MyEntity> transform(FeatureCollection featureCollection) {
        ...
    }
}
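For completeness, a possible body for that transformer (my own sketch, not from the original post; the MyEntity constructor is assumed, and the property names my_field/shared_id are carried over from the first snippet):

public List<MyEntity> transform(FeatureCollection featureCollection) {
    // Group the "my_field" values by "shared_id", mirroring what the
    // split/aggregate version attempted.
    Map<Long, List<String>> valuesById = featureCollection.getFeatures().stream()
            .collect(Collectors.groupingBy(
                    f -> ((Number) f.getProperties().get("shared_id")).longValue(),
                    Collectors.mapping(
                            f -> String.valueOf(((Number) f.getProperties().get("my_field")).intValue()),
                            Collectors.toList())));
    return valuesById.entrySet().stream()
            .map(e -> new MyEntity(e.getKey(), String.join(",", e.getValue()))) // assumed constructor
            .collect(Collectors.toList());
}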
I have the following configuration:
@Bean
public IntegrationFlow messageFlow(JdbcMessageStore groupMessageStore,
        TransactionSynchronizationFactory syncFactory, TaskExecutor te,
        ThreadPoolTaskScheduler ts, RealTimeProcessor processor) {
    return IntegrationFlows
            .from("inputChannel")
            .handle(processor, "handleInputMessage", consumer -> consumer
                    .taskScheduler(ts)
                    .poller(poller -> poller
                            .fixedDelay(pollerFixedDelay)
                            .receiveTimeout(pollerReceiveTimeout)
                            .maxMessagesPerPoll(pollerMaxMessagesPerPoll)
                            .taskExecutor(te)
                            .transactional()
                            .transactionSynchronizationFactory(syncFactory)))
            .resequence(s -> s.messageStore(groupMessageStore)
                    .releaseStrategy(new TimeoutCountSequenceSizeReleaseStrategy(50, 30000)))
            .channel("sendingChannel")
            .handle(processor, "sendMessage")
            .get();
}
If I send a single batch of e.g. 100 messages to the inputChannel, it works as expected until there are no messages left in the inputChannel. After the inputChannel becomes empty, it also stops processing the messages that were waiting for resequencing. As a result, there are always a couple of messages left in the groupMessageStore, even after the configured release timeout.
I'm guessing that's because the poller is configured only for the inputChannel, and if there are no messages there it never gets to the resequencer (so canRelease is never called on the release strategy).
But if I try adding a separate poller for the resequencer, I get the following error: "A poller should not be specified for endpoint since channel x is a SubscribableChannel (not pollable)."
Is there a different way to configure this so that the last group of messages is always released?
The release strategy is passive; it needs something to trigger it to be called.
Add .groupTimeout(...) to release the partial sequence after the specified time elapses.
EDIT
@SpringBootApplication
public class So67993972Application {

    private static final Logger log = LoggerFactory.getLogger(So67993972Application.class);

    public static void main(String[] args) {
        SpringApplication.run(So67993972Application.class, args);
    }

    @Bean
    IntegrationFlow flow(MessageGroupStore mgs) {
        return IntegrationFlows.from(MessageChannels.direct("input"))
                .resequence(e -> e.messageStore(mgs)
                        .groupTimeout(5_000)
                        .sendPartialResultOnExpiry(true)
                        .releaseStrategy(new TimeoutCountSequenceSizeReleaseStrategy(50, 2000)))
                .channel(MessageChannels.queue("output"))
                .get();
    }

    @Bean
    MessageGroupStore mgs() {
        return new SimpleMessageStore();
    }

    @Bean
    public ApplicationRunner runner(MessageChannel input, QueueChannel output, MessageGroupStore mgs) {
        return args -> {
            MessagingTemplate template = new MessagingTemplate(input);
            log.info("Sending");
            template.send(MessageBuilder.withPayload("foo")
                    .setHeader(IntegrationMessageHeaderAccessor.CORRELATION_ID, "bar")
                    .setHeader(IntegrationMessageHeaderAccessor.SEQUENCE_NUMBER, 2)
                    .setHeader(IntegrationMessageHeaderAccessor.SEQUENCE_SIZE, 2)
                    .build());
            log.info(output.receive(10_000).toString());
            Thread.sleep(1000);
            log.info(mgs.getMessagesForGroup("bar").toString());
        };
    }
}
I am using the Spring Integration framework, with a Transformer:
inputChannel -> kafka consumer
outputChannel -> database jdbc writer
@Bean
public DirectChannel inboundChannel() {
    return new DirectChannel();
}

@Bean
public DirectChannel outboundChannel() {
    return new DirectChannel();
}

@Bean
@Transformer(inputChannel = "inboundChannel", outputChannel = "outboundChannel")
public JsonToObjectTransformer jsonToObjectTransformer() {
    return new JsonToObjectTransformer(Item.class);
}

@Bean
@ServiceActivator(inputChannel = "outboundChannel")
public MessageHandler jdbcmessageHandler() {
    JdbcMessageHandler jdbcMessageHandler = new ...
    return ...;
}

@Bean
@ServiceActivator(inputChannel = "inboundChannel")
public MessageHandler kafkahandler() {
    return new ...;
}
In both handlers I override

public void handleMessage(Message<?> message)

The problem: if there are N messages in Kafka in total, then each handleMessage() is invoked exactly N/2 times!
I assumed that each handler would be invoked N times, because each handler is linked to a different channel and there are N messages in total.
What am I missing?
(If I disable the Kafka handler, the second handler gets all N messages.)
Update:
I need each subscriber to get all the messages from the same channel (the Kafka handler will do something with the raw data, and the JDBC handler will push the transformed data).
First of all, your inboundChannel and outboundChannel beans are out of use: nowhere (at least in the question) do you specify their names.
Names like input and output are taken by the framework and used to create new MessageChannel beans, which are used in the other places.
Now see what you have:

@Transformer(inputChannel = "input")
@ServiceActivator(inputChannel = "input")

Both of them are subscribers to the same input channel, which is created automatically by the framework as a DirectChannel. This channel dispatches with a round-robin LoadBalancingStrategy, therefore you see N/2 in your Kafka handler: its service activator deals only with every second message sent to that input channel.
Please, see more info in docs: https://docs.spring.io/spring-integration/reference/html/core.html#channel-configuration-directchannel
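To get the behavior from the update, where every subscriber receives every message, the shared channel can be declared as a PublishSubscribeChannel instead of a DirectChannel (a minimal sketch along the lines of the docs linked above; the bean name mirrors the question):

@Bean
public MessageChannel inboundChannel() {
    // Publish-subscribe dispatch: every subscriber gets every message,
    // instead of the round-robin load balancing of a DirectChannel.
    return new PublishSubscribeChannel();
}

With this channel, both the @Transformer and the Kafka @ServiceActivator subscribed to inboundChannel receive all N messages.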
I have two IntegrationFlows.
Both receive messages from Apache Kafka.
In the first IntegrationFlow's input channel, Consumer1 (concurrency=4) reads topic_1.
In the second IntegrationFlow's input channel, Consumer2 (concurrency=4) reads topic_2.
But both IntegrationFlows send messages to an output channel where one common class, MyMessageHandler, is specified, like this:
@Bean
public IntegrationFlow sendFromQueueFlow1(MyMessageHandler message) {
    return IntegrationFlows
            .from(Kafka
                    .messageDrivenChannelAdapter(consumerFactory1, "topic_1")
                    .configureListenerContainer(configureListenerContainer_priority1))
            .handle(message)
            .get();
}

@Bean
public IntegrationFlow sendFromQueueFlow2(MyMessageHandler message) {
    return IntegrationFlows
            .from(Kafka
                    .messageDrivenChannelAdapter(consumerFactory2, "topic_2")
                    .configureListenerContainer(configureListenerContainer_priority2))
            .handle(message)
            .get();
}
The MyMessageHandler class has a send(message) method that passes messages on to another service:

class MyMessageHandler {

    protected void handleMessageInternal(Message<?> message) {
        String postResponse = myService.send(message); // remote service call
        msgsStatisticsService.sendMessage(message, postResponse);
        // *******
    }
}

Inside each IntegrationFlow, 4 consumer threads are working (a total of 8 threads), and they all go through the single MyMessageHandler class, into the single send() method.
What problems could there be? Do the two IntegrationFlows see each other when they pass a message to one common class? Do I need to provide thread safety in the MyMessageHandler class? Do I need to mark the send() method synchronized?
But what if we make a third IntegrationFlow, so that only one IntegrationFlow passes messages through itself to the MyMessageHandler class? Would it be thread safe then? Example:
@Bean
public IntegrationFlow sendFromQueueFlow1() {
    return IntegrationFlows
            .from(Kafka
                    .messageDrivenChannelAdapter(consumerFactory1, "topic_1")
                    .configureListenerContainer(configureListenerContainer_priority1))
            .channel(someChannel())
            .get();
}

@Bean
public IntegrationFlow sendFromQueueFlow2() {
    return IntegrationFlows
            .from(Kafka
                    .messageDrivenChannelAdapter(consumerFactory2, "topic_2")
                    .configureListenerContainer(configureListenerContainer_priority2))
            .channel(someChannel())
            .get();
}

@Bean
public MessageChannel someChannel() {
    return new DirectChannel();
}

@Bean
public IntegrationFlow sendALLFromQueueFlow(MyMessageHandler message) {
    return IntegrationFlows
            .from(someChannel())
            .handle(message)
            .get();
}
You need to make your handler code thread-safe.
Using synchronized on the whole method would effectively disable the concurrency.
It's better to use thread-safe techniques: no mutable fields, or limited synchronized blocks just around the critical code.
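A minimal sketch of what that can look like (my own illustration; the counter is a hypothetical piece of shared state, and extending AbstractMessageHandler is an assumption based on the handleMessageInternal signature in the question):

class MyMessageHandler extends AbstractMessageHandler {

    private final MyService myService;                      // stateless collaborator
    private final AtomicLong sentCount = new AtomicLong();  // hypothetical shared state

    MyMessageHandler(MyService myService) {
        this.myService = myService;
    }

    @Override
    protected void handleMessageInternal(Message<?> message) {
        // No mutable instance fields are written here, so all 8 consumer
        // threads can run this concurrently; the shared counter uses an
        // atomic instead of synchronizing the whole method.
        String postResponse = myService.send(message);
        sentCount.incrementAndGet();
    }
}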
I'm building a microservice for multiple properties, so each property has a different configuration. To do that, I've implemented something like this:
@Autowired
IntegrationFlowContext flowContext;

@Bean
public void setFlowContext() {
    List<Login> loginList = DAO.getLoginList(); // a web service
    loginList.forEach(e -> {
        IntegrationFlow flow = IntegrationFlows
                .from(() -> e, c -> c.poller(Pollers.fixedRate(e.getPeriod(), TimeUnit.SECONDS, 5)))
                .channel("X_CHANNEL")
                .get();
        flowContext.registration(flow).register();
    });
}
With this implementation, I get the loginList before the application has started, so once the application is running I can no longer fetch the loginList from the web service, since there is no poller configured for that. The problem is that the loginList can change: new login credentials can be added or deleted. Therefore, I want something that runs every X time period to fetch the loginList from the web service and then register the flows created for each login. To achieve this, I've implemented something like this:
@Bean
public IntegrationFlow setFlowContext() {
    return IntegrationFlows
            .from(this::getSpecification, p -> p.poller(Pollers.fixedRate(X))) // the specification is constant
            .transform(payload -> DAO.getLoginList(payload))
            .split()
            .<Login>handle((payload, header) -> {
                IntegrationFlow flow = IntegrationFlows
                        .from(() -> payload, c -> c.poller(Pollers.fixedRate(payload.getPeriod(), TimeUnit.SECONDS, 5)))
                        .channel("X_CHANNEL")
                        .get();
                flowContext.registration(flow).register().start();
                return null;
            })
            .get();
}
Basically, I've used the start() method, but this is not working as expected:

flowContext.registration(flow).register().start();

Lastly, I've read Dynamic and Runtime Integration Flows, but still couldn't implement this feature.
Dynamic flow registration cannot be used within a @Bean definition.
It is designed to be used at runtime, AFTER the application context is fully initialized.
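A minimal sketch of that idea (my own illustration, reusing the question's DAO, Login, and flowContext; the @Scheduled period, the flow id scheme, and Login.getId() are assumptions): move the registration out of the bean definition into a periodically scheduled method, which only starts running after the context is initialized, and remove a previous registration before re-registering.

@Component
public class LoginFlowRegistrar {

    @Autowired
    IntegrationFlowContext flowContext;

    // Runs only after the context is up; 60s stands in for the X period.
    @Scheduled(fixedRate = 60_000)
    public void refreshLoginFlows() {
        List<Login> loginList = DAO.getLoginList();
        loginList.forEach(login -> {
            String flowId = "loginFlow-" + login.getId(); // assumed identifier
            if (flowContext.getRegistrationById(flowId) != null) {
                flowContext.remove(flowId); // drop the stale flow before re-registering
            }
            IntegrationFlow flow = IntegrationFlows
                    .from(() -> login,
                            c -> c.poller(Pollers.fixedRate(login.getPeriod(), TimeUnit.SECONDS, 5)))
                    .channel("X_CHANNEL")
                    .get();
            flowContext.registration(flow).id(flowId).register();
        });
    }
}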