I am unable to send a message from the producer to Pulsar when the producer is set to CustomPartition (please refer to the code below).
Producer<byte[]> producer = client.newProducer()
.topic(pulsarTopic)
//.messageRoutingMode(MessageRoutingMode.RoundRobinPartition)
.messageRoutingMode(MessageRoutingMode.CustomPartition)
.messageRouter(new MessageRouterImpl())
.create();
Code to send a message:
producer.send(msg);
MessageRouterImpl randomly generates a number in the range 0 to 5, as in the code below:
public class MessageRouterImpl implements MessageRouter {
@Override
public int choosePartition(Message<?> msg, TopicMetadata metadata) {
Random r = new Random();
return r.nextInt((0 - 5) + 1);
}
}
My question is: why am I unable to send messages from the producer with CustomPartition, and why am I getting the log messages below?
Batching the messages from the batch container from timer thread
Batching the messages from the batch container with 0 messages
With MessageRoutingMode.RoundRobinPartition and MessageRoutingMode.SinglePartition I was able to send message from producer.
It would be really helpful if someone could throw some light on this.
First, please take into account that partitioned topics should be explicitly created before a publisher starts sending messages, for example:
bin/pulsar-admin topics create-partitioned-topic persistent://tenant/namespace/partitioned-topic-name --partitions 5
Second, the following line will throw an exception, because (0 - 5) + 1 evaluates to -4 and Random.nextInt cannot handle a non-positive bound:
return r.nextInt((0 - 5) + 1);
It is possible to use the following one instead (it returns a value from 0 to 4, which matches the 5 partitions created above):
return r.nextInt(5);
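For illustration, here is a minimal sketch of the corrected router, assuming the standard Pulsar Java client API; ThreadLocalRandom and the partition count from TopicMetadata replace the hard-coded bound:

public class MessageRouterImpl implements MessageRouter {
    @Override
    public int choosePartition(Message<?> msg, TopicMetadata metadata) {
        // Pick a random partition index in [0, numPartitions());
        // ThreadLocalRandom avoids creating a new Random instance per message.
        return java.util.concurrent.ThreadLocalRandom.current().nextInt(metadata.numPartitions());
    }
}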
I am trying to read a file from GCP based on a notification received, as per the flow defined below:
File reader - deserialises the data into a collection and sends it for routing.
I am deserialising the data into a collection of objects and sending it to the router for further processing. As I don't have control over the file size, I am thinking of some approach to batch the reader process.
Currently, the file-reader service activator returns the whole Collection of deserialised objects.
Issue:
In case I receive a file of larger size, i.e. with 200k records, I want to send it to the header value router in batches rather than as a collection of 200k objects.
If I convert the file-reader into a splitter and add an aggregator after it (Notification -> file-reader -> aggregator -> router), I would still need to return the collection of all the objects, not an iterator.
I don't want to load all the records into a collection.
Updated approach:
public <S> Collection<S> readData(DataInfo dataInfo, Class<S> clazz) {
Resource gcpResource = context.getResource("classpath://data.json");
var tempDataSet = new HashSet<S>();
AtomicInteger pivot = new AtomicInteger();
try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gcpResource.getInputStream()))) {
bufferedReader.lines().map((dataStr) -> {
try {
var data = deserializeData(dataStr, clazz);
return data;
} catch (JsonProcessingException ex) {
throw new CustomException("PARSER-1001", "Error occurred while parsing", ex);
}
}).forEach(data -> {
tempDataSet.add(data);
if (BATCH_SIZE == pivot.incrementAndGet()) {
// When tempDataSet reaches BATCH_SIZE, send the batch to the routing channel and reset the pivot
var message = MessageBuilder.withPayload(tempDataSet.clone())
.setHeader(AppConstants.EVENT_HEADER_KEY, eventType)
.build();
routingChannel.send(message);
pivot.set(0);
tempDataSet.clear();
}
});
return tempDataSet;
} catch (Exception ex) {
throw new CustomException("PARSER-1002", "Error occurred while parsing", ex);
}
}
If the batch size is 100 and we receive 1010 objects, 11 batches would be created: 10 with 100 objects each and the last one with 10 objects in it.
If I use a splitter and pass the stream to it, will it wait for the whole stream to finish and then send the collected collection, or can we achieve something close to the previous approach with it?
Not sure what the question is, but I would go with a FileSplitter + Aggregator solution. The first one is exactly for the streaming file-reading use case. The second one lets you buffer incoming messages until they reach some condition, so it can emit a single message downstream. That message could indeed have a collection as its payload.
Here are their docs for your consideration:
https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#aggregator
https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#file-splitter
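As an illustration only, here is a rough sketch of such a flow; the channel names, BATCH_SIZE and the deserialisation step are assumptions carried over from the question, not a definitive implementation:

@Bean
public IntegrationFlow batchedFileReadFlow() {
    return IntegrationFlows.from("fileInputChannel") // assumed channel that receives the File to read
            .split(Files.splitter()) // FileSplitter streams the file and emits one message per line
            .transform(String.class, line -> deserializeData(line, Data.class)) // hypothetical deserialiser from the question
            .aggregate(a -> a
                    .correlationExpression("'batch'") // single group; we only batch by size
                    .releaseStrategy(group -> group.size() == BATCH_SIZE)
                    .groupTimeout(5000) // flush an incomplete last batch
                    .sendPartialResultOnExpiry(true)
                    .expireGroupsUponCompletion(true))
            .channel("routingChannel") // the existing header value router listens here
            .get();
}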
I have an integration flow configured using the Java DSL which pulls files from an FTP server using Ftp.inboundChannelAdapter, transforms them to a JobRequest, and then has a .handle() method which triggers my batch job. Everything is working as required, but the process runs sequentially for each file inside the FTP folder.
I logged the current thread name in my transformer endpoint and it printed the same thread name for each file.
Here is what I have tried so far:
1. Task executor bean
@Bean
public TaskExecutor taskExecutor(){
return new SimpleAsyncTaskExecutor("Integration");
}
2. Integration flow
@Bean
public IntegrationFlow integrationFlow(JobLaunchingGateway jobLaunchingGateway) throws IOException {
return IntegrationFlows.from(Ftp.inboundAdapter(myFtpSessionFactory)
.remoteDirectory("/bar")
.localDirectory(localDir.getFile())
,c -> c.poller(Pollers.fixedRate(1000).taskExecutor(taskExecutor()).maxMessagesPerPoll(20)))
.transform(fileMessageToJobRequest(importUserJob(step1())))
.handle(jobLaunchingGateway)
.log(LoggingHandler.Level.WARN, "headers.id + ': ' + payload")
.route(JobExecution.class,j->j.getStatus().isUnsuccessful()?"jobFailedChannel":"jobSuccessfulChannel")
.get();
}
3. I also read in another SO thread that I need an ExecutorChannel, so I configured one, but I don't know how to inject this channel into my Ftp.inboundAdapter. From the logs I see that the channel is always integrationFlow.channel#0, which I guess is a DirectChannel.
@Bean
public MessageChannel inputChannel() {
return new ExecutorChannel(taskExecutor());
}
I don't know what I'm missing here, or I might not have properly understood the Spring messaging system, as I'm very new to Spring and Spring Integration.
Any help is appreciated
Thanks
You can simply inject the ExecutorChannel into the flow, and it is going to be applied to the SourcePollingChannelAdapter by the framework. So, having that inputChannel defined as a bean, you just do this:
.channel(inputChannel())
before your .transform(fileMessageToJobRequest(importUserJob(step1()))).
See more in docs: https://docs.spring.io/spring-integration/docs/current/reference/html/dsl.html#java-dsl-channels
On the other hand, to process your files in parallel according to your .taskExecutor(taskExecutor()) configuration, you just need to change that .maxMessagesPerPoll(20) to 1. The logic in the AbstractPollingEndpoint is like this:
this.taskExecutor.execute(() -> {
int count = 0;
while (this.initialized && (this.maxMessagesPerPoll <= 0 || count < this.maxMessagesPerPoll)) {
if (pollForMessage() == null) {
break;
}
count++;
}
});
So, we do have tasks in parallel, but each task keeps polling messages until it reaches that maxMessagesPerPoll, which is 20 in your current case. There is also some explanation in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#endpoint-pollingconsumer
The maxMessagesPerPoll property specifies the maximum number of messages to receive within a given poll operation. This means that the poller continues calling receive() without waiting, until either null is returned or the maximum value is reached. For example, if a poller has a ten-second interval trigger and a maxMessagesPerPoll setting of 25, and it is polling a channel that has 100 messages in its queue, all 100 messages can be retrieved within 40 seconds. It grabs 25, waits ten seconds, grabs the next 25, and so on.
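Putting both suggestions together, the flow from the question could look roughly like this (a sketch only, reusing the beans declared in the question):

@Bean
public IntegrationFlow integrationFlow(JobLaunchingGateway jobLaunchingGateway) throws IOException {
    return IntegrationFlows.from(Ftp.inboundAdapter(myFtpSessionFactory)
                    .remoteDirectory("/bar")
                    .localDirectory(localDir.getFile()),
            c -> c.poller(Pollers.fixedRate(1000)
                    .taskExecutor(taskExecutor())
                    .maxMessagesPerPoll(1))) // one file per polling task
            .channel(inputChannel()) // hand off to the ExecutorChannel so downstream work runs in parallel
            .transform(fileMessageToJobRequest(importUserJob(step1())))
            .handle(jobLaunchingGateway)
            .log(LoggingHandler.Level.WARN, "headers.id + ': ' + payload")
            .route(JobExecution.class, j -> j.getStatus().isUnsuccessful() ? "jobFailedChannel" : "jobSuccessfulChannel")
            .get();
}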
I have 10 RabbitMQ queues, called event.q.0, event.q.2, <...>, event.q.9. Each of these queues receives messages routed from the event.consistent-hash exchange. I want to build a fault-tolerant solution that will consume messages for a specific event in a sequential manner, since ordering is important. For this I have set up a flow that listens to those queues and routes messages based on event ID to a specific worker flow. Worker flows work based on queue channels, so that should guarantee FIFO order for an event with a specific ID. I have come up with the following setup:
@Bean
public IntegrationFlow eventConsumerFlow(RabbitTemplate rabbitTemplate, Advice retryAdvice) {
return IntegrationFlows
.from(
Amqp.inboundAdapter(new SimpleMessageListenerContainer(rabbitTemplate.getConnectionFactory()))
.configureContainer(c -> c
.adviceChain(retryAdvice)
.addQueueNames(queueNames)
.prefetchCount(amqpProperties.getPreMatch().getDefinition().getQueues().getEvent().getPrefetch())
)
.messageConverter(rabbitTemplate.getMessageConverter())
)
.<Event, String>route(e -> String.format("worker-input-%d", e.getId() % numberOfWorkers))
.get();
}
private Advice deadLetterAdvice() {
return RetryInterceptorBuilder
.stateless()
.maxAttempts(3)
.recoverer(recoverer())
.backOffPolicy(backOffPolicy())
.build();
}
private ExponentialBackOffPolicy backOffPolicy() {
ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(1000);
backOffPolicy.setMultiplier(3.0);
backOffPolicy.setMaxInterval(15000);
return backOffPolicy;
}
private MessageRecoverer recoverer() {
return new RepublishMessageRecoverer(
rabbitTemplate,
"error.exchange.dlx"
);
}
@PostConstruct
public void init() {
for (int i = 0; i < numberOfWorkers; i++) {
flowContext.registration(workerFlow(MessageChannels.queue(String.format("worker-input-%d", i), queueCapacity).get()))
.autoStartup(false)
.id(String.format("worker-flow-%d", i))
.register();
}
}
private IntegrationFlow workerFlow(QueueChannel channel) {
return IntegrationFlows
.from(channel)
.<Object, Class<?>>route(Object::getClass, m -> m
.resolutionRequired(true)
.defaultOutputToParentFlow()
.subFlowMapping(EventOne.class, s -> s.handle(oneHandler))
.subFlowMapping(EventTwo.class, s -> s.handle(anotherHandler))
)
.get();
}
Now, when, let's say, an error happens in eventConsumerFlow, the retry mechanism works as expected, but when an error happens in workerFlow, the retry doesn't work anymore and the message doesn't get sent to the dead letter exchange. I assume this is because once the message is handed off to the QueueChannel, it gets acknowledged automatically. How can I make the retry mechanism work in workerFlow as well, so that if an exception happens there, it could retry a couple of times and send the message to the DLX when the retries are exhausted?
If you want resiliency, you shouldn't be using queue channels at all; the messages will be acknowledged immediately after the message is put in the in-memory queue; if the server crashes, those messages will be lost.
You should configure a separate adapter for each queue if you want no message loss.
That said, to answer the general question, any errors on downstream flows (including after a queue channel) will be sent to the errorChannel defined on the inbound adapter.
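A rough sketch of the "separate adapter per queue" suggestion, reusing the beans and handler names from the question (an illustration under those assumptions, not a definitive implementation):

@PostConstruct
public void registerPerQueueFlows() {
    for (int i = 0; i < numberOfWorkers; i++) {
        String queue = String.format("event.q.%d", i);
        IntegrationFlow flow = IntegrationFlows
                .from(Amqp.inboundAdapter(rabbitTemplate.getConnectionFactory(), queue)
                        .configureContainer(c -> c.adviceChain(retryAdvice)) // retry/DLX advice still applies end to end
                        .messageConverter(rabbitTemplate.getMessageConverter()))
                .<Object, Class<?>>route(Object::getClass, m -> m
                        .resolutionRequired(true)
                        .subFlowMapping(EventOne.class, s -> s.handle(oneHandler))
                        .subFlowMapping(EventTwo.class, s -> s.handle(anotherHandler)))
                .get();
        flowContext.registration(flow).id("event-consumer-" + i).register();
    }
}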
I have the prefetch size set to 1 (jms.prefetchPolicy.all=1 in the URL). In the web console I can see that the prefetch is 1 for all of my consumers. One consumer got stuck and there were 67 messages in its dispatch queue (see my screenshot).
Could you help me understand how this could happen? I've read plenty of articles on this and my understanding is that the dispatch queue size should be at most the prefetch size?!
I use the following configuration to consume messages from the queue:
ConnectionFactory getActiveMQConnectionFactory() {
// Configure the ActiveMQConnectionFactory
ActiveMQConnectionFactory activeMQConnectionFactory = new ActiveMQConnectionFactory();
activeMQConnectionFactory.setBrokerURL(brokerUrl);
activeMQConnectionFactory.setUserName(user);
activeMQConnectionFactory.setPassword(password);
activeMQConnectionFactory.setNonBlockingRedelivery(true);
// Configure the redeliver policy and the dead letter queue
RedeliveryPolicy redeliveryPolicy = new RedeliveryPolicy();
redeliveryPolicy.setInitialRedeliveryDelay(initialRedeliveryDelay);
redeliveryPolicy.setRedeliveryDelay(redeliveryDelay);
redeliveryPolicy.setUseExponentialBackOff(useExponentialBackOff);
redeliveryPolicy.setMaximumRedeliveries(maximumRedeliveries);
RedeliveryPolicyMap redeliveryPolicyMap = activeMQConnectionFactory.getRedeliveryPolicyMap();
redeliveryPolicyMap.put(new ActiveMQQueue(thumbnailQueue), redeliveryPolicy);
activeMQConnectionFactory.setRedeliveryPolicy(redeliveryPolicy);
return activeMQConnectionFactory;
}
public IntegrationFlow createThumbnailFlow(String concurrency, CreateThumbnailReceiver receiver) {
return IntegrationFlows.from(
Jms.messageDrivenChannelAdapter(
Jms.container(getActiveMQConnectionFactory(), thumbnailQueue)
.concurrency(concurrency)
.sessionTransacted(true)
.get()
))
.transform(new JsonToObjectTransformer(CreateThumbnailRequest.class, jsonObjectMapper()))
.handle(receiver)
.get();
}
The problem was caused by a version difference between the broker (5.14.5) and the client (5.15.3). After upgrading the broker, the dispatch queue contains at most 2 messages, as expected.
I implemented 2 simple services in Service Fabric which communicate over Event Hub, and I encounter very strange behavior.
The listener service reads the messages using a PartitionReceiver with the ReceiveAsync method. It always reads the messages from the start of the partition, but even though the maxMessageCount parameter is set to a very high number, which definitely exceeds the number of messages in the partition, it reads only a "random" amount of messages and almost never the full list that should be present there.
Did I miss something in the documentation and this is normal behavior, or am I right that this is very strange behavior?
A code snippet of my receiver service:
PartitionReceiver receiver = eventHubClient.CreateReceiver(PartitionReceiver.DefaultConsumerGroupName, Convert.ToString(partition), PartitionReceiver.StartOfStream);
ServiceEventSource.Current.Write("RecieveStart");
IEnumerable<EventData> ehEvents = null;
int i = 0;
do
{
try
{
ehEvents = await receiver.ReceiveAsync(1000);
break;
}
catch (OperationCanceledException)
{
if (i == NUM_OF_RETRIES-1)
{
await eventHubClient.CloseAsync();
StatusCode(500);
}
}
i++;
} while (i < NUM_OF_RETRIES);