We have an application which consumes around 300 JMS messages per minute. We need to increase that to 3000 messages per minute.
I created a simple test program which reads messages from the queue and logs them. No processing is involved, so I expected high throughput. However, the logging still happens at around 400 messages per minute.
Below are excerpts from my program:
<int-jms:message-driven-channel-adapter id="testJmsInboundAdapter"
auto-startup="true"
destination="testQueueDestination"
connection-factory="testConnectionFactory"
channel="messageTransformerChannel" />
<int:channel id="messageTransformerChannel" />
<int:service-activator
id="loggerActivator"
input-channel="messageTransformerChannel"
method="log"
ref="logger" />
The logger method simply logs the message:
public void log(final GenericMessage<Object> object) {
    LOGGER.info("Logging message: " + object);
}
Any advice on where I should look for the bottleneck? Is there any limitation on the number of messages that can be consumed per minute using Spring Integration's message-driven-channel-adapter?
Pay attention to these options:
<xsd:attribute name="concurrent-consumers" type="xsd:string">
<xsd:annotation>
<xsd:documentation>
Specify the number of concurrent consumers to create. Default is 1.
Specifying a higher value for this setting will increase the standard
level of scheduled concurrent consumers at runtime: This is effectively
the minimum number of concurrent consumers which will be scheduled
at any given time. This is a static setting; for dynamic scaling,
consider specifying the "maxConcurrentConsumers" setting instead.
Raising the number of concurrent consumers is recommendable in order
to scale the consumption of messages coming in from a queue. However,
note that any ordering guarantees are lost once multiple consumers are
registered.
</xsd:documentation>
</xsd:annotation>
</xsd:attribute>
<xsd:attribute name="max-concurrent-consumers" type="xsd:string">
<xsd:annotation>
<xsd:documentation>
Specify the maximum number of concurrent consumers to create. Default is 1.
If this setting is higher than "concurrentConsumers", the listener container
will dynamically schedule new consumers at runtime, provided that enough
incoming messages are encountered. Once the load goes down again, the number of
consumers will be reduced to the standard level ("concurrentConsumers") again.
Raising the number of concurrent consumers is recommendable in order
to scale the consumption of messages coming in from a queue. However,
note that any ordering guarantees are lost once multiple consumers are
registered.
</xsd:documentation>
</xsd:annotation>
</xsd:attribute>
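For example, the adapter from the question could start five consumers and scale up to twenty under load; the numbers here are illustrative and should be tuned against your broker and message rate:
<int-jms:message-driven-channel-adapter id="testJmsInboundAdapter"
    auto-startup="true"
    destination="testQueueDestination"
    connection-factory="testConnectionFactory"
    concurrent-consumers="5"
    max-concurrent-consumers="20"
    channel="messageTransformerChannel" />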
Related
I have an Event Hub and an Azure Function connected to it. With small amounts of data everything works well, but when I tested it with 10,000 events, I got very peculiar results.
For test purposes I send numbers from 0 to 9999 into the Event Hub and log the data in Application Insights and in Service Bus. In the first test I see in Azure that the hub received exactly 10,000 events, but Service Bus and AI got all messages between 0 and 4500 and only every second message after 4500 (so about 30% were lost). In the second test, I got all messages from 0 to 9999, but every second message between 3200 and 3500 was duplicated. I would like to get all messages exactly once; what did I do wrong?
public async Task Run([EventHubTrigger("%EventHubName%", Connection = "AzureEventHubConnectionString")] EventData[] events, ILogger log)
{
int id = _random.Next(1, 100000);
_context.Log.TraceInfo("Started. Count: " + events.Length + ". " + id); //AI log
foreach (var message in events)
{
//log with ASB
var mess = new Message();
mess.Body = message.EventBody.ToArray();
await queueClient.SendAsync(mess);
}
_context.Log.TraceInfo("Completed. " + id); //AI log
}
By using EventData[] events, you are reading events from the hub in batch mode; that's why you see X events processed at a time and then the next batch processed a moment later.
Instead of EventData[], simply use EventData.
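For reference, a single-event version of the trigger would look roughly like this (same trigger attribute, one EventData per invocation; the Service Bus send is kept from the original sketch):
public async Task Run([EventHubTrigger("%EventHubName%", Connection = "AzureEventHubConnectionString")] EventData message, ILogger log)
{
    // One invocation per event; no batch loop needed.
    var mess = new Message();
    mess.Body = message.EventBody.ToArray();
    await queueClient.SendAsync(mess);
}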
When you send events to the hub, check that all events are sent with the same partition key if you want to try batch processing; otherwise they can be split across several partitions depending on TUs (throughput units), PUs (processing units) and CUs (capacity units).
Egress: Up to 2 MB per second or 4096 events per second.
Refer to this document for the throughput limits of the Basic, Standard and Premium tiers.
There are a couple of things likely happening, though I can only speculate with the limited context that we have. Knowing more about the testing methodology, tier of your Event Hubs namespace, and the number of partitions in your Event Hub would help.
The first thing to be aware of is that the timing between when an event is published and when it is available in a partition to be read is non-deterministic. When a publish operation completes, the Event Hubs broker has acknowledged receipt of the events and taken responsibility for ensuring they are persisted to multiple replicas and made available in a specific partition. However, it is not a guarantee that the event can immediately be read.
Depending on how you sent the events, the broker may also need to route events from a gateway by performing a round-robin or applying a hash algorithm. If you're looking to optimize the time from publish to availability, taking ownership of partition distribution and publishing directly to a partition can help, as can ensuring that you're publishing with the right degree of concurrency for your host environment and scenario.
With respect to duplication, it's important to be aware that Event Hubs offers an "at least once" guarantee; your consuming application should expect some duplicates and needs to be able to handle them in the way that is appropriate for your application scenario.
Azure Functions uses a set of event processors in its infrastructure to read events. The processors collaborate with one another to share work and distribute the responsibility for partitions between them. Because collaboration takes place using storage as an intermediary to synchronize, there is an overlap of partition ownership when instances are scaled up or scaled down, during which time the potential for duplication is increased.
Functions makes the decision to scale based on the number of events that it sees waiting in partitions to be read. In the case of your test, if your publishing rate ramps up quickly and Functions sees the event backlog grow to the point that it feels the need to scale out by multiple instances, you'll see more duplication than you otherwise would for a period of 10-30 seconds until partition ownership normalizes. To mitigate this, gradually increasing publishing speed over a 1-2 minute period can help smooth out the scaling and reduce (but not eliminate) duplication.
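As a sketch of what handling duplicates can look like, assuming your events carry a unique identifier (like the 0-9999 counter in your test), the consumer can skip anything it has already processed. This in-memory set is purely illustrative; a real function app scaled across instances would need shared, durable state instead:
// Hypothetical in-memory dedupe; real apps need durable shared state.
private static readonly HashSet<long> _processedIds = new HashSet<long>();

private static bool AlreadyProcessed(long eventId)
{
    lock (_processedIds)
    {
        // Add returns false when the id was already in the set.
        return !_processedIds.Add(eventId);
    }
}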
So I've been playing around with MassTransit and Azure Service Bus Premium. Here's a sample of one of my consumers. The hypothetical initial load from one publisher would be about 1000 messages a second. However, whenever I attempt to configure a consumer, it seems to average out at about 20-40 messages per loop.
cfg.ReceiveEndpoint("ReceivePoint", e =>
{
    e.PrefetchCount = 500;
    e.MaxConcurrentCalls = 20;
    e.Batch<IBlahContract>(b =>
    {
        b.MessageLimit = 500;
        b.TimeLimit = TimeSpan.FromSeconds(1);
        b.Consumer(() => new BatchBlahConsumer(
            provider.GetRequiredService<IRepository>(),
            provider.GetRequiredService<ILogger<BatchBlahConsumer>>()));
    });
});
I did try the throughput test, which managed a thousand-plus messages a second. Does anyone have any tips on how to achieve optimal performance? And might it make more sense to consider a managed RabbitMQ instance, since this needs to scale? It just feels like Azure Service Bus isn't really suited to such high throughput.
Edit: A slight addition to this: I suspect it's related to a requirement to keep prefetch at about 20, and that consumer concurrency is what really defines performance. So basically it needs consumer-level configuration based on estimated requirements, which makes me lean more towards using RabbitMQ.
Your batch message limit is 500, which is honestly way too high. With MaxConcurrentCalls set to 20, you'll always hit the timeout instead of the batch size limit, because the Azure client library will only ever deliver 20 messages at once and the batch size is significantly higher than that value (500 vs 20). You need to set the batch size low enough that a batch can actually be completed, or you'll always be completing batches on timeout alone.
Lower the batch size and increase MaxConcurrentCalls so that they are the same, or at least so the batch size is no greater than the concurrent calls limit, so that batches can be completed upon message receipt instead of waiting to time out.
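Applied to the endpoint above, that advice would look something like this; the exact numbers are illustrative and should be tuned:
cfg.ReceiveEndpoint("ReceivePoint", e =>
{
    e.PrefetchCount = 500;
    e.MaxConcurrentCalls = 100;      // raised so the client delivers enough messages at once
    e.Batch<IBlahContract>(b =>
    {
        b.MessageLimit = 100;        // no greater than MaxConcurrentCalls, so batches can fill
        b.TimeLimit = TimeSpan.FromSeconds(1);
        b.Consumer(() => new BatchBlahConsumer(
            provider.GetRequiredService<IRepository>(),
            provider.GetRequiredService<ILogger<BatchBlahConsumer>>()));
    });
});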
I am reading a message, transforming it, and outputting it on a JMS channel. The JMS channel uses a WorkManager task executor to read the messages and process them.
Even though we configured the WorkManager in the application server to have 10 threads, only one thread is being used.
<si:chain id="prenotifchain" input-channel="preNotificationChannel" output-channel="notificationJMSChannel">
<si:transformer id="prenotif" method="transformRequest" ref="notificationTransformer"/>
</si:chain>
<si-jms:channel id="notificationJMSChannel" queue="notificationQueue" connection-factory="queueConnectionFactory" transaction-manager="txManager" task-executor="notificationTaskExecutor" />
<jee:jndi-lookup id="notificationQueue" jndi-name="jms/notifqueue"/>
<bean id="notificationTaskExecutor"
class="org.springframework.scheduling.commonj.WorkManagerTaskExecutor">
<property name="workManagerName" value="notifWM" />
<property name="resourceRef" value="true" />
</bean>
Are we missing any configuration, or is there another way to read with multiple threads?
Please, use the concurrency attribute:
<xsd:attribute name="concurrency" type="xsd:string">
<xsd:annotation>
<xsd:documentation><![CDATA[
The number of concurrent sessions/consumers to start for each listener.
Can either be a simple number indicating the maximum number (e.g. "5")
or a range indicating the lower as well as the upper limit (e.g. "3-5").
Note that a specified minimum is just a hint and might be ignored at runtime.
Default is 1; keep concurrency limited to 1 in case of a topic listener
or if message ordering is important; consider raising it for general queues.
]]></xsd:documentation>
</xsd:annotation>
</xsd:attribute>
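Applied to the channel from the question, that would look something like the following; the value "10" is illustrative here and simply matches the WorkManager's configured thread count:
<si-jms:channel id="notificationJMSChannel" queue="notificationQueue"
    connection-factory="queueConnectionFactory" transaction-manager="txManager"
    task-executor="notificationTaskExecutor" concurrency="10" />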
I'm running a Kafka Streams application with three sub-topologies. The stages of activity are roughly as follows:
stream Topic A
selectKey and repartition Topic A to Topic B
stream Topic B
foreach Topic B to Topic C Producer
stream Topic C
Topic C to Topic D
Topics A, B, and C are each materialized, which means that if each topic has 40 partitions, my maximum parallelism is 120.
At first I was running 5 streams applications with 8 threads apiece. With this setup I was experiencing inconsistent performance. It seems like some sub-topologies sharing the same thread were hungrier for CPU than others, and after a while I'd get this error: Member [client_id] in group [consumer_group] has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator). Everything would get rebalanced, which could lead to decreased performance until the next failure and rebalance.
My questions are as follows:
How is it that multiple sub-topologies are able to be run on a single thread? A poll queue?
How does each thread decide how to allocate compute resources to each of its sub-topologies?
How do you optimize your thread to topic-partition ratio in such cases to avoid periodic consumer failures? e.g., will a 1:1 ratio ensure more consistent performance?
If you use a 1:1 ratio, how do you ensure that every thread gets assigned its own topic-partition and some threads aren't left idle?
The thread will poll() for all topics of its different sub-topologies and check each record's topic metadata to feed it into the correct task.
Each sub-topology is treated the same, i.e., available resources are evenly distributed, if you wish.
A 1:1 ratio is only useful if you have enough cores. I would recommend monitoring your CPU utilization; if it's too high (larger than 80%) you should add more cores/threads.
Kafka Streams handles this for you automatically.
A couple of general comments:
you might consider increasing the max.poll.interval.ms config to avoid a consumer dropping out of the group
you might consider decreasing max.poll.records to get fewer records per poll() call, and thus decrease the time between two consecutive calls to poll()
note that max.poll.records does not imply increased network/broker communication -- if a single fetch request returns more records than the max.poll.records config, the data is just buffered within the consumer and the next poll() will be served from the buffered data, avoiding a broker round trip
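In Kafka Streams these consumer properties can be set via the consumer prefix on the streams configuration; a minimal sketch with illustrative values:
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsPollTuning {
    public static Properties tunedProps() {
        Properties props = new Properties();
        // more time between poll() calls before the consumer is removed from the group
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), 600000);
        // fewer records per poll() shortens the gap between consecutive poll() calls
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 100);
        return props;
    }
}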
I have the scenario below and am currently leveraging Spring Integration as the technology to achieve it.
I have around 18,000 staff IDs.
For each staff member, a process needs to kick off: 1 HTTP call to retrieve staff profile information from the mail calendar server, then 1 HTTP call to retrieve some other information, then possibly 3-5 more HTTP calls, all in a single task.
I need to finish this process for the above 50,000 staff in 15 minutes.
I will need this whole batch process to run every 15 minutes, again and again.
Assume each job takes 5 seconds to finish: even with 50 concurrent workers (as configured below), 18,000 jobs x 5 s / 50 workers = 30 minutes, so I still need 30 minutes to finish.
=================
Initial thinking
I can use Spring Integration to have something like:
- create one job for each staff member - 18,000 jobs. The job request likely only contains a staff ID, so each request is very lightweight.
- add all the jobs to the int:queue at once so it feeds the input channel - calenderSynRequestChannel
- have a poller - 100 concurrent workers to clear the jobs within 15 minutes.
Questions:
Is this a good way to do this kind of batch processing? One concern I have is the size of the queue needed to hold 18,000 jobs at once.
Should I use a file-based approach instead, storing all the staff IDs in multiple files that get picked up later by the poller? However, this would also complicate the design, as there could be concurrency issues around the workers reading/writing/deleting the files.
Current solution:
<int:service-activator ref="synCalenderService" method="synCalender" input-channel="calenderSynRequestChannel">
<int:poller fixed-delay="50" time-unit="MILLISECONDS" task-executor="taskExecutor" receive-timeout="0" />
</int:service-activator>
<task:executor id="taskExecutor" pool-size="50" keep-alive="120" queue-capacity="500"/>
Anyone who has encountered a similar problem might be able to give some insight on how to address this using Spring Integration.
Why not use a Spring Batch job that has:
a Reader that reads the staff data
a Processor that makes the HTTP calls
a Writer that writes the result to a logfile (for example)
Then utilize the TaskScheduler (from the Spring framework) to schedule execution every 15 minutes, or maybe even better with a fixed delay.
If you want to do it more in parallel, utilize org.springframework.batch.integration.async.AsyncItemProcessor (and the corresponding AsyncItemWriter).
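A rough sketch of that wiring, assuming hypothetical StaffService/StaffProfile types and the usual Job/JobLauncher beans; the executor and delay values are illustrative, and the step itself would be declared with Future<StaffProfile> as its intermediate type:
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
public class StaffSyncConfig {

    // Hypothetical collaborators: staffService makes the per-staff HTTP calls;
    // staffSyncJob/jobLauncher are the usual Spring Batch beans.
    private final StaffService staffService;
    private final Job staffSyncJob;
    private final JobLauncher jobLauncher;

    public StaffSyncConfig(StaffService staffService, Job staffSyncJob, JobLauncher jobLauncher) {
        this.staffService = staffService;
        this.staffSyncJob = staffSyncJob;
        this.jobLauncher = jobLauncher;
    }

    @Bean
    public AsyncItemProcessor<String, StaffProfile> asyncProcessor() {
        AsyncItemProcessor<String, StaffProfile> processor = new AsyncItemProcessor<>();
        processor.setDelegate(staffService::fetchProfile); // the HTTP calls per staff ID
        // Unbounded executor for brevity; a bounded pool is more typical in production.
        processor.setTaskExecutor(new SimpleAsyncTaskExecutor("staff-sync-"));
        return processor;
    }

    @Bean
    public AsyncItemWriter<StaffProfile> asyncWriter(ItemWriter<StaffProfile> logFileWriter) {
        // Unwraps the Futures produced by the AsyncItemProcessor before delegating.
        AsyncItemWriter<StaffProfile> writer = new AsyncItemWriter<>();
        writer.setDelegate(logFileWriter);
        return writer;
    }

    @Scheduled(fixedDelay = 900_000) // requires @EnableScheduling; 15 min after the previous run
    public void launch() throws Exception {
        jobLauncher.run(staffSyncJob, new JobParametersBuilder()
                .addLong("runAt", System.currentTimeMillis()).toJobParameters());
    }
}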