I'm trying to understand the behavior of SqsMessageDrivenChannelAdapter in order to address a memory issue.
The upstream system dumps thousands of messages into the aws-sqs-queue, and all of them are received immediately by the SqsMessageDrivenChannelAdapter; on the AWS console I do not see any messages remaining on the queue.
The SqsMessageProcesser then processes 1 message every 5 seconds.
Here's the log:
2019-05-21 17:28:18 INFO SQSMessageProcessor:88 - --- inside sqsMessageProcesser---
2019-05-21 17:28:23 INFO SQSMessageProcessor:88 - --- inside sqsMessageProcesser---
2019-05-21 17:28:28 INFO SQSMessageProcessor:88 - --- inside sqsMessageProcesser---
2019-05-21 17:28:33 INFO SQSMessageProcessor:88 - --- inside sqsMessageProcesser---
2019-05-21 17:28:38 INFO SQSMessageProcessor:88 - --- inside sqsMessageProcesser---
.........................
Does this mean that while the SqsMessageProcesser is processing 1 message every 5 seconds, thousands of messages are being held in (server) memory inside the in-channel?
Each DB transaction takes around 5 seconds, and we are currently facing OutOfMemory issues in production (PRD).
Will it help if I set the capacity on the QueueChannel and call setMaxNumberOfMessages on the SqsMessageDrivenChannelAdapter?
If yes, is there a standard way to calculate these values?
@Bean(name = "in-channel")
public PollableChannel sqsInputChannel() {
    return new QueueChannel();
}

@Autowired
private AmazonSQSAsync amazonSqs;

@Bean
public MessageProducer sqsMessageDrivenChannelAdapterForItems() {
    SqsMessageDrivenChannelAdapter adapter =
            new SqsMessageDrivenChannelAdapter(amazonSqs, "aws-sqs-queue");
    adapter.setOutputChannelName("in-channel");
    return adapter;
}

@ServiceActivator(inputChannel = "in-channel",
        poller = @Poller(fixedRate = "5000", maxMessagesPerPoll = "1"))
public void sqsMessageProcesser(Message<?> receive) throws ProcesserException {
    logger.info("--- inside sqsMessageProcesser---");
    // db transactions.
}
Actually, it is an anti-pattern to place a QueueChannel after a message-driven channel adapter. The latter is already asynchronous and backed by its own task scheduling, so shifting the consumed messages from the source into an in-memory queue is definitely going to lead to trouble.
You should consider using a DirectChannel instead and let the SQS consumer thread block until your sqsMessageProcesser finishes its job. This way you also guarantee no data loss; see the sketch below.
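For illustration only, here is a minimal sketch of that idea, reusing the bean and queue names from the question. The setMaxNumberOfMessages call is the optional throttle the question mentions; treat the exact class and property names as assumptions against your spring-integration-aws version.

import com.amazonaws.services.sqs.AmazonSQSAsync;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.aws.inbound.SqsMessageDrivenChannelAdapter;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.core.MessageProducer;
import org.springframework.messaging.Message;
import org.springframework.messaging.MessageChannel;

@Configuration
public class SqsDirectChannelConfig {

    @Bean(name = "in-channel")
    public MessageChannel sqsInputChannel() {
        // DirectChannel: the SQS consumer thread calls the handler directly,
        // so messages are not buffered in an in-memory queue.
        return new DirectChannel();
    }

    @Bean
    public MessageProducer sqsMessageDrivenChannelAdapterForItems(AmazonSQSAsync amazonSqs) {
        SqsMessageDrivenChannelAdapter adapter =
                new SqsMessageDrivenChannelAdapter(amazonSqs, "aws-sqs-queue");
        adapter.setOutputChannelName("in-channel");
        // Optionally limit how many messages are fetched per receive call (assumed available in your version).
        adapter.setMaxNumberOfMessages(5);
        return adapter;
    }

    @ServiceActivator(inputChannel = "in-channel") // no poller: runs on the SQS consumer thread
    public void sqsMessageProcesser(Message<?> receive) {
        // db transactions; the consumer thread blocks here until they complete
    }
}

With a DirectChannel there is no separate poller thread, and the JVM only holds as many in-flight messages as there are consumer threads, so back-pressure comes naturally from the SQS container itself.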
I have the prefetch size set to 1 (jms.prefetchPolicy.all=1 in the URL). In the web console I can see that the prefetch is 1 for all of my consumers. One consumer got stuck and there were 67 messages on its dispatch queue (see my screenshot).
Could you help me understand how this could happen? I've read plenty of articles on this, and my understanding is that the dispatch queue size should be at most the prefetch size?!
I use the following configuration to consume messages from the queue:
ConnectionFactory getActiveMQConnectionFactory() {
    // Configure the ActiveMQConnectionFactory
    ActiveMQConnectionFactory activeMQConnectionFactory = new ActiveMQConnectionFactory();
    activeMQConnectionFactory.setBrokerURL(brokerUrl);
    activeMQConnectionFactory.setUserName(user);
    activeMQConnectionFactory.setPassword(password);
    activeMQConnectionFactory.setNonBlockingRedelivery(true);

    // Configure the redelivery policy and the dead letter queue
    RedeliveryPolicy redeliveryPolicy = new RedeliveryPolicy();
    redeliveryPolicy.setInitialRedeliveryDelay(initialRedeliveryDelay);
    redeliveryPolicy.setRedeliveryDelay(redeliveryDelay);
    redeliveryPolicy.setUseExponentialBackOff(useExponentialBackOff);
    redeliveryPolicy.setMaximumRedeliveries(maximumRedeliveries);

    RedeliveryPolicyMap redeliveryPolicyMap = activeMQConnectionFactory.getRedeliveryPolicyMap();
    redeliveryPolicyMap.put(new ActiveMQQueue(thumbnailQueue), redeliveryPolicy);
    activeMQConnectionFactory.setRedeliveryPolicy(redeliveryPolicy);

    return activeMQConnectionFactory;
}
public IntegrationFlow createThumbnailFlow(String concurrency, CreateThumbnailReceiver receiver) {
    return IntegrationFlows.from(
            Jms.messageDrivenChannelAdapter(
                    Jms.container(getActiveMQConnectionFactory(), thumbnailQueue)
                            .concurrency(concurrency)
                            .sessionTransacted(true)
                            .get()))
            .transform(new JsonToObjectTransformer(CreateThumbnailRequest.class, jsonObjectMapper()))
            .handle(receiver)
            .get();
}
The problem was caused by a version difference between the broker (5.14.5) and the client (5.15.3). After upgrading the broker, the dispatch queue contains at most 2 messages, as expected.
I have the following InboundChannelAdapter with a Poller that processes files every 30 seconds. The files are not large, but I notice that memory consumption keeps going up even when no files are coming in.
@Bean
@InboundChannelAdapter(value = "flowFileInChannel", poller = @Poller(fixedDelay = "30000", maxMessagesPerPoll = "1"))
public MessageSource<File> flowInboundFileAdapter(@Value("${integration.path}") File directory) {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setDirectory(directory);
    source.setFilter(flowPathFileFilter);
    source.setUseWatchService(true);
    source.setScanEachPoll(true);
    source.setAutoCreateDirectory(false);
    return source;
}
Is there an internal queue that is not cleared after each poll? How do I configure this to avoid eating up memory?
After digging deeper, it looks like the Spring Integration flow below, which processes the data from the InboundChannelAdapter, is holding on to memory after each file poll. After I comment out the middle part, memory consumption stays stable instead of increasing. Now I'm wondering how to force Spring Integration to clear those Messages and Headers after they've passed through the different channels (i.e., after the last channel below).
public IntegrationFlow incomingLocateFlow() {
    return IntegrationFlows.from(locateIncomingChannel())
            // .split("locateItemSplitter", "split")
            // .transform(locateItemEnrichmentTransformer)
            // .transform(locateRequestTransformer)
            // .aggregate(new Consumer<AggregatorSpec>() {
            //
            //     @Override
            //     public void accept(AggregatorSpec aggregatorSpec) {
            //         aggregatorSpec.processor(locateRequestProcessor, null);
            //     }
            //
            // }, null)
            // .transform(locateIncomingResultTransformer)
            // .transform(locateExceptionReportWritingHandler)
            .channel(locateIncomingCompleteChannel())
            .get();
}
Indeed, there is an AcceptOnceFileListFilter, with code like this:
private final Queue<F> seen;
private final Set<F> seenSet = new HashSet<F>();
On each poll those internal collections are replenished with new files.
To avoid that memory consumption, you can consider using a FileSystemPersistentAcceptOnceFileListFilter with a persistent MetadataStore implementation, for example as sketched below.
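A rough sketch of that change against the adapter from the question; it swaps the custom flowPathFileFilter for the persistent accept-once filter (the two can also be combined in a CompositeFileListFilter), and the store type, base directory, and key prefix are placeholders, not a prescription.

import java.io.File;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.file.FileReadingMessageSource;
import org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter;
import org.springframework.integration.metadata.ConcurrentMetadataStore;
import org.springframework.integration.metadata.PropertiesPersistingMetadataStore;

@Configuration
public class FlowFilePollingConfig {

    @Bean
    public ConcurrentMetadataStore metadataStore() {
        // Persists the "already seen" file keys to disk instead of an ever-growing in-memory Set.
        PropertiesPersistingMetadataStore store = new PropertiesPersistingMetadataStore();
        store.setBaseDirectory("/tmp/flow-metadata"); // placeholder location
        return store;
    }

    @Bean
    @InboundChannelAdapter(value = "flowFileInChannel",
            poller = @Poller(fixedDelay = "30000", maxMessagesPerPoll = "1"))
    public MessageSource<File> flowInboundFileAdapter(@Value("${integration.path}") File directory) {
        FileReadingMessageSource source = new FileReadingMessageSource();
        source.setDirectory(directory);
        // Persistent accept-once filter in place of the in-memory AcceptOnceFileListFilter.
        source.setFilter(new FileSystemPersistentAcceptOnceFileListFilter(metadataStore(), "flowFile:"));
        source.setUseWatchService(true);
        source.setScanEachPoll(true);
        source.setAutoCreateDirectory(false);
        return source;
    }
}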
Also consider using a tool to analyze the memory contents; you might have something else downstream of the flowFileInChannel.
UPDATE
Since you use .aggregate(), that is definitely a point where memory is consumed by default, because an in-memory SimpleMessageStore keeps the messages for grouping. In addition, the expireGroupsUponCompletion(boolean) option is false by default, so even after a group is successfully released, some information about it remains in the MessageStore. That is how your memory gets consumed bit by bit over time.
That option is false by default so that late messages for an already completed group can be discarded. When it is true, you are able to form a fresh group for the same correlation key, and the completed group is removed from the store; see the sketch below.
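For illustration, the commented-out aggregate step from the flow above could enable that option roughly like this. This is only a sketch: the correlation/release settings and the processor from the original flow are elided, and the channel names are taken from the question.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
public class IncomingLocateFlowSketch {

    @Bean
    public IntegrationFlow incomingLocateFlow() {
        return IntegrationFlows.from("locateIncomingChannel")
                .aggregate(a -> a
                        // processor / correlation settings from the original flow go here
                        // Remove completed groups from the in-memory SimpleMessageStore
                        // once they are released, instead of keeping their metadata around.
                        .expireGroupsUponCompletion(true))
                .channel("locateIncomingCompleteChannel")
                .get();
    }
}

If even released-group metadata should not live on the heap, a persistent MessageGroupStore (for example a JDBC-backed one) can additionally be set on the spec via messageStore().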
See more info about Aggregator in the Reference Manual.
I need to manually ack multiple messages in a Rabbit listener, but only after they are successfully processed and stored. The Spring Boot configuration in use is as follows:
listener:
  concurrency: 2
  max-concurrency: 20
  acknowledge-mode: manual
  prefetch: 30
The messages should be stored in batches of 20 at a time. Only when they are successfully stored should the multiple ack be sent. There is also a timeout associated with the storage mechanism, which should store the messages after 20 seconds even if there aren't 20 of them yet. Currently, I have the following code:
@Slf4j
@Component
class EventListener {

    @Autowired
    private EventsStorage eventsStorage

    private ConcurrentMap<Integer, ChannelData> channelEvents = new ConcurrentHashMap<>()

    @RabbitListener(queues = 'event-queue')
    void processEvent(@Payload Event event, Channel channel, @Header(DELIVERY_TAG) long tag) {
        log.debug("Event received for channel $channel.channelNumber")
        channelEvents.compute(channel.channelNumber, { k, channelData -> addEventAndStoreIfNeeded(channel, event, tag, channelData) })
    }

    private ChannelData addEventAndStoreIfNeeded(Channel channel, Event event, long tag, ChannelData channelData) {
        if (channelData) {
            channelData.addEvent(tag, event)
            if (channelData.getDeliveredEvents().size() >= batchSize) {
                storeAndAckChannelEvents(channel.channelNumber)
            }
            return channelData
        } else {
            ChannelData newChannelData = new ChannelData(channel)
            newChannelData.addEvent(tag, event)
            return newChannelData
        }
    }

    void storeAndAckChannelEvents(Integer channelNumber) {
        channelEvents.compute(channelNumber, { k, channelData ->
            List<DeliveredEvent> deliveredEvents = channelData.deliveredEvents
            if (!deliveredEvents.isEmpty()) {
                def events = deliveredEvents.stream()
                        .map({ DeliveredEvent deliveredEvent -> deliveredEvent.event })
                        .collect(Collectors.toList())
                eventsStorage.store(events)
                long lastDeliveryTag = deliveredEvents.get(deliveredEvents.size() - 1).deliveryTag
                channelData.channel.basicAck(lastDeliveryTag, true)
                deliveredEvents.clear()
            }
        })
    }

    @Scheduled(fixedRate = 20000L)
    void storeMessagingEvents() {
        channelEvents.forEach({ k, channelData -> storeAndAckChannelEvents(k) })
    }
}
where ChannelData and DeliveredEvent are as follows:
class DeliveredEvent {
    long deliveryTag
    Event event
}

class ChannelData {
    Channel channel
    List<DeliveredEvent> deliveredEvents = new ArrayList<>()

    ChannelData(Channel channel) {
        this.channel = channel
    }

    void addEvent(long tag, Event event) {
        deliveredEvents.add(new DeliveredEvent(deliveryTag: tag, event: event))
    }
}
The Channel used is com.rabbitmq.client.Channel. The docs about this interface state:
Channel instances must not be shared between threads. Applications should prefer using a Channel per thread instead of sharing the same Channel across multiple threads.
So I'm doing quite the opposite: sharing a Channel between the scheduler and the SimpleMessageListenerContainer worker threads. The output of my application looks like this:
[SimpleAsyncTaskExecutor-3] DEBUG EventListener - Event received for channel 2
[SimpleAsyncTaskExecutor-4] DEBUG EventListener - Event received for channel 3
[SimpleAsyncTaskExecutor-5] DEBUG EventListener - Event received for channel 1
[SimpleAsyncTaskExecutor-1] DEBUG EventListener - Event received for channel 5
[SimpleAsyncTaskExecutor-2] DEBUG EventListener - Event received for channel 4
[SimpleAsyncTaskExecutor-3] DEBUG EventListener - Event received for channel 2
[SimpleAsyncTaskExecutor-1] DEBUG EventListener - Event received for channel 5
[SimpleAsyncTaskExecutor-2] DEBUG EventListener - Event received for channel 4
[SimpleAsyncTaskExecutor-3] DEBUG EventListener - Event received for channel 2
[pool-4-thread-1] DEBUG EventListener - Storing channel 5 events
[pool-4-thread-1] DEBUG EventListener - Storing channel 2 events
...
The SimpleMessageListenerContainer worker threads have their own Channel, which does not change over time.
Taking into account that I synchronized the scheduler and the SimpleMessageListenerContainer worker threads, does anyone see any reason why this code would not be thread-safe?
Is there any other approach that I should try to manually ack multiple messages in Spring Boot?
You will be OK as long as you synchronize the threads.
Bear in mind, though, that if the connection is lost, you will get a new consumer and your synchronizing thread will have stale data (the unacked messages will be redelivered).
However, you could also use container idle events.
When a consumer thread has been idle for the configured time, the event is published on that same listener thread, so you could do the timed ack there and you wouldn't have to worry about synchronization.
You can also detect consumer-failed events if the connection is lost. A sketch of the idle-event approach follows.
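As a rough sketch only: this assumes Spring AMQP's ListenerContainerIdleEvent and the idleEventInterval property on the listener container factory (names and packages may differ slightly across versions), and the interval and bean names are illustrative.

import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.ListenerContainerIdleEvent;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.EventListener;

@Configuration
public class IdleAwareRabbitConfig {

    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        // Publish an idle event when a consumer has received nothing for 20 seconds.
        factory.setIdleEventInterval(20000L);
        return factory;
    }

    @EventListener
    public void onContainerIdle(ListenerContainerIdleEvent event) {
        // Runs on the idle consumer's own listener thread, so the pending deliveries
        // for that consumer's Channel can be stored and acked here without sharing
        // the Channel across threads.
        // e.g. storeAndAckChannelEvents(<channel number tracked for this consumer>)
    }
}

With this in place, the separate @Scheduled flush and its cross-thread Channel access would no longer be needed.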
I have an Event Hub configured in Azure, along with a consumer group for reading the data. It was working fine for some days. Suddenly, I see a delay in the incoming data (around 3 days). I use a Windows Service on my server to consume the data. I have around 500 incoming messages per minute. Can anyone help me figure this out?
It might be that you are processing the items too slowly, so the backlog of work keeps growing and you lag further behind.
To get some insight into where you are in the event stream, you can use code like this:
private void LogProgressRecord(PartitionContext context)
{
    if (namespaceManager == null)
        return;

    var currentSeqNo = context.Lease.SequenceNumber;
    var lastSeqNo = namespaceManager.GetEventHubPartition(context.EventHubPath, context.ConsumerGroupName, context.Lease.PartitionId).EndSequenceNumber;
    var delta = lastSeqNo - currentSeqNo;

    logWriter.Write(
        $"Last processed seqnr for partition {context.Lease.PartitionId}: {currentSeqNo} of {lastSeqNo} in consumergroup '{context.ConsumerGroupName}' (lag: {delta})",
        EventLevel.Informational);
}
The namespaceManager is built like this:
namespaceManager = NamespaceManager.CreateFromConnectionString("Endpoint=sb://xxx.servicebus.windows.net/;SharedAccessKeyName=yyy;SharedAccessKey=zzz");
I call this logging method in the CloseAsync method:
public Task CloseAsync(PartitionContext context, CloseReason reason)
{
    LogProgressRecord(context);
    return Task.CompletedTask;
}
logWriter is just some logging class I have used to write info to blob storage.
It now outputs messages like
Last processed seqnr for partition 3: 32780931 of 32823804 in consumergroup 'telemetry' (lag: 42873)
So when the lag is very high, you could be processing events that occurred a long time ago. In that case you need to scale your processor up or out.
If you notice a lag, you should measure how long it takes to process a given number of items. You can then try to optimize performance and see whether it improves. We did it like this:
public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> events)
{
    try
    {
        stopwatch.Restart();
        // process items here
        stopwatch.Stop();

        await CheckPointAsync(context);

        logWriter.Write(
            $"Processed {events.Count()} events in {stopwatch.ElapsedMilliseconds}ms using partition {context.Lease.PartitionId} in consumergroup {context.ConsumerGroupName}.",
            EventLevel.Informational);
    }
    catch (Exception ex)
    {
        // handle/log failures from processing or checkpointing
        logWriter.Write($"Error while processing events: {ex}", EventLevel.Error);
        throw;
    }
}
I have a Worker Role running in Azure.
This worker processes a queue containing a large number of integers. For each integer I have to do fairly long processing (from 1 second to 10 minutes, depending on the integer).
As this is quite time consuming, I would like to do this processing in parallel. Unfortunately, my parallelization does not seem to be efficient when I test with a queue of 400 integers.
Here is my implementation:
public class WorkerRole : RoleEntryPoint {
    private readonly CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
    private readonly ManualResetEvent runCompleteEvent = new ManualResetEvent(false);
    private readonly Manager _manager = Manager.Instance;
    private static readonly LogManager logger = LogManager.Instance;

    public override void Run() {
        logger.Info("Worker is running");
        try {
            this.RunAsync(this.cancellationTokenSource.Token).Wait();
        }
        catch (Exception e) {
            logger.Error(e, 0, "Error Run Worker: " + e);
        }
        finally {
            this.runCompleteEvent.Set();
        }
    }

    public override bool OnStart() {
        bool result = base.OnStart();
        logger.Info("Worker has been started");
        return result;
    }

    public override void OnStop() {
        logger.Info("Worker is stopping");
        this.cancellationTokenSource.Cancel();
        this.runCompleteEvent.WaitOne();
        base.OnStop();
        logger.Info("Worker has stopped");
    }

    private async Task RunAsync(CancellationToken cancellationToken) {
        while (!cancellationToken.IsCancellationRequested) {
            try {
                _manager.ProcessQueue();
            }
            catch (Exception e) {
                logger.Error(e, 0, "Error RunAsync Worker: " + e);
            }
        }
        await Task.Delay(1000, cancellationToken);
    }
}
And here is the implementation of ProcessQueue:
public void ProcessQueue() {
    try {
        _queue.FetchAttributes();
        int? cachedMessageCount = _queue.ApproximateMessageCount;
        if (cachedMessageCount != null && cachedMessageCount > 0) {
            var listEntries = new List<CloudQueueMessage>();
            listEntries.AddRange(_queue.GetMessages(MAX_ENTRIES));
            Parallel.ForEach(listEntries, ProcessEntry);
        }
    }
    catch (Exception e) {
        logger.Error(e, 0, "Error ProcessQueue: " + e);
    }
}
And ProcessEntry:
private void ProcessEntry(CloudQueueMessage entry) {
    try {
        int id = Convert.ToInt32(entry.AsString);
        Service.GetData(id);
        _queue.DeleteMessage(entry);
    }
    catch (Exception e) {
        _queueError.AddMessage(entry);
        _queue.DeleteMessage(entry);
        logger.Error(e, 0, "Error ProcessEntry: " + e);
    }
}
In the ProcessQueue function, I tried different values of MAX_ENTRIES: first 20 and then 2.
It seems to be slower with MAX_ENTRIES=20, but whatever the value of MAX_ENTRIES is, it remains quite slow.
My VM is an A2 medium.
I really don't know if I am doing the parallelization correctly; maybe the problem comes from the worker itself (perhaps it is hard to run this in parallel).
You haven't mentioned which Azure messaging/queuing technology you are using; however, for tasks where I want to process multiple messages in parallel, I tend to use the Message Pump pattern on Service Bus queues and subscriptions, leveraging the OnMessage() method available on both the Service Bus QueueClient and SubscriptionClient:
QueueClient OnMessage() - https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.queueclient.onmessage.aspx
SubscriptionClient OnMessage() - https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.subscriptionclient.onmessage.aspx
An overview of how this stuff works :-) - http://fabriccontroller.net/blog/posts/introducing-the-event-driven-message-programming-model-for-the-windows-azure-service-bus/
From MSDN:
When calling OnMessage(), the client starts an internal message pump that constantly polls the queue or subscription. This message pump consists of an infinite loop that issues a Receive() call. If the call times out, it issues the next Receive() call.
This pattern allows you to use a delegate (or anonymous function in my preferred case) that handles the receipt of the Brokered Message instance on a separate thread on the WaWorkerHost process. In fact, to increase the level of throughput, you can specify the number of threads that the Message Pump should provide, thereby allowing you to receive and process 2, 4, 8 messages from the queue in parallel. You can additionally tell the Message Pump to automagically mark the message as complete when the delegate has successfully finished processing the message. Both the thread count and AutoComplete instructions are passed in the OnMessageOptions parameter on the overloaded method.
public override void Run()
{
    var onMessageOptions = new OnMessageOptions()
    {
        AutoComplete = true,    // Message-Pump will call Complete on messages after the callback has completed processing.
        MaxConcurrentCalls = 2  // Max number of threads the Message-Pump can spawn to process messages.
    };

    sbQueueClient.OnMessage((brokeredMessage) =>
    {
        // Process the Brokered Message Instance here
    }, onMessageOptions);

    RunAsync(_cancellationTokenSource.Token).Wait();
}
You can still leverage the RunAsync() method to perform additional tasks on the main Worker Role thread if required.
Finally, I would also recommend that you look at scaling your Worker Role instances out to a minimum of 2 (for fault tolerance and redundancy) to increase your overall throughput. From what I have seen with multiple production deployments of this pattern, OnMessage() performs perfectly when multiple Worker Role Instances are running.
A few things to consider here:
Are your individual tasks CPU-intensive? If so, parallelism may not help. However, if they mostly wait on other resources (I/O, external services, databases), parallelizing is a good idea.
If parallelizing is a good idea, consider not using Parallel.ForEach for queue processing. Parallel.ForEach has two issues that prevent you from being very optimal:
The code waits until all of the kicked-off threads finish processing before moving on. So, if you have 5 threads that need 10 seconds each and 1 thread that needs 10 minutes, the overall processing time for Parallel.ForEach will be 10 minutes.
Even though you might assume that all of the threads will start processing at the same time, Parallel.ForEach does not work this way. It looks at the number of cores on your server and other parameters, and generally only kicks off the number of threads it thinks it can handle, without knowing much about what is inside those threads. So, if you have a lot of non-CPU-bound threads that /can/ be kicked off at the same time without causing CPU over-utilization, the default behaviour will likely not run them optimally.
How to do this optimally:
I am sure there are a ton of solutions out there, but for reference, the way we've architected it in CloudMonix (which must kick off hundreds of independent threads and complete them as fast as possible) is by using ThreadPool.QueueUserWorkItem and manually keeping track of the number of threads that are running.
Basically, we use a thread-safe collection to keep track of the running work items started by ThreadPool.QueueUserWorkItem; once they complete, they are removed from that collection. The queue-monitoring loop is independent of the executing logic in that collection: it fetches messages from the queue only if the processing collection is not full, up to the limit you find most optimal. If there is space in the collection, it picks up more messages from the queue, adds them to the collection, and kick-starts them via ThreadPool.QueueUserWorkItem. When processing completes, a delegate cleans the work item up from the collection.
Hope this helps and makes sense.