I have a main class that reads records from a DB and passes them to a Spring gateway (Spring Integration); from there the messages are split across a multithreaded application with queues.
My Spring application is closing the context while the messages are still being processed by the Spring Integration adapters.
I need a mechanism that closes the context only after all of the messages have been processed.
As a temporary workaround, I am using Thread.sleep in a while loop to keep the program alive.
Well, I would suggest having a bean like:
Semaphore queueSemaphore = new Semaphore(1);
Then acquire() it at the beginning of your application, and do the same at the end, before closing the application context. The second acquire() will block until release() is called somewhere in your application, at the point where you know that all the messages have been processed.
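A minimal sketch of that wiring, assuming a Spring Boot entry point (the class name is illustrative, and in the real application the semaphore would be registered as a bean so the component that detects completion can call release()):

import java.util.concurrent.Semaphore;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;

@SpringBootApplication
public class BatchMain {

    // One permit: the second acquire() below can only succeed after release().
    public static final Semaphore queueSemaphore = new Semaphore(1);

    public static void main(String[] args) throws InterruptedException {
        ConfigurableApplicationContext ctx = SpringApplication.run(BatchMain.class, args);

        queueSemaphore.acquire(); // take the only permit before feeding records

        // ... read records from the DB and send them to the gateway here ...

        queueSemaphore.acquire(); // blocks until a downstream component calls release()
        ctx.close();              // safe now: all in-flight messages are done
    }
}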
Another trick is to call the standard System.exit(0) at the point where you see that all the messages have been processed. But in that case you should not close the application context manually.
I am trying to pause/resume the Kafka container. Using the following code snippet to do so:
kafkaListenerEndpointRegistry.getListenerContainer("MAIN").pause();
When I call pause, I also need a Thread.sleep so that the messages in the batch are not processed. For every message in the batch, I call another API that has a rate limit; to stay under that limit, I need to stop processing messages for a while.
If the main thread sleeps, will it stop the listener from sending the heartbeat? Does it also stop the heartbeat thread in the background?
The documentation says: "When a container is paused, it continues to poll() the consumer, avoiding a rebalance if group management is being used, but it does not retrieve any records."
But I am pausing the container and making the thread sleep. How will this impact the flow?
You must never sleep the consumer thread, to avoid rebalancing.
Instead, reduce the max.poll.records so the pause will take effect more quickly (the consumer won't actually pause until the records received by the previous poll are processed).
You can throw an exception after pausing the consumer, but you will need to resume the container somehow.
I opened a new issue to improve this behavior https://github.com/spring-projects/spring-kafka/issues/2280
If you are subject to rate limits, consider using KafkaTemplate.receive() methods, on a schedule, or a polled Spring Integration adapter, instead of using a message-driven approach.
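As a rough sketch of the pause-then-resume idea with Spring Kafka (the container id "MAIN" comes from the question's snippet; the TaskScheduler bean is an assumption and must be configured elsewhere):

import java.time.Duration;
import java.time.Instant;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.scheduling.TaskScheduler;
import org.springframework.stereotype.Component;

@Component
public class RateLimitPauser {

    private final KafkaListenerEndpointRegistry registry;
    private final TaskScheduler scheduler; // assumed to exist as a bean

    public RateLimitPauser(KafkaListenerEndpointRegistry registry, TaskScheduler scheduler) {
        this.registry = registry;
        this.scheduler = scheduler;
    }

    // Pause the container instead of sleeping the consumer thread: the
    // container keeps polling (so no rebalance) but stops delivering records.
    public void pauseFor(Duration backoff) {
        MessageListenerContainer container = registry.getListenerContainer("MAIN");
        container.pause(); // only takes effect after the current batch is processed
        scheduler.schedule(container::resume, Instant.now().plus(backoff));
    }
}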
I have an SQS queue with a lot of messages (typically in the thousands). At present I have multiple listeners (threads created from the same source), and each listener receives messages from the queue. As soon as a listener receives a message, it deletes the message from the queue, and only then processes it. I am using a visibility timeout of 30 seconds.
I am not using any locks or similar mechanisms to handle duplicates, since I delete each message from the queue immediately after receiving it. I haven't seen a duplicate so far, but I am worried it might happen.
Now, the question is: which is the better approach, having multiple listeners this way, or listening to the queue on a single thread and spinning up new threads to process each message received?
Firstly, it is worth understanding the concept of the message invisibility timeout.
When a message is retrieved from an Amazon SQS queue (e.g. by your thread), the message is marked as invisible in Amazon SQS. Best practice is for your thread to process the message and then delete it after processing has completed. This way, if the thread fails, the message automatically becomes visible on the queue again and another thread can process it.
With your current application design, if a thread fails then the message is lost and will not be retried. You should consider changing your code to delete the message only after it has been processed.
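For example, a minimal sketch of that receive/process/delete order with the AWS SDK for Java v2 (the queue URL and the process() body are placeholders):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class SqsWorker {

    private final SqsClient sqs = SqsClient.create();
    private final String queueUrl = "..."; // placeholder: your queue URL

    public void pollOnce() {
        var messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .maxNumberOfMessages(10)
                .waitTimeSeconds(20) // long polling
                .build()).messages();

        for (Message m : messages) {
            process(m.body()); // the message stays invisible while we work on it
            // Delete only after successful processing; if process() throws,
            // the message becomes visible again after the visibility timeout
            // and another worker can retry it.
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .receiptHandle(m.receiptHandle())
                    .build());
        }
    }

    private void process(String body) {
        // placeholder for the real work
    }
}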
Using multiple threads to process messages is recommended, because it will allow higher message throughput by processing messages in parallel. It is also a simpler design, and simple is always best. Your alternate idea of having one process retrieve messages and then firing off threads to process the message is more complex and does not provide any benefits.
Amazon SQS queues can occasionally return the same message more than once. It is rare, but it can happen. The multiple-thread design will probably encounter it more often than the single-thread design, because multiple threads might simultaneously retrieve the same message. However, it could still happen in the single-thread model, too.
If processing the same message twice is a concern, then consider using a FIFO queue (not currently available in every AWS Region). This guarantees that every message is received only once. Alternatively, your code would need to check whether a particular message has already been processed (e.g. by checking in a database).
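As an illustration of that already-processed check, a hedged sketch in Java (an in-memory set stands in for the database lookup, so it only guards against duplicates within a single process):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupGuard {

    // In production this would be a database or cache keyed by message id.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    public void handle(String messageId, Runnable work) {
        // add() returns false if the id was already present, so a duplicate
        // delivery is skipped instead of being processed a second time.
        if (processed.add(messageId)) {
            work.run();
        }
    }
}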
The multiple-thread design will also allow you to scale horizontally by having multiple systems (even across multiple Availability Zones) process messages, whereas your single-thread design has a single point of failure and is less scalable.
I'm using ServiceStack MQ (ServiceStack.Aws.Sqs.SqsMqServer v4.0.54).
I'm running MQ server inside a Windows Service.
My Goal:
When the Windows service is about to shut down, I would like to wait for all running workers to finish processing and then terminate the MqServer.
Problem:
The ServiceStack MqServer (whether Redis/RabbitMq/Sqs) has a Stop() method, but it does not block until all workers complete their work. It merely pulses the background thread to stop the workers and then returns. Then the Windows Service process stops, and existing workers get aborted.
This is the link to github source code -> https://github.com/ServiceStack/ServiceStack/blob/75847c737f9c0cd9f5dd4ea3ae1113dace56cbf2/src/ServiceStack.RabbitMq/RabbitMqServer.cs#L451
Temporary Workaround:
I subclass SqsMqServer, loop through the protected member 'workers' in the base class, and call Stop() on each one. (In this case, Stop() is implemented correctly as a blocking call: it waits indefinitely until the worker finishes whatever it is currently working on.)
Is my current understanding of how to shut down the MqServer correct? Is this a bug, or have I misunderstood something?
The source code for SqsMqServer is maintained in the ServiceStack.Aws repository.
The Stop() method pulses the background thread, which calls StopWorkerThreads(), and that goes through and stops all the workers.
We are using Azure Service Bus via NServiceBus, and I am facing a problem deciding the correct architecture for dealing with long-running tasks triggered by messages.
As is good practice, we don't want to block the message handler from returning by making it wait for a long-running process (downloading a large file from a remote server); in fact, doing so will cause the lock on the message to be lost with Azure SB. The plan is to respond by spawning a separate task and allowing the message handler to return immediately.
However, this means the handler is immediately available for the next message, which will cause another task to be spawned, and so on until the message queue is empty. What I'd like is some way to stop taking messages while we are processing (a limited number of) earlier messages. Is there an accepted pattern for this with NServiceBus and Azure Service Bus?
The following is roughly what I'd do if I were programming directly against the Azure SB:
while (true)
{
    var message = bus.Next();
    message.Complete();
    // Do long running stuff here
}
The verbs Next and Complete are probably wrong, but what happens under Azure is that Next gets a temporary lock on the message so that other consumers can no longer see it. Then you can decide if you really want to process the message and, if so, call Complete; that removes the message from the queue entirely. Failing to do so will cause the message to reappear on the queue after a period of time, as Azure assumes you crashed. As dirty as this code looks, it would achieve my goal (so why not do it?), since my consumer only takes the next message when I'm available again (after the long-running task). Other consumers (other instances) can jump in if necessary.
The problem is that NServiceBus adds a level of abstraction, so that handling a message is now done via a method on a handler class:
void Handle(NewFileMessage message)
{
    // Do work here
}
The problem is that Azure does not see the call to message.Complete() until after your work is done and the Handle method exits. This is why you need to keep the work short. However, once you exit, you also signal that you are ready to handle another message. This is my Catch-22.
Downloading on a background thread is a good idea. You don't want to increase the lock duration, because that's a symptom, not the problem. Your download can easily take longer than the maximum lock duration (5 minutes), and then you're back to square one.
What you can do is have an orchestrating saga for the download. The saga can monitor the download process, and when the download is completed, the background process signals the saga about completion. If the download never finishes, you can have a timeout (or multiple timeouts) to indicate that, and then take a compensating action or retry, whatever works for your business case.
Documentation on Sagas should get you going: http://docs.particular.net/nservicebus/sagas/
In Azure Service Bus you can increase the lock duration of a message (set to 30 seconds by default) in case the handling will take a long time.
But even though you can increase the lock duration, needing to do so is generally an indication that your handler is taking care of too much work, which could be divided across different handlers.
If it is critical that the file is downloaded, I would keep the download operation in the handler. That way if the download fails the message can be handled again and the download can be retried. If however you want to free up the handler instantly to handle more messages, I would suggest that you scale out the workers that perform the download task so that the system can cope with the demand.
I am new to RabbitMQ, so please excuse me if my question sounds trivial. I want to publish messages to RabbitMQ that will be processed by a RabbitMQ consumer.
My consumer machine is a multi-core machine (preferably a worker role on Azure), but QueueBasicConsumer pushes one message at a time. How can I program it to utilize all cores so that I can process multiple messages concurrently?
One solution could be to open multiple channels in multiple threads and process messages there. But in that case, how would I decide the number of threads?
Another approach could be to read messages on the main thread and then create a task for each message. In this case I would have to stop consuming messages once there are too many (past some threshold) already in progress, and I am not sure how that could be implemented.
Thanks in advance.
Your second option sounds much more reasonable: consume on a single channel and spawn multiple tasks to handle the messages. To implement concurrency control, you can use a semaphore to limit the number of tasks in flight: before starting a task, wait for the semaphore to become available; when a task finishes, it signals the semaphore so another task can run.
You haven't specified your language/technology stack of choice, but whatever you do, try to utilise a thread pool instead of creating and managing threads yourself. In .NET, that means using Task.Run to process messages asynchronously.
Example C# code:
using (var semaphore = new SemaphoreSlim(MaxMessages))
{
    while (true)
    {
        // Blocks until the next delivery arrives on the channel.
        var args = (BasicDeliverEventArgs)consumer.Queue.Dequeue();

        // Wait for a free slot so that at most MaxMessages are in flight.
        semaphore.Wait();

        Task.Run(() => ProcessMessage(args))
            .ContinueWith(t => semaphore.Release()); // free the slot when done
    }
}
Instead of controlling the concurrency level yourself, you might find it easier to enable explicit ack control on the channel and use RabbitMQ consumer prefetch to set the maximum number of unacknowledged messages. This way, you will never receive more messages than you want at once.
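The example above is C#; for reference, a minimal sketch of the same prefetch-plus-manual-ack idea with the RabbitMQ Java client (the queue name and the message handling are placeholders):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class PrefetchConsumer {

    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel channel = conn.createChannel();

        channel.basicQos(10); // broker sends at most 10 unacknowledged messages

        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            processMessage(delivery.getBody()); // placeholder for the real work
            // Ack only after processing, so the unacknowledged count (and
            // therefore the amount of work in flight) stays bounded by basicQos.
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        channel.basicConsume("work-queue", false /* manual ack */, onDeliver,
                consumerTag -> { /* consumer cancelled */ });
    }

    private static void processMessage(byte[] body) {
        // placeholder
    }
}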