I have a scenario where:
1: There are a reader process and a writer process communicating through a SysV message queue.
2: The writer process is faster than the reader process; that is, the writer writes messages to the queue faster than the reader reads them and empties the queue. For example, suppose the queue (a single message queue) already holds 8 messages and the reader has yet to read one when the writer tries to write (msgsnd) a 9th message.
3: What will happen? Will any of my messages get overwritten?
4: Will the first or the last message in the queue get overwritten?
5: Will the entire queue get overwritten?
6: Or will the 9th message be lost?
7: How can I make sure that none of these scenarios happens, so that I neither lose any new incoming message nor have any existing message overwritten?
8: How can I handle this situation?
Regards
About point 3, the manpage of msgsnd says
When msgsnd() fails, errno will be set to one among the following values:
...
EAGAIN The message can't be sent due to the msg_qbytes limit for the queue
and IPC_NOWAIT was specified in msgflg.
therefore none of the existing messages will be overwritten; the 9th message simply won't be added to the queue. If you did not specify IPC_NOWAIT, msgsnd() blocks until the queue has room again. If you did specify IPC_NOWAIT (note: it is passed to msgsnd() in msgflg, not set when the queue is opened), the call fails immediately with EAGAIN, and your application must keep the message somewhere else and retry, or it will be lost.
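A minimal C sketch of the two behaviours, assuming an already-created queue id and an illustrative message struct (neither is from the question):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct msgbuf {
    long mtype;       /* message type, must be > 0 */
    char mtext[128];  /* message payload */
};

/* Returns 0 on success, -1 if the message could not be enqueued. */
int send_message(int qid, const char *text, int nonblocking)
{
    struct msgbuf msg;
    msg.mtype = 1;
    strncpy(msg.mtext, text, sizeof(msg.mtext) - 1);
    msg.mtext[sizeof(msg.mtext) - 1] = '\0';

    /* Without IPC_NOWAIT, msgsnd() blocks until the queue has room;
       with IPC_NOWAIT, it fails immediately with EAGAIN when full. */
    if (msgsnd(qid, &msg, strlen(msg.mtext) + 1,
               nonblocking ? IPC_NOWAIT : 0) == -1) {
        if (errno == EAGAIN) {
            /* Queue full: nothing was overwritten, the new message was
               simply not enqueued -- the caller should keep it and retry. */
            return -1;
        }
        perror("msgsnd");
        return -1;
    }
    return 0;
}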
I have created a queue in Azure Queue storage and enqueued two items in it. Using the Node.js SDK, I create a timer that executes every 5 seconds and calls:
azure.createQueueService("precondevqueues", "<key>").getMessages(queueName, {numOfMessages : 1, visibilityTimeout: 1 }, callback)
I expect the same one of the two messages in the queue to show up after every 5 seconds, but that does not seem to be the case: the output of this call alternates between the two messages.
This should not happen, since visibilityTimeout is set to 1 and hence, after 1 second, the message dequeued in the first call should be visible again before the next getMessages call is made.
As noted here, FIFO ordering is not guaranteed. It may be that most of the time messages are fetched in FIFO order, but that is not guaranteed, and Azure can give you the messages in whatever order is best for its implementation.
Messages are generally added to the end of the queue and retrieved from the front of the queue, although first in, first out (FIFO) behavior is not guaranteed.
Aha, my mistake! I read the getMessages documentation again very carefully and realized that getMessages dequeues the message but retains an invisible copy outside of the queue. If the message processor does not delete the message before the visibility timeout expires, the copy is re-enqueued in the queue, and therefore it goes to the end of the queue.
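So the fix is to delete each message within its visibility window once it has been processed. A minimal sketch with the legacy azure-storage Node.js SDK (the processMessage handler and the 30-second timeout are assumptions, not from the question):

var azure = require('azure-storage');
var queueService = azure.createQueueService("precondevqueues", "<key>");

// Read one message, process it, and delete it before its visibility
// timeout expires so it is not re-enqueued and handed out again.
queueService.getMessages(queueName, { numOfMessages: 1, visibilityTimeout: 30 }, function (err, messages) {
    if (err || messages.length === 0) return;
    var msg = messages[0];
    processMessage(msg); // hypothetical handler, not from the question
    queueService.deleteMessage(queueName, msg.messageId, msg.popReceipt, function (delErr) {
        if (delErr) console.error(delErr);
    });
});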
I need to limit the rate of consuming messages from a RabbitMQ queue.
I have found many suggestions, but most of them suggest using the prefetch option. That doesn't do what I need: even with prefetch set to 1, the rate is about 6,000 messages/sec, which is far too many for my consumer.
I need to limit the rate to roughly 70 to 200 messages per second. This means consuming one message every 5-14 ms, with no simultaneous messages.
I'm using Node.js with the amqp.node library.
Implementing a token bucket might help:
https://en.wikipedia.org/wiki/Token_bucket
You can write a producer that publishes to the "token bucket queue" at a fixed rate, with a TTL on each message (maybe expiring after a second?) or with a maximum queue length equal to your per-second rate. Consumers that receive a "normal queue" message must also receive a "token bucket queue" message in order to process it, effectively rate-limiting the application.
NodeJS + amqplib Example:
var queueName = 'my_token_bucket';
rabbitChannel.assertQueue(queueName, {durable: true, messageTtl: 1000, maxLength: bucket.ratePerSecond});
writeToken();

// Emit one token every 1000 / ratePerSecond ms; unconsumed tokens expire after a second.
function writeToken() {
    rabbitChannel.sendToQueue(queueName, Buffer.from(new Date().toISOString()), {persistent: true});
    setTimeout(writeToken, 1000 / bucket.ratePerSecond);
}
I've already found a solution; a sketch follows below.
I use the nanotimer module from npm to calculate delays.
I calculate delay = 1 / [messages_per_second] in nanoseconds.
Then I consume messages with prefetch = 1.
Then I calculate the real delay as delay - [message_processing_time].
Then I set a timeout of that real delay before sending the ack for the message.
It works perfectly. Thanks to all.
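A minimal amqplib sketch of that approach (the queue name, target rate, and handle function are illustrative assumptions):

var amqp = require('amqplib/callback_api');
var NanoTimer = require('nanotimer');

var messagesPerSecond = 100;            // illustrative target rate
var delayNs = 1e9 / messagesPerSecond;  // per-message delay, in nanoseconds
var timer = new NanoTimer();

amqp.connect('amqp://localhost', function (connErr, conn) {
    conn.createChannel(function (chanErr, channel) {
        channel.prefetch(1); // at most one unacked message at a time
        channel.consume('work_queue', function (msg) {
            var started = process.hrtime.bigint();
            handle(msg); // hypothetical processing function
            var elapsedNs = Number(process.hrtime.bigint() - started);
            var remainingNs = Math.max(Math.round(delayNs - elapsedNs), 0);
            // Delay the ack so the next message arrives no sooner than delayNs.
            timer.setTimeout(function () { channel.ack(msg); }, '', remainingNs + 'n');
        }, { noAck: false });
    });
});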
See 'Fair Dispatch' in RabbitMQ Documentation.
For example in a situation with two workers, when all odd messages are heavy and even messages are light, one worker will be constantly busy and the other one will do hardly any work. Well, RabbitMQ doesn't know anything about that and will still dispatch messages evenly.
This happens because RabbitMQ just dispatches a message when the message enters the queue. It doesn't look at the number of unacknowledged messages for a consumer. It just blindly dispatches every n-th message to the n-th consumer.
In order to defeat that we can use the prefetch method with the value of 1. This tells RabbitMQ not to give more than one message to a worker at a time. Or, in other words, don't dispatch a new message to a worker until it has processed and acknowledged the previous one. Instead, it will dispatch it to the next worker that is not still busy.
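In amqplib that is a single call on the channel; a minimal sketch (the queue name and doWork function are assumptions):

// Tell RabbitMQ to deliver at most one unacknowledged message per consumer.
channel.prefetch(1);
channel.consume('task_queue', function (msg) {
    doWork(msg); // hypothetical worker function
    channel.ack(msg);
}, { noAck: false });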
I don't think RabbitMQ can provide this feature out of the box.
If you have only one consumer, then the whole thing is pretty easy: you just let it sleep between consuming messages.
If you have multiple consumers, I would recommend using some "shared memory" to keep the rate. For example, you might have 10 consumers consuming messages. To keep a rate of 70-200 messages per second across all of them, each consumer makes a call to Redis to check whether it is eligible to process a message; if yes, it updates Redis to show the other consumers that a message is currently in process (a sketch follows below).
If you have no control over the consumer, then implement option 1 or 2 and publish the messages back to Rabbit. This way the original consumer will consume messages at the desired pace.
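A fixed-window variant of that Redis check (a sketch; the key prefix and the ioredis client are assumptions):

const Redis = require('ioredis');
const redis = new Redis();

// Returns true if this consumer may process one more message this second.
async function tryAcquire(limitPerSecond) {
    const key = 'rate:' + Math.floor(Date.now() / 1000); // one counter per second
    const count = await redis.incr(key);
    if (count === 1) await redis.expire(key, 2); // let old windows expire
    return count <= limitPerSecond;
}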
This is how I fixed mine with just setTimeout.
I set mine to consume one message every 200 ms, which consumes 5 messages per second. (My handler does an update if the record exists.)
channel.prefetch(1); // needed so the delayed ack actually throttles delivery
channel.consume(transactionQueueName, async (data) => {
    let dataNew = JSON.parse(data.content);
    const processedTransaction = await seperateATransaction(dataNew);
    // Delay the ack to avoid duplicate entries - important, don't remove the setTimeout
    setTimeout(function () {
        channel.ack(data);
    }, 200);
});
Done
My application (.NET-based) gets messages from a queue in a multithreaded fashion, and I'm worried that I may receive messages out of order because one thread can be quicker than another. For instance, given the following queue state:
[Message-5 | Message-4 | Message-3 | Message-2 | Message-1]
In a multithreaded operation, msg #2 may arrive before msg #1, even though msg #1 was first in the queue, due to threading issues (thread time slices, thread scheduling, etc.).
In such a situation, it would be great if each message in the queue were stamped with an ordinal/sequence number when it was enqueued; then, even if I receive the messages out of order, I could still sort them at some point within my application using that ordinal attribute.
Is there any known mechanism to achieve this in a WebSphere MQ environment?
You have 2 choices:
(1) Use Message Grouping in MQ as whitfiea mentioned or
(2) Change your application to be single-threaded.
Note: If the sending application does not set the MQMD MsgId field, then the queue manager will generate a unique number (based on the queue manager name, date, and time) and store it in the message's MQMD MsgId field.
You can obtain the MessageSequenceNumber from the MQMessage if the messages are put to the queue in a message group. The MessageSequenceNumber will either be the order in which the messages were put to the queue (the default) or be defined by the application that put them there.
See the MessageSequenceNumber here for more details.
Yes, if the originating message has an ordinal, then as you receive your data you could collect the messages in a dictionary keyed by that ordinal, for example a SortedDictionary<int, Message> (guarded by a lock, since SortedDictionary itself is not thread-safe), and release them in sequence.
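A sketch of that re-sequencing (the Message type, Deliver handler, and starting ordinal are assumptions):

private readonly object _gate = new object();
private readonly SortedDictionary<int, Message> _pending = new SortedDictionary<int, Message>();
private int _nextExpected = 1;

public void OnMessageArrived(int ordinal, Message message)
{
    lock (_gate) // SortedDictionary is not thread-safe on its own
    {
        _pending[ordinal] = message;
        // Release messages in order as soon as the next expected one is present.
        while (_pending.TryGetValue(_nextExpected, out Message next))
        {
            _pending.Remove(_nextExpected);
            Deliver(next); // hypothetical downstream handler
            _nextExpected++;
        }
    }
}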
I'm using the Azure Service Bus SubscriptionClient.OnMessage method; configured to process up to 5 messages concurrently.
Within the code I need to wait for all messages to finish processing before I can continue (to properly shut down an Azure Worker Role). How do I do this?
Will SubscriptionClient.Close() block until all messages have finished processing?
Calling Close on SubscriptionClient or QueueClient will not block. Calling Close closes off the entity immediately, as far as I can tell. I tested quickly using the Worker Role with Service Bus Queue project template that shipped with the Windows Azure SDK 2.0: I added a thread sleep of many seconds in the message-processing action and then shut down the role while it was running. I saw the Close method get called while the messages were still processing in their thread sleep, but it certainly did not wait for message processing to complete; the role simply closed down.
To handle this gracefully you'll need to do the same thing we did when dealing with any worker role that was processing messages (Service Bus, Azure Storage queue or anything else): keep track of what is being worked on and shut down when it is complete. There are several ways to deal with that but all of them are manual and made messy in this case because of the multiple threads involved.
Given the way OnMessage works, you'll need to add something to the action that checks whether the role has been told to shut down and, if so, does no processing. The problem is that when the OnMessage action is executed, it already HAS a message. You'd probably need to abandon the message but not exit the OnMessage action, otherwise the thread will keep getting messages as long as there are any in the queue. You can't simply abandon the message and let execution leave the action, because then the system will hand the thread another message (possibly the same one), and several threads doing this may push messages past their maximum dequeue count and into the dead-letter queue. You also can't call Close on the SubscriptionClient or QueueClient (which would stop the receive loop internally), because once you call Close, any outstanding message processing will throw an exception when .Complete, .Abandon, etc. is called on the message, since the messaging entity is now closed. This means you can't easily stop the incoming messages.
The main issue is that by using OnMessage and setting up concurrent message handling via MaxConcurrentCalls on the OnMessageOptions, the code that starts and manages the threads is buried inside the QueueClient and SubscriptionClient, and you have no control over it: you can't reduce the thread count, stop threads individually, and so on. You'll need a way to put the OnMessage action threads into a state where they know the system is shutting down, complete their current message, and then not exit the action, so that they are not continuously assigned new messages. This likely also means setting the OnMessageOptions to not use autocomplete and calling Complete manually in your OnMessage action.
Having to do all of this may severely reduce the actual benefit of the OnMessage helper. Behind the scenes, OnMessage simply sets up a loop that calls Receive with the default timeout and hands messages off to another thread to run the action (a loose description). What you gain from OnMessage is not having to write that receive loop yourself; but the problem you're having exists precisely because you didn't write that loop yourself, so you have no control over those threads. Catch-22. If you really need to stop gracefully, you may want to step away from the OnMessage approach, write your own receive loop with threading, and within the main loop stop receiving new messages and wait for all the workers to finish.
One other option, especially if the messages are idempotent (meaning processing them more than once yields the same result, which you should aim for anyway), is to simply let them be cut off mid-processing: they will reappear on the queue and be processed by another instance later. If the work isn't resource-intensive and the operations are idempotent, this really can be an option; it's no different from an instance failing due to a hardware fault or other issue. Sure, it's not graceful or elegant, but it removes all the complexity I've mentioned, and the same situation can happen anyway due to other failures.
Note that OnStop is called when an instance is told to shut down. You've got 5 minutes to delay shutdown before the fabric simply switches the instance off, so if your messages take longer than five minutes to process, it won't really matter whether you attempt to shut down gracefully or not: some will be cut off during processing.
You can tweak OnMessageAsync to wait for in-flight messages to complete and to block new messages from starting to be processed:
Here is the implementation:
_subscriptionClient.OnMessageAsync(async message =>
{
    if (_stopRequested)
    {
        // Block processing of new messages. We want to wait for old messages to complete and exit.
        await Task.Delay(_waitForExecutionCompletionTimeout);
    }
    else
    {
        try
        {
            // Track executing messages
            _activeTaskCollection[message.MessageId] = message;
            await messageHandler(message);
            await message.CompleteAsync();
        }
        catch (Exception e)
        {
            // Handle the error by disposing, or do nothing to force a retry
        }
        finally
        {
            BrokeredMessage savedMessage;
            if (!_activeTaskCollection.TryRemove(message.MessageId, out savedMessage))
            {
                // savedMessage is null when TryRemove fails, so log the id from the message itself.
                _logger.LogWarning("Attempt to remove message id {0} failed.", message.MessageId);
            }
        }
    }
}, onMessageOptions);
And an implementation of Stop that waits for completion:
public async Task Stop()
{
    _stopRequested = true;
    DateTime startWaitTime = DateTime.UtcNow;
    while (DateTime.UtcNow - startWaitTime < _waitForExecutionCompletionTimeout && _activeTaskCollection.Count > 0)
    {
        await Task.Delay(_waitForExecutionCompletionSleepBetweenIterations);
    }
    await _subscriptionClient.CloseAsync();
}
Note that _activeTaskCollection is a ConcurrentDictionary. (We could also use a counter with Interlocked to count the number of in-progress messages, but the dictionary lets you investigate what happened easily in case of errors.)
I'm considering a multi-threaded architecture for a processing pipeline. My main processing module has an input queue, from which it receives data packets. It then performs transformations on these packets (decryption, etc.) and places them into an output queue.
The threading comes in where many input packets can have their contents transformed independently from one another.
However, the punchline is that the output queue must have the same ordering as the input queue (i.e., the first pulled off the input queue must be the first pushed onto the output queue, regardless of whether its transformations finished first.)
Naturally, there will be some kind of synchronisation at the output queue, so my question is: what would be the best way of ensuring that this ordering is maintained?
Have a single thread read the input queue, post a placeholder on the output queue, and then hand the item over to a worker thread to process. When the data is ready the worker thread updates the placeholder. When the thread that needs the value from the output queue reads the placeholder it can then block until the associated data is ready.
Because only a single thread reads the input queue, and this thread immediately puts the placeholder on the output queue, the order in the output queue is the same as that in the input. The worker threads can be numerous, and can do the transformations in any order.
On platforms that support futures, they are ideal as the placeholder. On other systems you can use an event, monitor or condition variable.
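A minimal C# sketch of this placeholder pattern, with TaskCompletionSource playing the role of the future (the Packet, Transform, and Consume names are assumptions):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Single reader thread: dequeue input, enqueue a placeholder, hand work off.
// BlockingCollection preserves FIFO order for the placeholders.
void ReadLoop(BlockingCollection<Packet> inputQueue, BlockingCollection<Task<Packet>> outputQueue)
{
    foreach (var packet in inputQueue.GetConsumingEnumerable())
    {
        var tcs = new TaskCompletionSource<Packet>();
        outputQueue.Add(tcs.Task); // placeholder goes out in input order
        var captured = packet;
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { tcs.SetResult(Transform(captured)); } // Transform is hypothetical
            catch (Exception ex) { tcs.SetException(ex); }
        });
    }
    outputQueue.CompleteAdding();
}

// Consumer side: Task.Result blocks until the worker publishes the value.
void ConsumeLoop(BlockingCollection<Task<Packet>> outputQueue)
{
    foreach (var future in outputQueue.GetConsumingEnumerable())
    {
        Consume(future.Result); // hypothetical downstream step
    }
}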
With the following assumptions:
there should be one input queue, one output queue, and one working queue
there should be only one input queue listener
each output message should contain a wait handle and a pointer to the worker/output data
there may be an arbitrary number of worker threads
I would consider the following flow (a condensed sketch follows the list):
Input queue listener does these steps:
extracts an input message;
creates an output message:
initializes the worker data struct
resets the wait handle
enqueues a pointer to the output message into the working queue
enqueues a pointer to the output message into the output queue
Worker thread does the following:
waits on the working queue to extract a pointer to an output message from it
processes the message based on the given data and sets the event when done
Consumer does the following:
waits on the output queue to extract a pointer to an output message from it
waits on the handle until the output data is ready
does something with the data
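A condensed C# sketch of that flow, with ManualResetEventSlim as the wait handle (the OutputMessage and Packet types and the Process step are assumptions):

using System.Collections.Concurrent;
using System.Threading;

class OutputMessage
{
    public ManualResetEventSlim Ready = new ManualResetEventSlim(false);
    public Packet Data; // filled in by a worker thread, which then sets Ready
}

// Consumer: takes placeholders in FIFO order and blocks until each is ready.
void ConsumeLoop(BlockingCollection<OutputMessage> outputQueue)
{
    foreach (var msg in outputQueue.GetConsumingEnumerable())
    {
        msg.Ready.Wait();   // block until the worker has produced the data
        Process(msg.Data);  // hypothetical downstream step
    }
}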
That's going to be implementation-specific. One general solution is to number the input items and preserve the numbering so you can later sort the output items. This could be done once the output queue is filled, or it could be done as part of filling it. In other words, you could insert them into their proper position and only allow the queue to be read when the next available item is sequential.
edit
I'm going to sketch out a basic scheme, trying to keep it simple by using the appropriate primitives:
Instead of queueing a Packet into the input queue, we create a future value around it and enqueue that into both the input and output queues. In C#, you could write it like this:
var future = new Lazy<Packet>(delegate() { return Process(packet); }, LazyThreadSafetyMode.ExecutionAndPublication);
A thread from the pool of workers dequeues a future from the input queue and evaluates future.Value, which causes the delegate to run on first access (just in time) and returns once the delegate has finished processing the packet.
One or more consumers dequeue a future from the output queue. Whenever they need the value of the packet, they read future.Value, which returns immediately if a worker thread has already run the delegate.
Simple, but works.
If you are using a windowed approach (a known number of elements), use an array for the output queue; for example, if it is media streaming and you discard packets that haven't been processed quickly enough.
Otherwise, use a priority queue (a special kind of heap, often implemented on top of a fixed-size array) for the output items.
You need to add a sequence number, or any datum you can sort the items by, to each data packet. A priority queue is a tree-like structure that maintains the ordering of items across insert/pop operations; a sketch follows.
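A short C# sketch of draining a priority queue in sequence order (using the .NET 6+ PriorityQueue; the Packet type and Emit step are assumptions, and callers must synchronize access):

using System.Collections.Generic;

var pending = new PriorityQueue<Packet, int>();
int nextSeq = 0;

void OnTransformed(Packet p, int seq)
{
    pending.Enqueue(p, seq); // priority = sequence number
    // Emit only while the head of the heap is the next expected sequence number.
    while (pending.TryPeek(out Packet head, out int headSeq) && headSeq == nextSeq)
    {
        pending.Dequeue();
        Emit(head); // hypothetical push onto the output queue
        nextSeq++;
    }
}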