I have a queue channel and a chain with a poller and task-executor "listening" on that channel, doing some processing in parallel. What I would like to do is configure it in such a way that I can route particular messages, based on some logic/property, to make sure that a particular message 'type' is always processed by a particular thread from the task-executor.
Example: messages where PAYLOAD_PROPERTY & 1 == 0 always go to thread 1, and those where PAYLOAD_PROPERTY & 1 == 1 go to thread 2 (please notice this is just an example for 2 threads - I could easily use a router here, but I can imagine the same logic - e.g. a modulo operation - working for 10 threads as well). In other words: thread 1 and thread 2 must never concurrently process the same 'type' of message. So the purpose is not just to load-balance; it is to stick with the same thread based on some logic.
My initial thought was to somehow use a channel dispatcher (it can have a load-balancer-ref and task-executor), but I'm not sure this would work, as I have a chain with a poller that does the further processing I need.
Can you advise on the best component setup for a workflow like the above?
There's nothing like that in a "standard" task executor.
It's probably easier to remove the queue channel and have a router (subscribed to a direct channel) route to 10 separate executor channels, each configured with a single-thread executor.
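Hypothetically, that could look like the following Java-config sketch (the channel names and the payloadProperty accessor are mine, not from the question):
import java.util.concurrent.Executors;
import org.springframework.context.annotation.Bean;
import org.springframework.expression.spel.standard.SpelExpressionParser;
import org.springframework.integration.annotation.Router;
import org.springframework.integration.channel.ExecutorChannel;
import org.springframework.integration.router.ExpressionEvaluatingRouter;
import org.springframework.messaging.MessageChannel;

@Bean
@Router(inputChannel = "inputChannel")
public ExpressionEvaluatingRouter partitionRouter() {
    // route on (payloadProperty % 10) so the same message 'type' always
    // lands on the same single-thread executor channel
    return new ExpressionEvaluatingRouter(new SpelExpressionParser()
            .parseExpression("'executorChannel' + (payload.payloadProperty % 10)"));
}

@Bean
public MessageChannel executorChannel0() {
    // one thread per channel pins each message 'type' to one thread
    return new ExecutorChannel(Executors.newSingleThreadExecutor());
}
// ... repeat for executorChannel1 through executorChannel9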
I have to design a job scheduler for a multi-tenant app. Each tenant will have its own job queue for processing background tasks. There are N workers, each of which listens to all the queues and takes up a job when idle.
e.g.
queue 1 : tasks - A, B, C
queue 2 : tasks - D
queue 3 : tasks - E, F
and I have 3 workers w1, w2, w3, all of which listen to all the queues. This whole design is going to be implemented on AWS.
It is important that each job is processed only once. Since all the workers are reading the queues, how can I prevent one job being accessed simultaneously by many workers?
Also, if the workers read all queues sequentially, they will keep dequeuing only from the first queue until it is empty; how do I handle this situation?
I initially thought of using an SNS notification when a new task is added to a job queue, but since all workers will receive it, the core problem won't be solved.
For the first concern, SQS handles distributing tasks to individual workers automatically - read about Visibility Timeouts: once one worker receives a message, it is hidden from the other workers until it is deleted or the timeout expires.
If you want to maintain separate queues, you need to put the logic in the workers to do the queue switching - basically an infinite loop over the 3 queues, checking each for new work and processing only a single message before switching to the next queue:
while (true) {
    for (queue : queues) {
        // at most one message per queue per pass, so no queue monopolizes the worker
        message = getMessage(queue)
        if (message != null)
            processMessage(message)
    }
}
Make sure you aren't using long polling, as it will just sit on the first queue.
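A rough sketch of that loop with the AWS SDK for Java v2 (the queue URLs and processMessage are placeholders):
import java.util.List;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

void pollAllQueues(SqsClient sqs, List<String> queueUrls) {
    while (true) {
        for (String queueUrl : queueUrls) {
            ReceiveMessageRequest request = ReceiveMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .maxNumberOfMessages(1) // one message per queue per pass
                    .waitTimeSeconds(0)     // short polling, so we don't sit on one queue
                    .build();
            for (Message message : sqs.receiveMessage(request).messages()) {
                processMessage(message);    // placeholder for the actual work
                // delete only after successful processing; meanwhile the
                // visibility timeout hides the message from other workers
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .receiptHandle(message.receiptHandle())
                        .build());
            }
        }
    }
}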
I have successfully used a RecipientListRouter in my program, where based on a value I am sending the message to multiple channels.
I would like to know:
1. Will this maintain the order of execution? Say, in the case below, we receive an event that will be processed by both channelChkn and channelDeboard - will the event first be processed by channelChkn and then by channelDeboard?
2. Is it executed in a different thread or in the sender's thread?
@Bean
public RecipientListRouter router() {
    RecipientListRouter router = new RecipientListRouter();
    router.setIgnoreSendFailures(true);
    router.setApplySequence(true);
    // each recipient is chosen by a SpEL expression evaluated against the message
    router.addRecipient("channelChkn", "headers.get('eventSubType').contains('CHKN')");
    router.addRecipient("channelBkd", "headers.get('eventSubType').contains('BKD')");
    router.addRecipient("channelBrd", "headers.get('eventSubType').contains('BRD')");
    router.addRecipient("channelDeboard", "headers.get('isDeBoarded') == true");
    router.setDefaultOutputChannelName(IntegrationContextUtils.NULL_CHANNEL_BEAN_NAME);
    LOGGER.info("********************* RecipientListRouter *********************" + router.getRecipients());
    return router;
}
Yes, they will be executed in order, on the calling thread, as long as all of the recipient target channels are synchronous (no queue channels, no executor channels, no publish-subscribe channels configured with an executor).
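To make the distinction concrete, the recipient channel definitions decide the behavior; a sketch (bean names borrowed from the question, the executor choice is hypothetical):
import java.util.concurrent.Executors;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.channel.ExecutorChannel;
import org.springframework.messaging.MessageChannel;

@Bean
public MessageChannel channelChkn() {
    return new DirectChannel(); // synchronous: runs on the sender's thread, order preserved
}

@Bean
public MessageChannel channelDeboard() {
    // asynchronous: hands the message to another thread, so the ordering
    // guarantee relative to the other recipients is lost
    return new ExecutorChannel(Executors.newSingleThreadExecutor());
}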
I have more or less the following setup in my code:
// loop over the inTopicName(s) {
KStream<String, String> stringInput = kBuilder.stream( STRING_SERDE, STRING_SERDE, inTopicName );
stringInput.filter( streamFilter::passOrFilterMessages ).map( processor_i ).to( outTopicName );
// } end of loop
streams = new KafkaStreams( kBuilder, streamsConfig );
streams.cleanUp();
streams.start();
If there is e.g. num.stream.threads > 1, how are tasks assigned to the prepared and assigned (in the loop) threads?
I suppose (I am not sure) there is a thread pool and the tasks are assigned to threads with some kind of round-robin policy, but is that done fully dynamically at runtime, or once at the beginning when the filtering/mapping topology is created?
I am especially interested in the situation where one topic gets compute-intensive tasks and the other does not. Is it possible that the application will starve because all threads are assigned to the time-consuming processor?
Let's play a bit with a scenario: num.stream.threads=2, no. partitions=4 per topic, no. topics=2 (huge_topic and slim_topic).
The loop in my question is done once at startup of the app. Suppose in the loop I define 2 topics, and I know that one topic carries heavyweight messages (huge_topic) while the other carries lightweight messages (slim_topic).
Is it possible that both threads from num.stream.threads will be busy only with tasks coming from huge_topic, while messages from slim_topic have to wait for processing?
Internally, Kafka Streams creates tasks based on partitions. Going with your loop example, assume you have 3 input topics A, B, and C with 2, 4, and 3 partitions respectively. For this, you will get 4 tasks (i.e., the max number of partitions over all topics) with the following partition-to-task assignment:
t0: A-0, B-0, C-0
t1: A-1, B-1, C-1
t2: B-2, C-2
t3: B-3
Partitions are grouped "by number" and assigned to the corresponding task. This is determined at runtime (i.e., after you call KafkaStreams#start()), because before that the number of partitions per topic is unknown.
It is not recommended to mess with the partition grouper if you don't understand all the internal details of Kafka Streams - you can very easily break stuff! (This interface has already been deprecated and will be removed in the upcoming 3.0 release.)
With regard to threads: the number of tasks limits the number of threads. For our example, this implies that you can have at most 4 threads (if you have more, those threads will be idle, as there is no task left to assign to them). How you "distribute" those threads is up to you. You can either have 4 single-threaded application instances, or one single application instance with 4 threads (or anything in between).
If you have fewer threads than tasks, tasks will be assigned to threads in a load-balanced way, based on the number of tasks (all tasks are assumed to carry the same load).
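For reference, the thread count per instance is just a config property; a minimal sketch matching the (pre-1.0) API from the question, with placeholder application id and broker address:
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);               // num.stream.threads
StreamsConfig streamsConfig = new StreamsConfig(props);              // passed to new KafkaStreams(...)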
If there is e.g. num.stream.threads > 1, how are tasks assigned to the prepared and assigned (in the loop) threads?
Tasks are assigned to threads by a partition grouper (see the PartitionGrouper interface). AFAIK it's called after a rebalance, so it's not a very dynamic process. That said, I'd argue that there is no risk of starvation.
I need to limit the rate of consuming messages from a RabbitMQ queue.
I have found many suggestions, but most of them offer to use the prefetch option. However, that option doesn't do what I need. Even if I set prefetch to 1, the rate is about 6000 messages/sec. This is too many for my consumer.
I need to limit the rate to, for example, about 70 to 200 messages per second. This means consuming one message every 5-14 ms, with no simultaneous messages.
I'm using Node.js with the amqp.node library.
Implementing a token bucket might help:
https://en.wikipedia.org/wiki/Token_bucket
You can write a producer that produces to the "token bucket queue" at a fixed rate, with a TTL on the message (maybe it expires after a second?) or a maximum queue length equal to your rate per second. Consumers that receive a "normal queue" message must also receive a "token bucket queue" message in order to process it, effectively rate-limiting the application.
Node.js + amqplib example:
// assumed config; tune ratePerSecond to your needs
var bucket = { ratePerSecond: 100 };
var queueName = 'my_token_bucket';

// tokens expire after 1s, and the queue never holds more than one second's worth
rabbitChannel.assertQueue(queueName, {durable: true, messageTtl: 1000, maxLength: bucket.ratePerSecond});
writeToken();

function writeToken() {
    // Buffer.from replaces the deprecated new Buffer(...)
    rabbitChannel.sendToQueue(queueName, Buffer.from(new Date().toISOString()), {persistent: true});
    setTimeout(writeToken, 1000 / bucket.ratePerSecond);
}
I've already found a solution.
I use the nanotimer module from npm to schedule the delays.
First I calculate the delay = 1 / [messages_per_second], in nanoseconds.
Then I consume messages with prefetch = 1.
Then I calculate the real delay as delay - [message_processing_time].
Then I wait for that real delay before sending the ack for the message.
It works perfectly. Thanks to all.
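For illustration, the same delay-before-ack idea sketched with the RabbitMQ Java client (the original solution is Node.js with nanotimer; the queue name, target rate, and process() are placeholders):
import com.rabbitmq.client.*;

ConnectionFactory factory = new ConnectionFactory();
Channel channel = factory.newConnection().createChannel();

long delayMillis = 1000 / 100;  // target rate: 100 messages/second (placeholder)
channel.basicQos(1);            // prefetch = 1: at most one unacked message at a time

channel.basicConsume("work_queue", false, (consumerTag, delivery) -> {
    long start = System.currentTimeMillis();
    process(delivery.getBody());  // placeholder for the actual work
    // subtract the processing time, then wait out the rest of the interval
    long remaining = delayMillis - (System.currentTimeMillis() - start);
    if (remaining > 0) {
        try { Thread.sleep(remaining); } catch (InterruptedException ignored) {}
    }
    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
}, consumerTag -> {});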
See 'Fair Dispatch' in the RabbitMQ documentation.
For example in a situation with two workers, when all odd messages are heavy and even messages are light, one worker will be constantly busy and the other one will do hardly any work. Well, RabbitMQ doesn't know anything about that and will still dispatch messages evenly.
This happens because RabbitMQ just dispatches a message when the message enters the queue. It doesn't look at the number of unacknowledged messages for a consumer. It just blindly dispatches every n-th message to the n-th consumer.
In order to defeat that we can use the prefetch method with the value of 1. This tells RabbitMQ not to give more than one message to a worker at a time. Or, in other words, don't dispatch a new message to a worker until it has processed and acknowledged the previous one. Instead, it will dispatch it to the next worker that is not still busy.
I don't think RabbitMQ can provide this feature out of the box.
If you have only one consumer, then the whole thing is pretty easy: you just let it sleep between consuming messages.
If you have multiple consumers, I would recommend using some "shared memory" to keep the rate. For example, you might have 10 consumers consuming messages. To keep a 70-200 messages/sec rate across all of them, each will make a call to Redis to see if it is eligible to process a message. If yes, it updates Redis to show the other consumers that one message is currently in process.
If you have no control over the consumer, then implement option 1 or 2 and publish the message back to Rabbit. This way the original consumer will consume messages at the desired pace.
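A minimal sketch of that shared counter, using Redis via the Jedis client (the key name and window size are my assumptions):
import redis.clients.jedis.Jedis;

// fixed-window limiter: allow at most maxPerSecond messages per one-second
// window, shared by every consumer that talks to the same Redis
boolean tryAcquire(Jedis jedis, int maxPerSecond) {
    String key = "consumer_rate:" + (System.currentTimeMillis() / 1000);
    long count = jedis.incr(key);  // atomic across all consumers
    if (count == 1) {
        jedis.expire(key, 2);      // old windows expire on their own
    }
    return count <= maxPerSecond;  // eligible only while under the cap
}
Each consumer calls tryAcquire(...) before handling a message; on false it delays or requeues the message and tries again.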
This is how I fixed mine, with just setTimeout.
I set mine to consume one message every 200 ms, which works out to 5 messages per second. In my case the processing does an update if the record exists.
channel.consume(transactionQueueName, async (data) => {
    let dataNew = JSON.parse(data.content);
    const processedTransaction = await seperateATransaction(dataNew);
    // delay the ack to throttle consumption and avoid duplicate entries -
    // important: don't remove the setTimeout
    setTimeout(function () {
        channel.ack(data);
    }, 200);
});
Done
I'm writing a win32 library and I need to implement a producer-consumer queue using win32 threads. So far everything is going well, but I'm faced with a dilemma: should I use events or condition variables to signal to the consumer that something's been added? I've seen examples that can use either one. Personally for my queue I need the ability to wait on multiple signals at once (an item pushed signal, and a quit signal). There is only one producer and one consumer.
What are the advantages and disadvantages of each? Given my requirements what would you recommend and why? Thanks!
Usually reading is implemented as:
WaitForSingleObject(evt, INFINITE);  // 1: wait for the "item pushed" event
EnterCriticalSection(&cs);           // 2: lock the queue
// ... fetch data from the queue
LeaveCriticalSection(&cs);           // 3: unlock
But with condition variables, lines 1 and 2 can be replaced by a single call to SleepConditionVariableCS(&cv, &cs, INFINITE), which performs actions 1 and 2 atomically.
In high-volume cases (frequent read/write operations) this will give you some benefit.