Handle concurrent access to multiple job queues with multiple workers - multithreading

I have to design a job scheduler for a multi-tenant app. Each tenant will have its own job queue for processing background tasks. There are N workers, each of which listens to all the queues and picks up a job when idle.
e.g.
queue 1 : tasks - A, B, C
queue 2 : tasks - D
queue 3 : tasks - E, F
and I have 3 workers w1, w2, w3, all of which listen to all the queues. This whole design is going to be implemented in AWS.
It is important that each job is processed exactly once. Since all the workers are reading the queues, how can I prevent one job from being picked up by many workers simultaneously?
Also, if the workers read all queues sequentially, they will keep dequeuing only from the first queue until it is empty. How do I handle this situation?
I initially thought of using an SNS notification when a new task is added to a job queue, but since all workers would receive it, the core problem wouldn't be solved.

For the first concern, SQS handles distributing messages to individual workers automatically; read about visibility timeouts. Once a worker receives a message, it is hidden from the other workers until the timeout expires or the worker deletes it.
If you want to maintain separate queues, you need to put the queue-switching logic in the workers: basically an infinite loop that iterates over the 3 queues, checks each for new work, and processes only a single chunk/message before moving to the next queue:
while (true) {
    for (Queue queue : queues) {
        Message message = getMessage(queue);  // non-blocking receive
        if (message != null) {
            processMessage(message);          // handle one message, then move on
        }
    }
}
Make sure you aren't using long polling, as it will just sit on the first queue.
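To make that concrete, here is a minimal sketch of such a worker loop using the AWS SDK for Java v2 (the queue URLs and the processMessage helper are illustrative assumptions, not part of your setup). Receiving a message hides it from other workers for the duration of the visibility timeout, and deleting it only after successful processing is what keeps each job with a single worker:

import java.util.List;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class Worker {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        // hypothetical per-tenant queue URLs
        List<String> queueUrls = List.of(
                "https://sqs.us-east-1.amazonaws.com/123456789012/tenant-1",
                "https://sqs.us-east-1.amazonaws.com/123456789012/tenant-2",
                "https://sqs.us-east-1.amazonaws.com/123456789012/tenant-3");
        while (true) {
            for (String url : queueUrls) {
                ReceiveMessageRequest receive = ReceiveMessageRequest.builder()
                        .queueUrl(url)
                        .maxNumberOfMessages(1) // one message per queue per pass
                        .waitTimeSeconds(0)     // short polling, so we don't camp on one queue
                        .build();
                for (Message m : sqs.receiveMessage(receive).messages()) {
                    processMessage(m.body());
                    // delete only after successful processing; if the worker crashes first,
                    // the message becomes visible to other workers after the visibility timeout
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(url)
                            .receiptHandle(m.receiptHandle())
                            .build());
                }
            }
        }
    }

    private static void processMessage(String body) { /* job logic goes here */ }
}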

Related

Using Multiple Asyncio Queues Effectively

I am currently building a project that requires multiple requests to be made to various endpoints. I am wrapping these requests in aiohttp to allow for async.
The problem:
I have three Queues: queue1, queue2, and queue3. Additionally, I have three worker functions (worker1, worker2, worker3), each associated with its respective Queue. The first queue is populated immediately with a list of IDs that is known prior to running. When a request is finished and the data is committed to a database, the worker passes the ID to queue2. A worker2 will take this ID and request more data. From this data it will begin to generate a list of IDs (different from the IDs in queue1/queue2) and put them in queue3. Finally, worker3 will grab an ID from queue3 and request more data before committing to a database.
The issue arises from the fact that queue.join() is a blocking call. Each worker is tied to a separate Queue, so the join for queue1 will block until it is finished. This is fine, but it also defeats the purpose of using async. Without using join(), the program is unable to detect when the Queues are completely empty. The other issue is that there may be silent errors when one of the Queues is empty but data for it hasn't been added yet.
The basic code outline is as follows:
queue1 = asyncio.Queue()
queue2 = asyncio.Queue()
queue3 = asyncio.Queue()

tasks = []
async with aiohttp.ClientSession() as session:
    for i in range(3):
        tasks.append(asyncio.create_task(worker1(queue1)))
    for i in range(3):
        tasks.append(asyncio.create_task(worker2(queue2)))
    for i in range(10):
        tasks.append(asyncio.create_task(worker3(queue3)))
    for i in IDs:
        queue1.put_nowait(i)
    await asyncio.gather(*tasks)
The worker functions sit in an infinite loop waiting for items to enter the queue.
When all the data has been processed, the loops never exit and the program hangs.
Is there a way to manage the workers effectively and shut down properly?
As nicely explained in this answer, Queue.join serves to inform the producer when all the work injected into the queue has been completed. Since your first queue doesn't know when a particular item is done (it is multiplied and distributed to other queues), join is not the right tool for you.
Judging from your code, it seems that your workers need to run for only as long as it takes to process the queue's initial items. If that is the case, then you can use a shutdown sentinel to signal the workers to exit. For example:
async with aiohttp.ClientSession() as session:
    # ... create tasks as above ...
    for i in IDs:
        queue1.put_nowait(i)
    queue1.put_nowait(None)  # no more work
    await asyncio.gather(*tasks)
This is like your original code, but with an explicit shutdown request. Workers must detect the sentinel and react accordingly: propagate it to the next queue/worker and exit. For example, in worker1:
while True:
    item = await queue1.get()  # Queue.get is a coroutine and must be awaited
    if item is None:
        # done with processing, propagate sentinel to worker2 and exit
        await queue2.put(None)
        break
    # ... process item as usual ...
Doing the same in the other two workers (except for worker3, which won't propagate because there is no next queue) will result in all three tasks completing once the work is done. Since queues are FIFO, the workers can safely exit after encountering the sentinel, knowing that no items have been dropped. The explicit shutdown also distinguishes a shut-down queue from one that happens to be empty, thus preventing workers from exiting prematurely due to a temporarily empty queue. Note that if you run several worker tasks per queue, as in your outline, you need one sentinel per worker (or each worker must re-put the sentinel before exiting) so that all of its siblings shut down too.
Up to Python 3.7, this technique was actually demonstrated in the documentation of Queue, but that example somewhat confusingly shows both the use of Queue.join and the use of a shutdown sentinel. The two are separate and can be used independently of one another. (And it might also make sense to use them together, e.g. to use Queue.join to wait for a "milestone", and then put other stuff in the queue, while reserving the sentinel for stopping the workers.)

Can `ItemReader` in Spring Batch wait until data becomes available for processing, similar to a blocking queue?

At present I am following the below strategy for processing items in a step.
TaskletStep processingStep = stepBuilderFactory.get(getLabel() + "-" + UUID.randomUUID().toString())
        .<Object, Object>chunk(configuration.getChunkSize())
        .reader(reader)
        .processor(processor)
        .writer(writer)
        .transactionManager(txManager)
        .build();
TypedJobParameters typedJobParameters = new TypedJobParameters();
runStep(processingStep, typedJobParameters);
This tasklet step also does some additional work, like compressing the file and copying it to a different location, so it takes a long time to complete. How can I offload this additional work to background threads?
If a background thread keeps polling until a new file arrives for compression, it may waste CPU cycles, whereas putting that thread into a wait state and notifying it when a new file arrives makes the design more complex.
How can I start a new TaskletStep in parallel to my existing TaskletStep above, in such a way that the ItemReader of the new step waits until a file arrives for processing, like a blocking queue?
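(To clarify what I mean by "blocking queue" behavior: something like java.util.concurrent.BlockingQueue, where take() waits without burning CPU. A hypothetical reader along these lines, just to illustrate the behavior I'm after:)

import java.io.File;
import java.util.concurrent.BlockingQueue;
import org.springframework.batch.item.ItemReader;

public class BlockingQueueItemReader implements ItemReader<File> {
    private final BlockingQueue<File> arrivals;

    public BlockingQueueItemReader(BlockingQueue<File> arrivals) {
        this.arrivals = arrivals;
    }

    @Override
    public File read() throws InterruptedException {
        // waits until a file arrives, no busy polling; note that a real reader
        // would need to return null at some point to tell Spring Batch the step is done
        return arrivals.take();
    }
}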
You can delegate the "expensive" work to background threads if you define your processor as an AsyncItemProcessor. You assign it a task executor backed by a thread pool, plus a delegate processor that does the actual work on a background thread.
The item reader keeps accepting files and handing them to threads in the task executor. When a background thread completes processing of a file, the result is handed back to the writer.
AsyncItemProcessor<Object, Object> asyncItemProcessor = new AsyncItemProcessor<>();
asyncItemProcessor.setDelegate(processor);        // does the actual work
asyncItemProcessor.setTaskExecutor(taskExecutor); // runs it on a background thread

AsyncItemWriter<Object> asyncItemWriter = new AsyncItemWriter<>();
asyncItemWriter.setDelegate(writer);

// the async processor emits Future<Object>, so the chunk's output type changes
TaskletStep processingStep = stepBuilderFactory.get(getLabel() + "-" + UUID.randomUUID().toString())
        .<Object, Future<Object>>chunk(configuration.getChunkSize())
        .reader(reader)
        .processor(asyncItemProcessor)
        .writer(asyncItemWriter)
        .transactionManager(txManager)
        .build();
TypedJobParameters typedJobParameters = new TypedJobParameters();
runStep(processingStep, typedJobParameters);
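For completeness, the taskExecutor above could be a plain Spring ThreadPoolTaskExecutor; the pool sizes below are illustrative assumptions, not recommendations:

import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(4);     // threads doing the compression/copying in the background
taskExecutor.setMaxPoolSize(8);
taskExecutor.setQueueCapacity(100);  // items wait here before the pool grows past the core size
taskExecutor.setThreadNamePrefix("batch-async-");
taskExecutor.initialize();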

What is the strategy for thread assignment in Kafka Streams?

I have more or less the following setup in the code:
// loop over the inTopicName(s) {
KStream<String, String> stringInput = kBuilder.stream( STRING_SERDE, STRING_SERDE, inTopicName );
stringInput.filter( streamFilter::passOrFilterMessages ).map( processor_i ).to( outTopicName );
// } end of loop
streams = new KafkaStreams( kBuilder, streamsConfig );
streams.cleanUp();
streams.start();
If, e.g., num.stream.threads > 1, how are tasks assigned to the prepared and assigned (in the loop) threads?
I suppose (I am not sure) there is a thread pool and tasks are assigned to threads with some kind of round-robin policy, but I don't know whether this is done fully dynamically at runtime or once at the beginning when the filtering/mapping structure is created.
I am especially interested in the situation when one topic gets computing-intensive tasks and another does not. Is it possible that the application will starve because all threads are assigned to the time-consuming processor?
Let's play a bit with a scenario: num.stream.threads=2, no. partitions=4 per topic, no. topics=2 (huge_topic and slim_topic).
The loop in my question is executed once at startup of the app. If in the loop I define 2 topics, and I know that one topic carries heavyweight messages (huge_topic) and the other carries lightweight messages (slim_topic):
Is it possible that both threads from num.stream.threads will be busy only with tasks coming from huge_topic, so that messages from slim_topic have to wait for processing?
Internally, Kafka Streams creates tasks based on partitions. Going with your loop example, assume you have 3 input topics A, B, and C with 2, 4, and 3 partitions respectively. For this, you will get 4 tasks (i.e., the max number of partitions over all topics) with the following partition-to-task assignment:
t0: A-0, B-0, C-0
t1: A-1, B-1, C-1
t2:        B-2, C-2
t3:        B-3
Partitions are grouped "by number" and assigned to the corresponding task. This is determined at runtime (i.e., after you call KafkaStreams#start()), because before that, the number of partitions per topic is unknown.
It is not recommended to mess with the partition grouper if you don't understand all the internal details of Kafka Streams -- you can very easily break stuff! (This interface has already been deprecated and will be removed in the upcoming 3.0 release.)
With regard to threads: tasks limit the number of usable threads. For our example, this implies that you can have at most 4 threads (if you have more, those threads will be idle, as there is no task left to assign to them). How you "distribute" those threads is up to you. You can either have 4 single-threaded application instances or one single application instance with 4 threads (or anything in between).
If you have fewer threads than tasks, tasks will be assigned in a load-balanced way, based on the number of tasks (all tasks are assumed to carry the same load).
If, e.g., num.stream.threads > 1, how are tasks assigned to the prepared and assigned (in the loop) threads?
Tasks are assigned to threads by a partition grouper. You can read about it here. AFAIK it is called after a rebalance, so it's not a very dynamic process. That said, I'd argue that there is no risk of starvation.
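To make the threading knob concrete, here is a minimal configuration sketch (the application id and bootstrap servers are placeholders). Following the grouping described above, your two 4-partition topics yield 4 tasks, shared by the configured threads:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties streamsConfig = new Properties();
streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // placeholder
streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
// with 2 threads sharing the 4 tasks, each thread gets 2 tasks; by the grouping
// above, each task mixes one huge_topic partition with one slim_topic partition
streamsConfig.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);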

Some questions about thread pools in Vert.x

Vert.x has many thread pools: eventLoopGroup, acceptorEventLoopGroup, internalBlockingPool, and workerPool. Why does it need so many?
FileSystem file reads use the internalBlockingPool, but code like the following, using executeBlocking, uses the workerPool.
And in this code, why does the resultHandler execute on an event loop thread and not a worker pool thread?
vertx.executeBlocking(future -> {
    System.out.println(Thread.currentThread().getName());
    future.complete();
}, r -> {
    System.out.println(Thread.currentThread().getName());
});
In my understanding, an event loop is just a single thread looping endlessly over a channel. If nothing network-related is involved, there is no need to use the eventLoopGroup.
How should I understand "event" in Vert.x? Can you give some Vert.x code, not Netty code?
Event loops: there typically will be more than one event loop thread (it depends on your number of cores). For example, if you start N instances of a verticle, you will want them to spread across multiple cores using multiple event loops. In the docs, look up the multi-reactor pattern.
Vert.x works differently here. Instead of a single event loop, each Vertx instance maintains several event loops. By default we choose the number based on the number of available cores on the machine, but this can be overridden.
http://vertx.io/docs/vertx-core/java/#_reactor_and_multi_reactor
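As a small illustration of that multi-reactor pattern (the verticle class name is an assumption), deploying several instances of a verticle spreads them across the available event loop threads:

import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

Vertx vertx = Vertx.vertx();
// instances are assigned to event loops round-robin, so 4 instances can
// run on up to 4 different event loop threads (one loop per instance)
vertx.deployVerticle("com.example.MyVerticle", new DeploymentOptions().setInstances(4));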
Regarding your question about the result handler: the executeBlocking function runs on a worker thread, but once it completes, the result is pushed back to the event loop thread, which runs the result handler. This behavior helps keep certain logic on the event loop thread.
Regarding the other thread groups, they just handle specific functionality in Vert.x. If you are worried about the number of threads in Vert.x, don't be: Vert.x does a good job of keeping OS threads to a minimum while maintaining high functionality and throughput.
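If you do want control over where blocking work runs, here is a minimal sketch (the pool name and size are arbitrary assumptions) of a dedicated, named worker pool created via createSharedWorkerExecutor, separate from the default workerPool:

WorkerExecutor executor = vertx.createSharedWorkerExecutor("heavy-pool", 4);
executor.executeBlocking(future -> {
    // blocking work runs here, on a "heavy-pool" thread
    future.complete();
}, res -> {
    // and this result handler runs back on the event loop thread
});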

Spring Integration - Router, task-executor and smart LB

I have a queue channel and a chain with a poller and task-executor "listening" on that channel, doing some processing in parallel. What I would like to do is configure it in such a way that I can route particular messages, based on some logic/property, to make sure that a particular message 'type' is always processed by a particular thread from the task executor.
Example: messages where PAYLOAD_PROPERTY & 1 == 0 always go to thread 1, and those where PAYLOAD_PROPERTY & 1 == 1 go to thread 2 (note this is just an example for 2 threads - I could easily use a router here, but I can imagine similar logic, like a modulo operation, for 10 threads as well). In other words, thread 1 and thread 2 must never concurrently process the same 'type' of message. So the purpose is not just to load-balance; it is to stick with the same thread based on some logic.
My initial thought was to somehow use a channel dispatcher (it can have a load-balancer-ref and a task-executor), but I'm not sure this would work, as I have a chain with a poller that does the further processing I need.
Can you advise on the best component setup for a workflow like the above?
There's nothing like that in a "standard" task executor.
It's probably easier to remove the queue channel and have a router (subscribed to a direct channel) route to 10 separate executor channels, each configured with a single-thread executor.
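To illustrate the underlying idea in plain Java (this is the concept, not the Spring Integration API itself; the payload property is hypothetical): a fixed array of single-thread executors indexed by a key gives the "same type, same thread" guarantee:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StickyDispatcher {
    private final ExecutorService[] executors;

    public StickyDispatcher(int threads) {
        executors = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            // one single-thread executor per slot, like the 10 executor channels above
            executors[i] = Executors.newSingleThreadExecutor();
        }
    }

    public void dispatch(int payloadProperty, Runnable work) {
        // same property value -> same index -> same thread, every time
        int idx = Math.floorMod(payloadProperty, executors.length);
        executors[idx].submit(work);
    }
}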
