Can `ItemReader` in Spring Batch wait until data is available for processing, similar to a blocking queue?

At present I am following the below strategy for processing items in a step.
TaskletStep processingStep = stepBuilderFactory.get(getLabel() + "-" + UUID.randomUUID().toString())
.<Object, Object>chunk(configuration.getChunkSize())
.reader(reader)
.processor(processor)
.writer(writer).transactionManager(txManager).build();
TypedJobParameters typedJobParameters = new TypedJobParameters();
runStep(processingStep, typedJobParameters);
This TaskletStep also does some additional work, like compressing the file and copying it to a different location, so it takes a long time to complete. How can I offload this additional work to background threads?
If a background thread keeps polling until a new file arrives for compression, it may waste CPU cycles, whereas putting that thread on wait and notifying it when a new file arrives makes the design more complex.
How can I start a new TaskletStep in parallel with the existing one above, in such a way that the ItemReader of the new step waits until a file arrives for processing, like a blocking queue?

You can delegate "expensive" work to a background thread if you define your processor as an AsyncItemProcessor. You assign it a task executor backed by a thread pool, plus a delegate processor which does the actual work on a background thread.
The item reader keeps accepting files and hands them to threads in the task executor. When a background thread finishes processing a file, the result is passed on to the writer.
AsyncItemProcessor<Object, Object> asyncProcessor = new AsyncItemProcessor<>();
asyncProcessor.setDelegate(processor);
asyncProcessor.setTaskExecutor(taskExecutor);

AsyncItemWriter<Object> asyncItemWriter = new AsyncItemWriter<>();
asyncItemWriter.setDelegate(writer);

// The AsyncItemProcessor wraps each result in a Future, which the
// AsyncItemWriter unwraps, so the chunk's output type is Future<Object>.
TaskletStep processingStep = stepBuilderFactory.get(getLabel() + "-" + UUID.randomUUID().toString())
        .<Object, Future<Object>>chunk(configuration.getChunkSize())
        .reader(reader)
        .processor(asyncProcessor)
        .writer(asyncItemWriter)
        .transactionManager(txManager)
        .build();
TypedJobParameters typedJobParameters = new TypedJobParameters();
runStep(processingStep, typedJobParameters);
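To make the reader itself block until data arrives, which is what the title asks for, one option is an ItemReader backed by a BlockingQueue. Here is a minimal sketch, assuming some producer thread puts newly arrived files on the queue; BlockingQueueItemReader and the poison-pill end marker are my own illustration, not Spring Batch classes:
import java.util.concurrent.BlockingQueue;

import org.springframework.batch.item.ItemReader;

public class BlockingQueueItemReader<T> implements ItemReader<T> {

    private final BlockingQueue<T> queue;
    private final T endMarker; // poison pill signalling "no more items"

    public BlockingQueueItemReader(BlockingQueue<T> queue, T endMarker) {
        this.queue = queue;
        this.endMarker = endMarker;
    }

    @Override
    public T read() throws InterruptedException {
        // take() parks the calling thread until an item is available,
        // so there is no busy polling and no wasted CPU cycles.
        T item = queue.take();
        // Returning null tells Spring Batch the input is exhausted, ending the step.
        return item == endMarker ? null : item;
    }
}
The producer (for example, whatever finishes writing each file) calls queue.put(file) when a file becomes available and queue.put(endMarker) when it is done; in the meantime the reading step simply sleeps inside take().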

Related

handle concurrent access in multiple job queues with multiple workers

I have to design a job scheduler for a multi-tenant app. Each tenant will have its own job queue for processing background tasks. There are N workers, each of which listens to all the queues and takes up a job when idle.
eg.
queue 1 : task - A, B, C
queue 2 : task - D
queue 3 : task - E, F
and I have 3 workers w1, w2, w3, all of which listen to all the queues. This whole design is going to be implemented in AWS.
It is important that each job is processed only once. Since all the workers are reading the queues, how can I prevent the same job from being picked up by more than one worker?
Also, if the workers read all the queues sequentially, they will keep dequeuing from the first queue until it is empty. How do I handle this situation?
I initially thought of using an SNS notification when a new task is added to a job queue, but since all workers would receive it, the core problem wouldn't be solved.
For the first concern, SQS handles distributing tasks to individual workers automatically; go read about visibility timeouts.
If you want to maintain separate queues, you need to put the queue-switching logic in the workers: basically an infinite loop over the 3 queues that checks each one for new work and processes only a single chunk/message before switching to the next queue:
while (true) {
    for (queue : queues) {
        message = getMessage(queue)
        if (message != null) {
            processMessage(message)
        }
    }
}
Make sure you aren't using long polling, as it will just sit on the first queue.
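To make that loop concrete, here is a minimal worker sketch using the AWS SDK for Java v2. The queue URLs and the process() hook are assumptions for illustration; the visibility timeout is what keeps two workers from processing the same message, and deleting the message marks it as done:
import java.util.List;

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class RoundRobinWorker {
    public static void main(String[] args) {
        // Region and credentials are resolved from the environment.
        SqsClient sqs = SqsClient.create();
        List<String> queueUrls = List.of(args); // one URL per tenant queue

        while (true) {
            for (String queueUrl : queueUrls) {
                ReceiveMessageRequest request = ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .maxNumberOfMessages(1) // one message, then move on
                        .waitTimeSeconds(0)     // short polling, per the advice above
                        .build();
                for (Message message : sqs.receiveMessage(request).messages()) {
                    process(message.body());
                    // Until this delete, the visibility timeout hides the
                    // message from every other worker, so it runs only once.
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .receiptHandle(message.receiptHandle())
                            .build());
                }
            }
        }
    }

    private static void process(String body) { // hypothetical processing hook
        System.out.println("processing: " + body);
    }
}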

In Vulkan (or any other modern graphics API), should fences be waited per queue submission or per frame?

I am trying to set up my renderer in such a way that rendering always renders into a texture; then I just present any texture I like, as long as its format is swapchain-compatible. This means I need to deal with one graphics queue (I don't have compute yet) that renders the scene, UI, etc.; one transfer queue that copies the rendered image into the swapchain; and one present queue for presenting the swapchain. This is one use case I am trying to tackle at the moment, but I will have more use cases like this (e.g. compute queues) as my renderer matures.
Here is pseudocode for what I am trying to achieve. I have added some of my own assumptions here as well:
// wait for fences per frame
waitForFences(fences[currentFrame]);
resetFences(fences[currentFrame]);
// 1. Rendering (queue = Graphics)
commandBuffer.begin();
renderEverything();
commandBuffer.end();
QueueSubmitInfo renderSubmit{};
renderSubmit.commandBuffer = commandBuffer;
// Nothing to wait for
renderSubmit.waitSemaphores = nullptr;
// Signal that rendering is complete
renderSubmit.signalSemaphores = { renderSemaphores[currentFrame] };
// Do not signal the fence yet
queueSubmit(renderSubmit, nullptr);
// 2. Transferring to swapchain (queue = Transfer)
// acquire the image that we want to copy into
// and signal that it is available
swapchain.acquireNextImage(imageAvailableSemaphore[currentFrame]);
commandBuffer.begin();
copyTexture(textureToPresent, swapchain.getAvailableImage());
commandBuffer.end();
QueueSubmitInfo transferSubmit{};
transferSubmit.commandBuffer = commandBuffer;
// Wait for swapchain image to be available
// and rendering to be complete
transferSubmit.waitSemaphores = { renderSemaphores[currentFrame], imageAvailableSemaphore[currentFrame] };
// Signal another semaphore that swapchain
// is ready to be used
transferSubmit.signalSemaphores = { readyForPresenting[currentFrame] };
// Now, signal the fence since this is the end of frame
queueSubmit(transferSubmit, fences[currentFrame]);
// 3. Presenting (queue = Present)
PresentQueueSubmitInfo presentSubmit{};
// Wait until the swapchain is ready to be presented
// Basically, waits until the image is copied to swapchain
presentSubmit.waitSemaphores = { readyForPresenting[currentFrame] };
presentQueueSubmit(presentSubmit);
My understanding is that fences are needed to make sure the CPU waits until the GPU has finished executing the previously submitted command buffers.
When dealing with multiple queues, is it enough to make the CPU wait only for the frame and synchronize different queues with semaphores (pseudocode above is based on this)? Or should each queue wait for a fence separately?
To get into technical details, what will happen if two command buffers are submitted to the same queue without any semaphores? Pseudocode:
// first submissions
commandBufferOne.begin();
doSomething();
commandBufferOne.end();
SubmitInfo firstSubmit{};
firstSubmit.commandBuffer = commandBufferOne;
queueSubmit(firstSubmit, nullptr);
// second submission
commandBufferTwo.begin();
doSomethingElse();
commandBufferTwo.end();
SubmitInfo secondSubmit{};
secondSubmit.commandBuffer = commandBufferTwo;
queueSubmit(secondSubmit, nullptr);
Will the second submission overwrite the first one, or will the queue execute the first submission before the second (FIFO) since it was submitted first?
This entire organizational scheme seems dubious.
Even ignoring the fact that the Vulkan specification does not require GPUs to offer separate queues for all of these things, you're spreading a series of operations across asynchronous execution, despite the fact that these operations are inherently sequential. You cannot copy from an image to the swapchain until the image has been rendered, and you cannot present the swapchain image until the copy has completed.
So there is basically no advantage to putting these things into their own queues. Just do all of them on the same queue (with one submit and one vkQueuePresentKHR), using appropriate execution and memory dependencies between the operations. This means there's only one thing to wait on: the single submission.
Plus, submit operations are really expensive; doing two submits instead of one submit containing both pieces of work is only a good thing if the submissions are being done on different CPU threads that can work concurrently. But binary semaphores stop that from working. You cannot submit a batch that waits for semaphore A until you have submitted a batch that signals semaphore A. This means that the batch signaling must either be earlier in the same submit command or must have been submitted in a prior submit command. Which means if you put those submits on different threads, you have to use a mutex or something to ensure that the signaling submit happens-before the waiting submit.1
So you don't get any asynchronous execution of the queue submit operation. So neither the CPU nor the GPU will asynchronously execute any of this.
1: Timeline semaphores don't have this problem.
As for the particulars of your technical question: if operation A depends on operation B, and you synchronize with A, you have also synchronized with B. Since your transfer operation waits on a signal from the graphics queue, waiting on the transfer operation also waits on the graphics commands from before that signal.

Can one thread block the complete ForkJoinPool?

I was reading https://dzone.com/articles/think-twice-using-java-8
Somewhere in between it states that
The problem is that all parallel streams use common fork-join thread pool, and if you submit a long-running task, you effectively block all threads in the pool.
My question is: shouldn't the other threads in the pool complete without waiting on the long-running task? Or is it talking about the case where we create two parallel streams in parallel?
A Stream operation does not block threads of the pool, it will utilize them. Depending on the workload split, it is possible that all threads are busy processing the Stream operation that was commenced first, so they can not pick up workload for another Stream operation. The article seems to wrongly use the word “block” for this scenario.
It’s worth noting that the Stream API and its default implementation are designed for CPU-bound tasks which do not wait for external events (i.e., block a thread). If you use it that way, it doesn’t matter for the overall throughput which task keeps the threads busy. But if you are processing different requests concurrently and want some kind of fairness in worker-thread assignment, it won’t work.
If you read on in the article, you see that they created an example assuming a wrong use of the Stream API, with truly blocking operations, and even call the first example broken (though they put it in quotes unnecessarily). In that case, the error is not the use of a parallel Stream but using it for blocking operations.
It’s also not correct that such a parallel Stream operation can “block all other tasks that are using parallel streams”. To have another parallel Stream operation, you must have at least one runnable thread initiating the Stream operation. Since this initiating thread will contribute to the Stream processing, there’s always at least one participating thread. So if all threads of the common pool work on one Stream operation, it may degrade the performance of other parallel Stream operations, but not bring them to a halt.
E.g., if you use the following test program
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.stream.Stream;

long t0 = System.nanoTime();
new Thread(() -> {
    Stream.generate(() -> {
        long missing = TimeUnit.SECONDS.toNanos(3) + t0 - System.nanoTime();
        if(missing > 0) {
            System.out.println("blocking "+Thread.currentThread().getName());
            LockSupport.parkNanos(missing);
        }
        return "result";
    }).parallel().limit(100).forEach(result -> {});
    System.out.println("first (blocking) operation finished");
}).start();
for(int i = 0; i < 4; i++) {
    new Thread(() -> {
        LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
        System.out.println(Thread.currentThread().getName()
            +" starting another parallel Stream");
        Object[] threads =
            Stream.generate(() -> Thread.currentThread().getName())
                  .parallel().limit(100).distinct().toArray();
        System.out.println("finished using "+Arrays.toString(threads));
    }).start();
}
it may print something like
blocking ForkJoinPool.commonPool-worker-5
blocking ForkJoinPool.commonPool-worker-13
blocking Thread-0
blocking ForkJoinPool.commonPool-worker-7
blocking ForkJoinPool.commonPool-worker-15
blocking ForkJoinPool.commonPool-worker-11
blocking ForkJoinPool.commonPool-worker-9
blocking ForkJoinPool.commonPool-worker-3
Thread-2 starting another parallel Stream
Thread-4 starting another parallel Stream
Thread-1 starting another parallel Stream
Thread-3 starting another parallel Stream
finished using [Thread-4]
finished using [Thread-2]
finished using [Thread-3]
finished using [Thread-1]
first (blocking) operation finished
(details may vary)
There might be a clash between the thread management that created the initiating threads (those accepting external requests, for example) and the common pool, however. But, as said, parallel Stream operations are not the right tool if you want fairness between a number of independent operations.
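If you do need isolation between independent parallel operations, a widely used workaround (relying on long-standing but undocumented behavior) is to initiate the parallel Stream from inside a dedicated ForkJoinPool: tasks forked by one of its worker threads stay in that pool instead of the common pool. A minimal sketch; the pool size of 4 is an arbitrary choice for the example:
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class DedicatedPoolDemo {
    public static void main(String[] args) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(4); // dedicated pool
        try {
            // A parallel stream started inside a ForkJoinPool worker
            // executes its subtasks in that pool, not in the common pool.
            long sum = pool.submit(
                () -> IntStream.rangeClosed(1, 1_000_000).parallel().asLongStream().sum()
            ).get();
            System.out.println("sum = " + sum);
        } finally {
            pool.shutdown();
        }
    }
}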

Play Framework: thread-pool-executor vs fork-join-executor

Let's say we have an action like the one below in our controller. At each request, performLogin will be called by many users.
def performLogin( ) = {
Async {
// API call to the datasource1
val id = databaseService1.getIdForUser();
// API call to another data source different from above
// This process depends on id returned by the call above
val user = databaseService2.getUserGivenId(id);
// Very CPU intensive task
val token = performProcess(user)
// Very CPU intensive calculations
val hash = encrypt(user)
Future.successful(hash)
}
}
I kind of know what the fork-join-executor does. Basically, from the main thread which receives a request, it spawns multiple worker threads which in turn divide the work into chunks. Eventually the main thread joins those results and returns from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, this selected thread does the work, then goes back to the thread pool to wait for more work. So no subdividing of the task happens here.
In the above code, parallelism by the fork-join executor is not possible, in my opinion: each call to the different methods/functions requires something from the previous step. If I were to choose the fork-join executor, how would that benefit me? How would the execution of the above code differ between the fork-join and thread-pool executors?
Thanks
This isn't parallel code; everything inside your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests. It's event-based: there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.
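For reference, the choice between the two executors is made in the dispatcher configuration (Play's default dispatcher is configured through Akka). A sketch with illustrative values, not tuning advice:
# conf/application.conf (Akka dispatcher settings; values are illustrative)
akka {
  actor {
    default-dispatcher {
      executor = "fork-join-executor"   # or "thread-pool-executor"
      fork-join-executor {
        parallelism-min = 8
        parallelism-factor = 3.0
        parallelism-max = 64
      }
    }
  }
}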

NSOperationQueue Pause & Resume?

I implemented thread pooling using NSOperationQueue, in which I set maxConcurrentOperationCount to 25, i.e. 25 threads run concurrently at a time.
I am uploading chunks to a server using this NSOperationQueue, so chunks are allocated to the first 25 threads. Once the NSOperationQueue is full, I want to pause the chunk-reading part; then, whenever a thread in the queue completes, resume the chunk reading so a new thread is allocated to the NSOperationQueue in place of the one that completed.
My Code:
NSOperationQueue *operationQueue = [NSOperationQueue new];
operationQueue.maxConcurrentOperationCount=5;
NSInvocationOperation *operation = [[NSInvocationOperation alloc] initWithTarget:self selector:@selector(upload:) object:f_objChunkDetails->FileStream];
NSUInteger oprCnt=operationQueue.operationCount;
if(oprCnt >= 5) {
// wait till queue has a free slot
} else {
[operationQueue addOperation:operation];
}
So how are pause and resume used with NSOperationQueue? How can I implement something like a ManualResetEvent in Objective-C?
Don't wait or pause. Instead, move your job creation (and the count check) into a new method. That method should loop, creating jobs up to the available limit, and then return. Each job that is created should have a completionBlock added which calls the job-creation method.
This way you are event-driven instead of blocking.
Generally, the completionBlock should switch to the main thread before calling the job-creation method.
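Here is a sketch of that event-driven refill pattern, written in Java (the language used elsewhere on this page) rather than Objective-C; ChunkUploader, the chunk source, and upload() are hypothetical stand-ins for the NSOperationQueue pieces:
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Keep at most MAX_IN_FLIGHT uploads running, and let each completion
// schedule the next chunk instead of anyone blocking or polling.
public class ChunkUploader {
    private static final int MAX_IN_FLIGHT = 5;

    private final ExecutorService pool = Executors.newFixedThreadPool(MAX_IN_FLIGHT);
    private final Iterator<byte[]> chunks; // stand-in for the chunk-reading part

    ChunkUploader(List<byte[]> chunks) {
        this.chunks = chunks.iterator();
    }

    void start() {
        // Fill the queue up to the limit; completions keep it full afterwards.
        for (int i = 0; i < MAX_IN_FLIGHT; i++) {
            submitNext();
        }
    }

    private synchronized void submitNext() {
        if (!chunks.hasNext()) {
            pool.shutdown(); // no more chunks; let idle threads exit
            return;
        }
        byte[] chunk = chunks.next();
        pool.submit(() -> {
            upload(chunk);
            submitNext(); // the "completionBlock": refill with the next chunk
        });
    }

    private void upload(byte[] chunk) {
        // stand-in for the actual network call
    }
}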
