Can one thread block complete ForkJoinPool - multithreading

I was reading https://dzone.com/articles/think-twice-using-java-8
Somewhere in between it states that
The problem is that all parallel streams use common fork-join thread pool, and if you submit a long-running task, you effectively block all threads in the pool.
My question is - shouldn't other threads in pool complete without waiting on long running task? OR is it talking about if we create two parallel streams parallely?

A Stream operation does not block threads of the pool, it will utilize them. Depending on the workload split, it is possible that all threads are busy processing the Stream operation that was commenced first, so they can not pick up workload for another Stream operation. The article seems to wrongly use the word “block” for this scenario.
It’s worth noting that the Stream API and default implementation is designed for CPU bound task which do not wait for external events (block a thread). If you use it that way, it doesn’t matter which task keeps the threads busy for the overall throughput. But if you are processing different requests concurrently and want some kind of fairness in worker thread assignment, it won’t work.
If you read on in the article you see that they created an example assuming a wrong use of the Stream API, with truly blocking operations, and even call the first example broken, though they are putting it in quotes unnecessarily. In that case, the error is not using a parallel Stream but using it for blocking operations.
It’s also not correct that such a parallel Stream operation can “block all other tasks that are using parallel streams”. To have another parallel Stream operation, you must have at least one runnable thread initiating the Stream operation. Since this initiating thread will contribute to the Stream processing, there’s always at least one participating thread. So if all threads of the common pool work on one Stream operation, it may degrade the performance of other parallel Stream operations, but not bring them to halt.
E.g., if you use the following test program
long t0 = System.nanoTime();
new Thread(() -> {
Stream.generate(() -> {
long missing = TimeUnit.SECONDS.toNanos(3) + t0 - System.nanoTime();
if(missing > 0) {
System.out.println("blocking "+Thread.currentThread().getName());
LockSupport.parkNanos(missing);
}
return "result";
}).parallel().limit(100).forEach(result -> {});
System.out.println("first (blocking) operation finished");
}).start();
for(int i = 0; i< 4; i++) {
new Thread(() -> {
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
System.out.println(Thread.currentThread().getName()
+" starting another parallel Stream");
Object[] threads =
Stream.generate(() -> Thread.currentThread().getName())
.parallel().limit(100).distinct().toArray();
System.out.println("finished using "+Arrays.toString(threads));
}).start();
}
it may print something like
blocking ForkJoinPool.commonPool-worker-5
blocking ForkJoinPool.commonPool-worker-13
blocking Thread-0
blocking ForkJoinPool.commonPool-worker-7
blocking ForkJoinPool.commonPool-worker-15
blocking ForkJoinPool.commonPool-worker-11
blocking ForkJoinPool.commonPool-worker-9
blocking ForkJoinPool.commonPool-worker-3
Thread-2 starting another parallel Stream
Thread-4 starting another parallel Stream
Thread-1 starting another parallel Stream
Thread-3 starting another parallel Stream
finished using [Thread-4]
finished using [Thread-2]
finished using [Thread-3]
finished using [Thread-1]
first (blocking) operation finished
(details may vary)
There might be a clash between the thread management that created the initiating threads (those accepting external requests, for example) and the common pool, however. But, as said, parallel Stream operations are not the right tool if you want fairness between a number of independent operations.

Related

what would be the right way to go for my scenario, thread array, thread pool or tasks?

I am working on a small microfinance application that processes financial transactions, the frequency of these transaction are quite high, which is why I am planning to make it a multi-threaded application that can process multiple transactions in parallel.
I have already designed all the workers that are thread safe,
what I need help for is how to manage these threads. here are some of my options
1.make a specified number of thread pool threads at startup and keep them running like in a infinite loop where they could keep looking for new transactions and if any are found start processing
example code:
void Start_Job(){
for (int l_ThreadId = 0; l_ThreadId < PaymentNoOfWorkerThread; l_ThreadId++)
{
ThreadPool.QueueUserWorkItem(Execute, (object)l_TrackingId);
}
}
void Execute(object l_TrackingId)
{
while(true)
{
var new_txns = Get_New_Txns(); //get new txns if any returns a queue
while(new_txns.count > 0 ){
process_txn(new_txns.Dequeue())
}
Thread.Sleep(some_time);
}
}
2.look for new transactions and assign a thread pool thread for each transaction (my understanding that these threads would be reused after their execution is complete for new txns)
example code:
void Start_Job(){
while(true){
var new_txns = Get_New_Txns(); //get new txns if any returns a queue
for (int l_ThreadId = 0; l_ThreadId < new_txns.count; l_ThreadId++)
{
ThreadPool.QueueUserWorkItem(Execute, (object)new_txn.Dequeue());
}
}
Thread.Sleep(some_time);
}
void Execute(object Txn)
{
process_txn(txn);
}
3.do the above but with tasks.
which option would be most efficient and well suited for my application,
thanks in advance :)
ThreadPool.QueueUserWorkItem is an older API and you shouldn't be using it directly
anymore. Tasks is the way to go and Thread pool is managed automatically for you.
What may suite your application would depend on what happens in process_txn and is subjective, so this is very generic guideline:
If process_txn is a compute bound operation: for example it performs only CPU bound calculations, then you may look at the Task Parallel Library. It will help you use the CPU cores more efficiently.
If process_txn is less of CPU and more IO bound operations: meaning if it may read/write from files/database or connects to some other remote service, then what you should look at is asynchronous programming and make sure your IO operations are all asynchronous which means your threads are never blocked on IO. This will help your service to be more scalable. Also depending on what your queue is, see if you can await on the queue asynchronously, so that none of your application threads are blocked just waiting on the queue.

Serial Dispatch Queue with Asynchronous Blocks

Is there ever any reason to add blocks to a serial dispatch queue asynchronously as opposed to synchronously?
As I understand it a serial dispatch queue only starts executing the next task in the queue once the preceding task has completed executing. If this is the case, I can't see what you would you gain by submitting some blocks asynchronously - the act of submission may not block the thread (since it returns straight-away), but the task won't be executed until the last task finishes, so it seems to me that you don't really gain anything.
This question has been prompted by the following code - taken from a book chapter on design patterns. To prevent the underlying data array from being modified simultaneously by two separate threads, all modification tasks are added to a serial dispatch queue. But note that returnToPool adds tasks to this queue asynchronously, whereas getFromPool adds its tasks synchronously.
class Pool<T> {
private var data = [T]();
// Create a serial dispath queue
private let arrayQ = dispatch_queue_create("arrayQ", DISPATCH_QUEUE_SERIAL);
private let semaphore:dispatch_semaphore_t;
init(items:[T]) {
data.reserveCapacity(data.count);
for item in items {
data.append(item);
}
semaphore = dispatch_semaphore_create(items.count);
}
func getFromPool() -> T? {
var result:T?;
if (dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER) == 0) {
dispatch_sync(arrayQ, {() in
result = self.data.removeAtIndex(0);
})
}
return result;
}
func returnToPool(item:T) {
dispatch_async(arrayQ, {() in
self.data.append(item);
dispatch_semaphore_signal(self.semaphore);
});
}
}
Because there's no need to make the caller of returnToPool() block. It could perhaps continue on doing other useful work.
The thread which called returnToPool() is presumably not just working with this pool. It presumably has other stuff it could be doing. That stuff could be done simultaneously with the work in the asynchronously-submitted task.
Typical modern computers have multiple CPU cores, so a design like this improves the chances that CPU cores are utilized efficiently and useful work is completed sooner. The question isn't whether tasks submitted to the serial queue operate simultaneously — they can't because of the nature of serial queues — it's whether other work can be done simultaneously.
Yes, there are reasons why you'd add tasks to serial queue asynchronously. It's actually extremely common.
The most common example would be when you're doing something in the background and want to update the UI. You'll often dispatch that UI update asynchronously back to the main queue (which is a serial queue). That way the background thread doesn't have to wait for the main thread to perform its UI update, but rather it can carry on processing in the background.
Another common example is as you've demonstrated, when using a GCD queue to synchronize interaction with some object. If you're dealing with immutable objects, you can dispatch these updates asynchronously to this synchronization queue (i.e. why have the current thread wait, but rather instead let it carry on). You'll do reads synchronously (because you're obviously going to wait until you get the synchronized value back), but writes can be done asynchronously.
(You actually see this latter example frequently implemented with the "reader-writer" pattern and a custom concurrent queue, where reads are performed synchronously on concurrent queue with dispatch_sync, but writes are performed asynchronously with barrier with dispatch_barrier_async. But the idea is equally applicable to serial queues, too.)
The choice of synchronous v asynchronous dispatch has nothing to do with whether the destination queue is serial or concurrent. It's simply a question of whether you have to block the current queue until that other one finishes its task or not.
Regarding your code sample code, that is correct. The getFromPool should dispatch synchronously (because you have to wait for the synchronization queue to actually return the value), but returnToPool can safely dispatch asynchronously. Obviously, I'm wary of seeing code waiting for semaphores if that might be called from the main thread (so make sure you don't call getFromPool from the main thread!), but with that one caveat, this code should achieve the desired purpose, offering reasonably efficient synchronization of this pool object, but with a getFromPool that will block if the pool is empty until something is added to the pool.

Node.js multithreading using threads-a-gogo

I am implementing a REST service for financial calculation. So each request is supposed to be a CPU intensive task, and I think that the best place to create threads it's in the following function:
exports.execute = function(data, params, f, callback) {
var queriesList = [];
var resultList = [];
for (var i = 0; i < data.lista.length; i++)
{
var query = (function(cod) {
return function(callbackFlow) {
params.paramcodneg = cod;
doCdaQuery(params, function(err, result)
{
if (err)
{
return callback({ERROR: err}, null);
}
f(data, result, function(ret)
{
resultList.push(ret);
callbackFlow();
});
});
}
})(data.lista[i]);
queriesList.push(query);
}
flow.parallel(queriesList, function() {
callback(null, resultList);
});
};
I don't know what is best, run flow.parallel in a separeted thread or run each function of the queriesList in its own thread. What is best ? And how to use threads-a-gogo module for that ?
I've tried but couldn't write the right code for that.
Thanks in advance.
Kleyson Rios.
I'll admit that I'm relatively new to node.js and I haven't yet used threads a gogo, but I have had some experience with multi-threaded programming, so I'll take a crack at answering this question.
Creating a thread for every single query (I'm assuming these queries are CPU-bound calculations rather than IO-bound calls to a database) is not a good idea. Creating and destroying threads in an expensive operation, so creating an destroying a group of threads for every request that requires calculation is going to be a huge drag on performance. Too many threads will cause more overhead as the processor switches between them. There isn't any advantage to having more worker threads than processor cores.
Also, if each query doesn't take that much processing time, there will be more time spent creating and destroying the thread than running the query. Most of the time would be spent on threading overhead. In this case, you would be much better off using a single-threaded solution using flow or async, which distributes the processing over multiple ticks to allow the node.js event loop to run.
Single-threaded solutions are the easiest to understand and debug, but if the queries are preventing the main thread from getting other stuff done, then a multi-threaded solution is necessary.
The multi-threaded solution you propose is pretty good. Running all the queries in a separate thread prevents the main thread from bogging down. However, there isn't any point in using flow or async in this case. These modules simulate multi-threading by distributing the processing over multiple node.js ticks and tasks run in parallel don't execute in any particular order. However, these tasks still are running in a single thread. Since you're processing the queries in their own thread, and they're no longer interfering with the node.js event loop, then just run them one after another in a loop. Since all the action is happening in a thread without a node.js event loop, using flow or async in just introduces more overhead for no additional benefit.
A more efficient solution is to have a thread pool hanging out in the background and throw tasks at it. The thread pool would ideally have the same number of threads as processor cores, and would be created when the application starts up and destroyed when the application shuts down, so the expensive creating and destroying of threads only happens once. I see that Threads a Gogo has a thread pool that you can use, although I'm afraid I'm not yet familiar enough with it to give you all the details of using it.
I'm drifting into territory I'm not familiar with here, but I believe you could do it by pushing each query individually onto the global thread pool and when all the callbacks have completed, you'll be done.
The Node.flow module would be handy here, not because it would make processing any faster, but because it would help you manage all the query tasks and their callbacks. You would use a loop to push a bunch of parallel tasks on the flow stack using flow.parallel(...), where each task would send a query to the global threadpool using threadpool.any.eval(), and then call ready() in the threadpool callback to tell flow that the task is complete. After the parallel tasks have been queued up, use flow.join() to run all the tasks. That should run the queries on the thread pool, with the thread pool running as many tasks as it can at once, using all the cores and avoiding creating or destroying threads, and all the queries will have been processed.
Other requests would also be tossing their tasks onto the thread pool as well, but you wouldn't notice that because the request being processed would only get callbacks for the tasks that the request gave to the thread pool. Note that this would all be done on the main thread. The thread pool would do all the non-main-thread processing.
You'll need to do some threads a gogo and node.flow documentation reading and figure out some of the details, but that should give you a head start. Using a separate thread is more complex than using the main thread, and making use of a thread pool is even more complex, so you'll have to choose which one is best for you. The extra complexity might or might not be worth it.

Play Framework: thread-pool-executor vs fork-join-executor

Let's say we have a an action below in our controller. At each request performLogin will be called by many users.
def performLogin( ) = {
Async {
// API call to the datasource1
val id = databaseService1.getIdForUser();
// API call to another data source different from above
// This process depends on id returned by the call above
val user = databaseService2.getUserGivenId(id);
// Very CPU intensive task
val token = performProcess(user)
// Very CPU intensive calculations
val hash = encrypt(user)
Future.successful(hash)
}
}
I kind of know what the fork-join-executor does. Basically from the main thread which receives a request, it spans multiple worker threads which in tern will divide the work into few chunks. Eventually main thread will join those result and return from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, this selected thread will do the work, then go back to the thread pool to listen to more work to do. So no sub dividing of the task happening here.
In above code parallelism by fork-join executor is not possible in my opinion. Each call to the different methods/functions requires something from the previous step. If I were to choose the fork-join executor for the threading how would that benefit me? How would above code execution differ among fork-join vs thread-pool executor.
Thanks
This isn't parallel code, everything inside of your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests - it's event-based, there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.

Multi-Producer Single-Consumer Lazy Task Execution

I am trying to model a system where there are multiple threads producing data, and a single thread consuming the data. The trick is that I don't want a dedicated thread to consume the data because all of the threads live in a pool. Instead, I want one of the producers to empty the queue when there is work, and yield if another producer is already clearing the queue.
The basic idea is that there is a queue of work, and a lock around the processing. Each producer pushes its payload onto the queue, and then attempts to enter the lock. The attempt is non-blocking and returns either true (the lock was acquired), or false (the lock is held by someone else).
If the lock is acquired, then that thread then processes all of the data in the queue until it is empty (including any new payloads introduced by other producers during processing). Once all of the work has been processed, the thread releases the lock and quits out.
The following is C++ code for the algorithm:
void Process(ITask *task) {
// queue is a thread safe implementation of a regular queue
queue.push(task);
// crit_sec is some handle to a critical section like object
// try_scoped_lock uses RAII to attempt to acquire the lock in the constructor
// if the lock was acquired, it will release the lock in the
// destructor
try_scoped_lock lock(crit_sec);
// See if this thread won the lottery. Prize is doing all of the dishes
if (!lock.Acquired())
return;
// This thread got the lock, so it needs to do the work
ITask *currTask;
while (queue.try_pop(currTask)) {
... execute task ...
}
}
In general this code works fine, and I have never actually witnessed the behavior I am about to describe below, but that implementation makes me feel uneasy. It stands to reason that a race condition is introduced between when the thread exits the while loop and when it releases the critical section.
The whole algorithm relies on the assumption that if the lock is being held, then a thread is servicing the queue.
I am essentially looking for enlightenment on 2 questions:
Am I correct that there is a race condition as described (bonus for other races)
Is there a standard pattern for implementing this mechanism that is performant and doesn't introduce race conditions?
Yes, there is a race condition.
Thread A adds a task, gets the lock, processes itself, then asks for a task from the queue. It is rejected.
Thread B at this point adds a task to the queue. It then attempts to get the lock, and fails, because thread A has the lock. Thread B exits.
Thread A then exits, with the queue non-empty, and nobody processing the task on it.
This will be difficult to find, because that window is relatively narrow. To make it more likely to find, after the while loop introduce a "sleep for 10 seconds". In the calling code, insert a task, wait 5 seconds, then insert a second task. After 10 more seconds, check that both insert tasks are finished, and there is still a task to be processed on the queue.
One way to fix this would be to change try_pop to try_pop_or_unlock, and pass in your lock to it. try_pop_or_unlock then atomically checks for an empty queue, and if so unlocks the lock and returns false.
Another approach is to improve the thread pool. Add a counting semaphore based "consume" task launcher to it.
semaphore_bool bTaskActive;
counting_semaphore counter;
when (counter || !bTaskActive)
if (bTaskActive)
return
bTaskActive = true
--counter
launch_task( process_one_off_queue, when_done( [&]{ bTaskActive=false ) );
When the counting semaphore is active, or when poked by the finished consume task, it launches a consume task if there is no consume task active.
But that is just off the top of my head.

Resources