Play Framework: Async vs Sync performance - multithreading

I have the following code:
def sync = Action {
  val t0 = System.nanoTime()
  Thread.sleep(100)
  val t1 = System.nanoTime()
  Ok("Elapsed time: " + (t1 - t0) / 1000000.0 + "ms")
}

def async = Action {
  val t0 = System.nanoTime()
  Async {
    Future {
      Thread.sleep(100)
      val t1 = System.nanoTime()
      Ok("Elapsed time: " + (t1 - t0) / 1000000.0 + "ms")
    }
  }
}
The difference between the two is that sync sleeps on the thread that received the request, while async sleeps on a separate thread, so the thread in charge of receiving requests can keep accepting requests without blocking. When I profile the threads, I see a sudden increase in the number of threads created for async requests, as expected. However, both methods above, under 4000 concurrent connections with a 20-second ramp-up, result in the same throughput and latency. I expected async to perform better. Why would this be?

The short answer is that both methods are essentially the same.
Actions themselves are always asynchronous (see documentation on handling asynchronous results).
In both cases, the sleep call occurs in the action's thread pool (which is not optimal).
As stated in Understanding Play thread pools:
Play framework is, from the bottom up, an asynchronous web framework. Streams are handled asynchronously using iteratees. Thread pools in Play are tuned to use fewer threads than in traditional web frameworks, since IO in play-core never blocks.
Because of this, if you plan to write blocking IO code, or code that could potentially do a lot of CPU intensive work, you need to know exactly which thread pool is bearing that workload, and you need to tune it accordingly.
For instance, this code fragment uses a separate thread pool:
Future {
  // Some blocking or expensive code here
}(Contexts.myExecutionContext)
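Note that Contexts.myExecutionContext is not defined by the framework itself; a minimal sketch of how such a context could be wired up in Play 2.x (assuming a dispatcher named my-context is configured in application.conf) might look like:

import play.api.libs.concurrent.Akka
import play.api.Play.current
import scala.concurrent.ExecutionContext

object Contexts {
  // Looks up a dedicated Akka dispatcher so blocking work
  // stays off Play's default thread pool
  implicit val myExecutionContext: ExecutionContext =
    Akka.system.dispatchers.lookup("my-context")
}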
As additional resources, see this answer and this video for more information on handling asynchronous actions and this and this forum messages for extensive discussions on the subject.


Why is ExecutorService much faster than Coroutines in this example? [Solved]

Update:
I made 2 silly mistakes!
1. I submitted only one task in the executor service example.
2. I forgot to await the tasks' completion.
Fixing the test led to all 3 examples having around 190-200 ms/op latency.
I created a benchmark comparison using kotlinx-benchmark (which uses JMH) to compare coroutines and a thread pool when making a blocking call.
My rationale behind this benchmark is:
Coroutines will block the underlying thread when making a blocking call.
A network call is generally blocking.
In an average service, I need to make millions of network calls.
In such a scenario, will I get any benefit if I use coroutines?
The benchmark I created simulates the blocking call using Thread.sleep(10) // 10 ms block, and 1000 such calls need to be made. I created 3 examples, with the following results.
Dispatchers.IO
Used Dispatchers.IO, which is the recommended way to handle IO operations.
@Benchmark
fun withCoroutines() {
    runBlocking {
        val coroutines = (0 until 1000).map {
            CoroutineScope(Dispatchers.IO).async {
                sleep(10)
            }
        }
        coroutines.joinAll()
    }
}
Avg time: 188.418 ms/op
Fixed Threadpool
Dispatchers.IO created 64 threads (the exact number cannot be determined statically), so I kept 60 threads for a comparable scenario.
@Benchmark
fun withExecutorService() {
    val executors = Executors.newFixedThreadPool(60)
    executors.submit { sleep(10) }
    executors.shutdown()
}
Avg time: 0.054 ms/op
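Per the update at the top, this measurement is misleading: the benchmark submits only a single task and never awaits completion. A corrected sketch, submitting all 1000 tasks and waiting for them (the name withExecutorServiceFixed is mine), might look like this; with the fix, all three examples land around 190-200 ms/op:

@Benchmark
fun withExecutorServiceFixed() {
    val executor = Executors.newFixedThreadPool(60)
    // Submit all 1000 blocking tasks, not just one
    val futures = (0 until 1000).map { executor.submit { sleep(10) } }
    // Wait for every task to finish before ending the iteration
    futures.forEach { it.get() }
    executor.shutdown()
}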
Threadpool Dispatcher
Since the results were shocking, I decided to use the same thread pool above as the coroutine dispatcher:
Executors.newFixedThreadPool(60).asCoroutineDispatcher()
Avg time: 206.260 ms/op
Questions
Why are coroutines performing exceptionally badly here?
With limitedParallelism(10), coroutines performed much better, at ~30 ms/op (see the sketch after these questions). The default number of threads used by Dispatchers.IO is 64. Does that mean the coroutine scheduler is causing too many context switches, leading to poor performance? Even then, the performance is not close to that of the thread pool.
Am I correct to assume that network calls are always blocking? Both the executor service and coroutines schedule execution over underlying threads without blocking the main thread, so they are direct competitors.
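For reference, the limitedParallelism variant mentioned above might look roughly like this (a sketch assuming kotlinx.coroutines 1.6+, where limitedParallelism is still marked experimental):

@Benchmark
fun withLimitedCoroutines() {
    // Cap the IO dispatcher at 10 threads
    val limitedIo = Dispatchers.IO.limitedParallelism(10)
    runBlocking {
        (0 until 1000).map {
            CoroutineScope(limitedIo).async { sleep(10) }
        }.joinAll()
    }
}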
Notes:
I am running JMH with
@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 50)
@Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
The code can be found here

What would be the right way to go for my scenario: thread array, thread pool, or tasks?

I am working on a small microfinance application that processes financial transactions. The frequency of these transactions is quite high, which is why I am planning to make it a multi-threaded application that can process multiple transactions in parallel.
I have already designed all the workers to be thread safe;
what I need help with is how to manage these threads. Here are some of my options:
1. Make a specified number of thread pool threads at startup and keep them running in an infinite loop, where they keep looking for new transactions and, if any are found, start processing them.
example code:
void Start_Job()
{
    for (int l_ThreadId = 0; l_ThreadId < PaymentNoOfWorkerThread; l_ThreadId++)
    {
        ThreadPool.QueueUserWorkItem(Execute, (object)l_ThreadId);
    }
}

void Execute(object l_TrackingId)
{
    while (true)
    {
        var new_txns = Get_New_Txns(); // get new txns, if any; returns a queue
        while (new_txns.Count > 0)
        {
            process_txn(new_txns.Dequeue());
        }
        Thread.Sleep(some_time);
    }
}
2. Look for new transactions and assign a thread pool thread to each transaction (my understanding is that these threads would be reused for new txns after their execution completes).
example code:
void Start_Job()
{
    while (true)
    {
        var new_txns = Get_New_Txns(); // get new txns, if any; returns a queue
        while (new_txns.Count > 0)
        {
            ThreadPool.QueueUserWorkItem(Execute, (object)new_txns.Dequeue());
        }
        Thread.Sleep(some_time);
    }
}

void Execute(object txn)
{
    process_txn(txn);
}
3. Do the above, but with tasks.
Which option would be most efficient and well suited for my application?
Thanks in advance :)
ThreadPool.QueueUserWorkItem is an older API, and you shouldn't be using it directly anymore. Tasks are the way to go, and the thread pool is managed automatically for you.
What may suit your application depends on what happens in process_txn and is subjective, so this is a very generic guideline:
If process_txn is a compute-bound operation, for example it performs only CPU-bound calculations, then you may look at the Task Parallel Library. It will help you use the CPU cores more efficiently.
If process_txn is less CPU-bound and more IO-bound, meaning it may read/write files or a database, or connect to some other remote service, then what you should look at is asynchronous programming, and make sure your IO operations are all asynchronous, which means your threads are never blocked on IO. This will help your service be more scalable. Also, depending on what your queue is, see if you can await on the queue asynchronously, so that none of your application threads are blocked just waiting on the queue.
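For illustration, here is a sketch of the IO-bound, fully asynchronous variant using System.Threading.Channels; Txn and ProcessTxnAsync are hypothetical placeholders for your own types and logic:

// A channel acts as an awaitable queue: producers write transactions,
// consumers await them without blocking any thread.
var channel = Channel.CreateUnbounded<Txn>();

// Producer side, wherever new transactions arrive:
// await channel.Writer.WriteAsync(txn);

// Consumer side: awaits the queue asynchronously, so no thread
// is blocked while the queue is empty.
async Task ConsumeAsync()
{
    await foreach (var txn in channel.Reader.ReadAllAsync())
    {
        await ProcessTxnAsync(txn); // all IO inside is async as well
    }
}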

Can one thread block complete ForkJoinPool

I was reading https://dzone.com/articles/think-twice-using-java-8
Somewhere in between it states that
The problem is that all parallel streams use common fork-join thread pool, and if you submit a long-running task, you effectively block all threads in the pool.
My question is: shouldn't other threads in the pool complete without waiting on the long-running task? Or is it talking about creating two parallel streams in parallel?
A Stream operation does not block the threads of the pool; it utilizes them. Depending on the workload split, it is possible that all threads are busy processing the Stream operation that commenced first, so they cannot pick up workload for another Stream operation. The article seems to wrongly use the word “block” for this scenario.
It’s worth noting that the Stream API and its default implementation are designed for CPU-bound tasks, which do not wait for external events (block a thread). If you use it that way, it doesn’t matter which task keeps the threads busy for the overall throughput. But if you are processing different requests concurrently and want some kind of fairness in worker thread assignment, it won’t work.
If you read on in the article, you see that they created an example assuming a wrong use of the Stream API, with truly blocking operations, and even call the first example “broken”, though they put it in quotes unnecessarily. In that case, the error is not in using a parallel Stream but in using it for blocking operations.
It’s also not correct that such a parallel Stream operation can “block all other tasks that are using parallel streams”. To have another parallel Stream operation, you must have at least one runnable thread initiating the Stream operation. Since this initiating thread will contribute to the Stream processing, there’s always at least one participating thread. So if all threads of the common pool are working on one Stream operation, it may degrade the performance of other parallel Stream operations, but not bring them to a halt.
E.g., if you use the following test program
long t0 = System.nanoTime();
new Thread(() -> {
    Stream.generate(() -> {
        long missing = TimeUnit.SECONDS.toNanos(3) + t0 - System.nanoTime();
        if (missing > 0) {
            System.out.println("blocking " + Thread.currentThread().getName());
            LockSupport.parkNanos(missing);
        }
        return "result";
    }).parallel().limit(100).forEach(result -> {});
    System.out.println("first (blocking) operation finished");
}).start();
for (int i = 0; i < 4; i++) {
    new Thread(() -> {
        LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
        System.out.println(Thread.currentThread().getName()
            + " starting another parallel Stream");
        Object[] threads =
            Stream.generate(() -> Thread.currentThread().getName())
                  .parallel().limit(100).distinct().toArray();
        System.out.println("finished using " + Arrays.toString(threads));
    }).start();
}
it may print something like
blocking ForkJoinPool.commonPool-worker-5
blocking ForkJoinPool.commonPool-worker-13
blocking Thread-0
blocking ForkJoinPool.commonPool-worker-7
blocking ForkJoinPool.commonPool-worker-15
blocking ForkJoinPool.commonPool-worker-11
blocking ForkJoinPool.commonPool-worker-9
blocking ForkJoinPool.commonPool-worker-3
Thread-2 starting another parallel Stream
Thread-4 starting another parallel Stream
Thread-1 starting another parallel Stream
Thread-3 starting another parallel Stream
finished using [Thread-4]
finished using [Thread-2]
finished using [Thread-3]
finished using [Thread-1]
first (blocking) operation finished
(details may vary)
There might be a clash between the thread management that created the initiating threads (those accepting external requests, for example) and the common pool, however. But, as said, parallel Stream operations are not the right tool if you want fairness between a number of independent operations.

Is there a way to run delayed or scheduled task with GPars?

I'm building my concurrent application on top of GPars library.
It contains a thread pool under the hood, so I would like to solve all concurrency-related tasks by means of this pool.
I need to run a task with a certain delay (e.g. 30 seconds). Also I want to run some tasks periodically.
Are there any ways to implement these things with GPars?
What about Thread.sleep for delaying and Quartz for scheduling? I know these are the obvious choices, but I don't see anything wrong with using them.
What I mean is to mix GPars with a bit of higher-order closures, e.g.:
@Grab(group='org.codehaus.gpars', module='gpars', version='1.2.1')

def delayDecorator = { closure, delay ->
    return { params ->
        Thread.sleep(delay)
        closure.call(params)
    }
}

groovyx.gpars.GParsPool.withPool {
    def closures = [{ println it }, { println it + 1 }], delay = 1000
    closures.collect(delayDecorator.rcurry(delay)).eachParallel { it(1) }
}
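For the periodic case specifically, a plain JDK ScheduledExecutorService also works alongside GPars (a sketch; note this pool is separate from the GPars-managed one):

import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

def scheduler = Executors.newSingleThreadScheduledExecutor()
// Run once after a 30-second delay
scheduler.schedule({ println 'delayed task' } as Runnable, 30, TimeUnit.SECONDS)
// Run every 10 seconds, starting immediately
scheduler.scheduleAtFixedRate({ println 'periodic task' } as Runnable, 0, 10, TimeUnit.SECONDS)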

Play Framework: thread-pool-executor vs fork-join-executor

Let's say we have an action like the one below in our controller. At each request, performLogin will be called by many users.
def performLogin() = {
  Async {
    // API call to datasource1
    val id = databaseService1.getIdForUser()
    // API call to another data source, different from the above;
    // this process depends on the id returned by the first call
    val user = databaseService2.getUserGivenId(id)
    // Very CPU-intensive task
    val token = performProcess(user)
    // Very CPU-intensive calculations
    val hash = encrypt(user)
    Future.successful(hash)
  }
}
I kind of know what the fork-join-executor does. Basically, from the main thread that receives a request, it spawns multiple worker threads which in turn divide the work into a few chunks. Eventually the main thread joins those results and returns from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, the selected thread does the work, then goes back to the thread pool to listen for more work. So no subdividing of the task happens here.
In the above code, parallelism via the fork-join executor is not possible, in my opinion: each call to the different methods/functions requires something from the previous step. If I were to choose the fork-join executor for threading, how would that benefit me? How would the above code's execution differ between the fork-join and thread-pool executors?
Thanks
This isn't parallel code; everything inside your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests - it's event-based, and there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.
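For illustration, a sketch of how this action could be kept off Play's default pool by running the blocking and CPU-heavy work on a dedicated execution context (using the newer Action.async style; Contexts.myExecutionContext follows the custom-context pattern discussed earlier, and the service names are the question's own):

def performLogin() = Action.async {
  // Run the sequential blocking/CPU work on a dedicated pool,
  // leaving Play's default pool free to accept other requests.
  implicit val ec = Contexts.myExecutionContext
  Future {
    val id = databaseService1.getIdForUser()
    val user = databaseService2.getUserGivenId(id)
    val token = performProcess(user) // CPU intensive
    encrypt(user)                    // CPU intensive
  }.map(hash => Ok(hash.toString))
}

This doesn't make the steps run in parallel; it only moves the same sequential work onto a pool sized for it.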
