Is there a way to run delayed or scheduled task with GPars? - groovy

I'm building my concurrent application on top of GPars library.
It contains a thread pool under the hood, so I would like to solve all concurrency-related tasks by means of this pool.
I need to run a task with a certain delay (e.g. 30 seconds). Also I want to run some tasks periodically.
Are there any ways to implement these things with GPars?

What about Thread.sleep for delaying and Quartz for scheduling? I know these are the obvious choices, but I don't see anything wrong with using them.
What I mean is to mix GPars with a few higher-order closures, e.g.:
@Grab(group='org.codehaus.gpars', module='gpars', version='1.2.1')
def delayDecorator = { closure, delay ->
    return { params ->
        Thread.sleep(delay)
        closure.call(params)
    }
}
groovyx.gpars.GParsPool.withPool() {
    def closures = [{ println it }, { println it + 1 }], delay = 1000
    closures.collect(delayDecorator.rcurry(delay)).eachParallel { it(1) }
}
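If stepping outside GPars for the timing part is acceptable, the JDK's own ScheduledExecutorService covers both the delayed and the periodic case without parking a pool thread in Thread.sleep. The following is only a minimal Java sketch: the class name ScheduledDemo, the 200 ms delay and the 100 ms period are placeholders chosen for illustration, and nothing in it is GPars API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; delays and periods are illustrative values.
final class ScheduledDemo {

    // Returns {number of delayed runs, number of periodic ticks observed}.
    static int[] run() {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        AtomicInteger delayedRuns = new AtomicInteger();
        AtomicInteger ticks = new AtomicInteger();
        CountDownLatch threeTicks = new CountDownLatch(3);
        try {
            // One-shot task: runs once, 200 ms from now.
            ScheduledFuture<?> delayed = scheduler.schedule(
                    () -> { delayedRuns.incrementAndGet(); },
                    200, TimeUnit.MILLISECONDS);

            // Periodic task: first run immediately, then every 100 ms.
            ScheduledFuture<?> periodic = scheduler.scheduleAtFixedRate(
                    () -> { ticks.incrementAndGet(); threeTicks.countDown(); },
                    0, 100, TimeUnit.MILLISECONDS);

            delayed.get();           // wait for the one-shot task
            threeTicks.await();      // wait for at least three ticks
            periodic.cancel(false);  // stop the periodic task
            return new int[] { delayedRuns.get(), ticks.get() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("delayed runs: " + result[0] + ", ticks: " + result[1]);
    }
}
```

The scheduled body could simply hand the real work over to the existing pool, so the scheduler thread acts only as a timer.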

Related

Why ExecutorService is much faster than Coroutines in this example? [Solved]

Update:
I made 2 silly mistakes!
I submitted only 1 task in the executor service example.
I forgot to wait for the tasks to finish.
Fixing the test led to all 3 examples having around 190-200 ms/op latency.
I created a benchmark comparison using kotlinx-benchmark (uses jmh) to compare coroutines and a threadpool when making a blocking call.
My rationale behind this benchmark is:
Coroutines will block the underlying thread when making a blocking call.
A network call is generally blocking.
In an average service, I need to make a million network calls.
In such a scenario, will I get any benefit if I use coroutines?
The benchmark I created simulates the blocking call using Thread.sleep(10) // 10 ms block, and I need to make 1000 of them. I created 3 examples with the following results:
Dispatchers.IO
Used Dispatchers.IO, which is the recommended way to handle IO operations.
@Benchmark
fun withCoroutines() {
    runBlocking {
        val coroutines = (0 until 1000).map {
            CoroutineScope(Dispatchers.IO).async {
                sleep(10)
            }
        }
        coroutines.joinAll()
    }
}
Avg time: 188.418 ms/op
Fixed Threadpool
Dispatchers.IO created 64 threads (the exact number cannot be determined statically), so I used 60 threads for a comparable scenario.
@Benchmark
fun withExecutorService() {
    val executors = Executors.newFixedThreadPool(60)
    executors.submit { sleep(10) }
    executors.shutdown()
}
Avg time: 0.054 ms/op
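As the update at the top notes, this version submits a single task and never waits for it, which is why it looks so fast. A rough Java sketch of what a corrected measurement could look like is shown here; it uses plain System.nanoTime timing instead of JMH, and the class and method names are made up for illustration. With 1000 tasks of 10 ms on 60 threads, the pool needs at least ceil(1000 / 60) = 17 rounds, so roughly 170 ms is the floor, which is in line with the 190-200 ms/op reported after the fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not the benchmark from the question.
final class FixedPoolTiming {

    // Returns {completed task count, elapsed milliseconds}.
    static long[] run(int tasks, int threads, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        try {
            long start = System.nanoTime();
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                // Submit every task, not just one.
                futures.add(pool.submit(() -> {
                    Thread.sleep(sleepMillis);   // simulated blocking call
                    return completed.incrementAndGet();
                }));
            }
            for (Future<Integer> f : futures) {
                f.get();                         // wait for each task to finish
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return new long[] { completed.get(), elapsedMs };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] r = run(1000, 60, 10);
        System.out.println("completed=" + r[0] + ", elapsed=" + r[1] + " ms");
    }
}
```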
Threadpool Dispatcher
Since the results were shocking, I decided to use the same thread pool as the coroutine dispatcher:
Executors.newFixedThreadPool(60).asCoroutineDispatcher()
Avg time: 206.260 ms/op
Questions
Why are coroutines performing exceptionally badly here?
With the limitedParallelism(10) option, coroutines performed much better at 30 ms/op. The default number of threads used by Dispatchers.IO is 64. Does that mean that the coroutine scheduler is causing too many context switches, leading to poor performance? Still, the performance is not close to that of thread pools.
Am I correct to assume that network calls are always blocking? Both the executor service and coroutines schedule execution over underlying threads without blocking the main thread, so they are direct competitors.
Notes:
I am running jmh with
@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 50)
@Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
The code can be found here

Kotlin coroutines multithread dispatcher and thread-safety for local variables

Let's consider this simple code with coroutines
import kotlinx.coroutines.*
import java.util.concurrent.Executors
fun main() {
runBlocking {
launch (Executors.newFixedThreadPool(10).asCoroutineDispatcher()) {
var x = 0
val threads = mutableSetOf<Thread>()
for (i in 0 until 100000) {
x++
threads.add(Thread.currentThread())
yield()
}
println("Result: $x")
println("Threads: $threads")
}
}
}
As far as I understand this is quite legit coroutines code and it actually produces expected results:
Result: 100000
Threads: [Thread[pool-1-thread-1,5,main], Thread[pool-1-thread-2,5,main], Thread[pool-1-thread-3,5,main], Thread[pool-1-thread-4,5,main], Thread[pool-1-thread-5,5,main], Thread[pool-1-thread-6,5,main], Thread[pool-1-thread-7,5,main], Thread[pool-1-thread-8,5,main], Thread[pool-1-thread-9,5,main], Thread[pool-1-thread-10,5,main]]
The question is what makes these modifications of local variables thread-safe (or are they thread-safe at all?). I understand that this loop is actually executed sequentially, but it can change the running thread on every iteration. The changes made by the thread in the first iteration should still be visible to the thread that picks up the loop in the second iteration. Which code guarantees this visibility? I tried decompiling this code to Java and digging around the coroutines implementation with a debugger, but did not find a clue.
Your question is completely analogous to the realization that the OS can suspend a thread at any point in its execution and reschedule it to another CPU core. That works not because the code in question is "multicore-safe", but because it is a guarantee of the environment that a single thread behaves according to its program-order semantics.
Kotlin's coroutine execution environment likewise guarantees the safety of your sequential code. You are supposed to program to this guarantee without any worry about how it is maintained.
If you want to descend into the details of "how" out of curiosity, the answer becomes "it depends". Every coroutine dispatcher can choose its own mechanism to achieve it.
As an instructive example, we can focus on the specific dispatcher you use in your posted code: the JDK's fixed thread pool executor (Executors.newFixedThreadPool). You can submit arbitrary tasks to this executor, and it will execute each one of them on a single (arbitrary) thread, but many tasks submitted together will execute in parallel on different threads.
Furthermore, the executor service provides the guarantee that the code leading up to executor.execute(task) happens-before the code within the task, and the code within the task happens-before another thread's observing its completion (future.get(), future.isDone(), getting an event from the associated CompletionService).
Kotlin's coroutine dispatcher drives the coroutine through its lifecycle of suspension and resumption by relying on these primitives from the executor service, and thus you get the "sequential execution" guarantee for the entire coroutine. A single task submitted to the executor ends whenever the coroutine suspends, and the dispatcher submits a new task when the coroutine is ready to resume (when the user code calls continuation.resume(result)).
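To make the hand-off concrete, here is a small Java sketch of the same idea without any coroutine machinery. It is not how kotlinx.coroutines is implemented; it only mimics the shape described above: each "step" is a separate task on a 10-thread pool, and finishing a step means submitting the next one. The counter is deliberately a plain field with no volatile or locking, and the class name HandOffDemo is made up for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; an illustration, not coroutine internals.
final class HandOffDemo {
    // Deliberately plain: no volatile, no lock, no atomics.
    static int x = 0;

    static int run(int steps, Set<String> threadsSeen) {
        x = 0;
        ExecutorService pool = Executors.newFixedThreadPool(10);
        CountDownLatch done = new CountDownLatch(1);
        Runnable step = new Runnable() {
            @Override
            public void run() {
                x++;
                threadsSeen.add(Thread.currentThread().getName());
                if (x < steps) {
                    // "Suspend and resume": this task ends here and the rest
                    // of the loop continues as a new task, possibly on another
                    // thread. execute() gives the happens-before edge.
                    pool.execute(this);
                } else {
                    done.countDown();
                }
            }
        };
        try {
            pool.execute(step);
            done.await();   // countDown() happens-before await() returning
            return x;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        System.out.println("Result: " + run(100_000, threads)); // prints "Result: 100000"
        System.out.println("Threads used: " + threads.size());
    }
}
```

Only one step ever runs at a time, and every hand-off goes through execute(), so each step sees all writes made by the previous one even though the thread keeps changing.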

How to check what dispatcher is configured in akka application

I have the following entry in my conf file, but I'm not sure whether this dispatcher setting is being picked up and what parallelism value is ultimately used.
akka {
  actor {
    default-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      throughput = 3
      fork-join-executor {
        parallelism-min = 40
        parallelism-factor = 10
        parallelism-max = 100
      }
    }
  }
}
I have an 8-core machine, so I expect 80 parallel threads to be in the ready state:
40 (min) < 80 (8 cores * factor 10) < 100 (max). I'd like to see what value Akka is using for the maximum number of parallel threads.
I created 45 child actors, and in my logs I'm printing the thread id [application-akka.actor.default-dispatcher-xx], but I don't see more than 20 threads running in parallel.
In order to max out the parallelism factor, all the actors need to be processing messages at the same time. Are you sure this is the case in your application?
Take for example the following code
import akka.actor.{Actor, ActorSystem, Props}

object Test extends App {
  val system = ActorSystem()
  (1 to 80).foreach { _ =>
    val ref = system.actorOf(Props[Sleeper])
    ref ! "hello"
  }
}
class Sleeper extends Actor {
  override def receive: Receive = {
    case msg =>
      //Thread.sleep(60000)
      println(msg)
  }
}
With your config and 8 cores, you will see only a small number of threads being spawned (4 or 5?), as the messages are processed too quickly for any real parallelism to build up.
On the contrary, if you keep your actors busy by uncommenting the nasty Thread.sleep, you will see the number of threads bump up to 80. However, this will only last 1 minute, after which the threads will gradually be retired from the pool.
I guess the main trick is: don't think of each actor as running on a separate thread. It's whenever one or more messages appear in an actor's mailbox that the dispatcher awakes and - indeed - dispatches the message processing task to a designated pool.
Assuming you have an ActorSystem instance, you can check the values set in its configuration. This is how you could get your hands on the values you've set in the config file:
val system = ActorSystem()
val config = system.settings.config.getConfig("akka.actor.default-dispatcher")
config.getString("type")
config.getString("executor")
config.getString("throughput")
config.getInt("fork-join-executor.parallelism-min")
config.getInt("fork-join-executor.parallelism-max")
config.getDouble("fork-join-executor.parallelism-factor")
I hope this helps. You can also consult this page for more details on specific configuration settings.
Update
I've dug up a bit more in Akka to find out exactly what it uses for your settings. As you might already expect it uses a ForkJoinPool. The parallelism used to build it is given by:
object ThreadPoolConfig {
...
def scaledPoolSize(floor: Int, multiplier: Double, ceiling: Int): Int =
math.min(math.max((Runtime.getRuntime.availableProcessors * multiplier).ceil.toInt, floor), ceiling)
...
}
This function is used at some point to build a ForkJoinExecutorServiceFactory:
new ForkJoinExecutorServiceFactory(
validate(tf),
ThreadPoolConfig.scaledPoolSize(
config.getInt("parallelism-min"),
config.getDouble("parallelism-factor"),
config.getInt("parallelism-max")),
asyncMode)
Anyway, this is the parallelism that will be used to create the ForkJoinPool (java.util.concurrent.ForkJoinPool in recent Akka versions). Now we have to ask: how many threads does this pool use? The short answer is that it will use the whole capacity (80 threads in our case) only if it needs it.
To illustrate this scenario, I ran a couple of tests with various uses of Thread.sleep inside the actor. What I found is that it can use anywhere from around 10 threads (if no sleep call is made) to around the maximum of 80 threads (if I call sleep for 1 second). The tests were run on a machine with 8 cores.
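For the values in the question, the scaledPoolSize arithmetic quoted above can be checked by hand. The following Java sketch just re-implements the same clamp formula for illustration; the class name is made up, and the core count is passed in explicitly (an assumption for the sake of the example) instead of always being read from Runtime.

```java
// Hypothetical re-implementation of the clamp formula, for illustration only.
final class PoolSizeDemo {

    // min(max(ceil(cores * multiplier), floor), ceiling)
    static int scaledPoolSize(int floor, double multiplier, int ceiling, int cores) {
        int scaled = (int) Math.ceil(cores * multiplier);
        return Math.min(Math.max(scaled, floor), ceiling);
    }

    public static void main(String[] args) {
        // parallelism-min = 40, parallelism-factor = 10, parallelism-max = 100
        System.out.println(scaledPoolSize(40, 10.0, 100, 8));   // prints 80
        System.out.println(scaledPoolSize(40, 10.0, 100, 2));   // prints 40 (floor wins)
        System.out.println(scaledPoolSize(40, 10.0, 100, 16));  // prints 100 (ceiling wins)
        // The value for the machine this runs on:
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(scaledPoolSize(40, 10.0, 100, cores));
    }
}
```

This is only the configured parallelism; as described above, the number of threads actually alive at a given moment depends on how much work is pending.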
Summing it up, you will need to check the implementation used by Akka to see exactly how that parallelism is used, which is why I looked into ForkJoinPool. Other than looking at the config file and then inspecting that particular implementation, I don't think there is much more you can do, unfortunately :(
I hope this clarifies the answer - initially I thought you wanted to see how the actor system's dispatcher is configured.

How can I execute multiple tasks in Scala?

I have 50,000 tasks and want to execute them with 10 threads.
In Java I would create a pool with Executors.newFixedThreadPool(10), pass runnables to it, and then wait for all of them to be processed. As I understand it, Scala is especially useful for this kind of task, but I can't find a solution in the docs.
Scala 2.9.3 and later
The simplest approach is to use the scala.concurrent.Future class and associated infrastructure. The scala.concurrent.future method asynchronously evaluates the block passed to it and immediately returns a Future[A] representing the asynchronous computation. Futures can be manipulated in a number of non-blocking ways, including mapping, flatMapping, filtering, recovering errors, etc.
For example, here's a sample that creates 10 tasks, where each task sleeps for an arbitrary amount of time and then returns the square of the value passed to it.
import scala.concurrent._ // Future, Await, and the future method
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
val tasks: Seq[Future[Int]] = for (i <- 1 to 10) yield future {
  println("Executing task " + i)
  Thread.sleep(i * 1000L)
  i * i
}
val aggregated: Future[Seq[Int]] = Future.sequence(tasks)
val squares: Seq[Int] = Await.result(aggregated, 15.seconds)
println("Squares: " + squares)
In this example, we first create a sequence of individual asynchronous tasks that, when complete, provide an Int. We then use Future.sequence to combine those async tasks into a single async task -- swapping the position of the Future and the Seq in the type. Finally, we block the current thread for up to 15 seconds while waiting for the result. In the example, we use the global execution context, which is backed by a fork/join thread pool. For non-trivial examples, you would probably want to use an application-specific ExecutionContext.
Generally, blocking should be avoided when at all possible. There are other combinators available on the Future class that can help program in an asynchronous style, including onSuccess, onFailure, and onComplete.
Also, consider investigating the Akka library, which provides actor-based concurrency for Scala and Java, and interoperates with scala.concurrent.
Scala 2.9.2 and prior
The simplest approach is to use Scala's Future class, which is a sub-component of the Actors framework. The scala.actors.Futures.future method creates a Future for the block passed to it. You can then use scala.actors.Futures.awaitAll to wait for all tasks to complete.
For example, here's a sample that creates 10 tasks, where each task sleeps for an arbitrary amount of time and then returns the square of the value passed to it.
import scala.actors.Futures._
val tasks = for (i <- 1 to 10) yield future {
println("Executing task " + i)
Thread.sleep(i * 1000L)
i * i
}
val squares = awaitAll(20000L, tasks: _*)
println("Squares: " + squares)
You want to look at either the Scala actors library or Akka. Akka has cleaner syntax, but either will do the trick.
So it sounds like you need to create a pool of actors that know how to process your tasks. An Actor can basically be any class with a receive method - from the Akka tutorial (http://doc.akkasource.org/tutorial-chat-server-scala):
class MyActor extends Actor {
  def receive = {
    case "test" => println("received test")
    case _ => println("received unknown message")
  }
}
val myActor = Actor.actorOf[MyActor]
myActor.start
You'll want to create a pool of actor instances and fire your tasks to them as messages. Here's a post on Akka actor pooling that may be helpful: http://vasilrem.com/blog/software-development/flexible-load-balancing-with-akka-in-scala/
In your case, one actor per task may be appropriate (actors are extremely lightweight compared to threads so you can have a LOT of them in a single VM), or you might need some more sophisticated load balancing between them.
EDIT:
Using the example actor above, sending it a message is as easy as this:
myActor ! "test"
The actor will then output "received test" to standard output.
Messages can be of any type, and when combined with Scala's pattern matching, you have a powerful pattern for building flexible concurrent applications.
In general Akka actors will "do the right thing" in terms of thread sharing, and for the OP's needs, I imagine the defaults are fine. But if you need to, you can set the dispatcher the actor should use to one of several types:
* Thread-based
* Event-based
* Work-stealing
* HawtDispatch-based event-driven
It's trivial to set a dispatcher for an actor:
class MyActor extends Actor {
self.dispatcher = Dispatchers.newExecutorBasedEventDrivenDispatcher("thread-pool-dispatch")
.withNewThreadPoolWithBoundedBlockingQueue(100)
.setCorePoolSize(10)
.setMaxPoolSize(10)
.setKeepAliveTimeInMillis(10000)
.build
}
See http://doc.akkasource.org/dispatchers-scala
In this way, you could limit the thread pool size, but again, the original use case could probably be satisfied with 50K Akka actor instances using default dispatchers and it would parallelize nicely.
This really only scratches the surface of what Akka can do. It brings a lot of what Erlang offers to the Scala language. Actors can monitor other actors and restart them, creating self-healing applications. Akka also provides Software Transactional Memory and many other features. It's arguably the "killer app" or "killer framework" for Scala.
If you want to "execute them with 10 threads", then use threads. Scala's actor model, which is usually what people are speaking of when they say Scala is good for concurrency, hides such details so you won't see them.
Using actors, or futures if all you have are simple computations, you'd just create 50000 of them and let them run. You might be faced with issues, but they are of a different nature.
Here's another answer similar to mpilquist's response but without deprecated API and including the thread settings via a custom ExecutionContext:
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Await, Future}
import scala.concurrent.duration._
val numJobs = 50000
val numThreads = 10
// customize the execution context to use the specified number of threads
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(numThreads))
// define the tasks
val tasks = for (i <- 1 to numJobs) yield Future {
// do something more fancy here
i
}
// aggregate and wait for final result
val aggregated = Future.sequence(tasks)
val oneToNSum = Await.result(aggregated, 15.seconds).sum
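For comparison with the Java approach mentioned in the question, the same job (50,000 tasks on exactly 10 threads) can be sketched with ExecutorService.invokeAll, which submits everything and returns once all tasks have finished. The class name and the trivial task body are placeholders for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class mirroring the Scala example above.
final class TenThreadsDemo {

    static long run(int numJobs, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Define the tasks.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int i = 1; i <= numJobs; i++) {
                final int n = i;
                jobs.add(() -> n);   // do something more fancy here
            }
            // invokeAll blocks until every task has completed.
            long sum = 0;
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                sum += f.get();
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();   // otherwise the pool threads keep the JVM alive
        }
    }

    public static void main(String[] args) {
        System.out.println(run(50_000, 10));   // prints 1250025000
    }
}
```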

design pattern for concurrent task execution with constraints

I have 3 classes of task (I, D, U) which come in on a queue, tasks of the same class must be processed in order. I want tasks to run as concurrently as possible; however there are some constraints:
U and D cannot run concurrently
U and I cannot run concurrently
I(n) requires U(n) has completed
Q: What design pattern(s) would fit this class of problem?
I have two approaches I am considering:
Approach 1:
Use 1 thread per task class, each with its own queue. Each thread has a synchronized start phase where it checks start conditions, then runs, then a synchronized stop phase. It is easy to see that this will provide good concurrency, but I am unsure whether it correctly implements my constraints and doesn't deadlock.
D_Thread { ...
    while (task = D_Queue.take()) {
        synchronized (State) { // start phase
            waitForU();
            State.setRunning(D, true);
        }
        run(task); // run phase
        synchronized (State) { // stop phase
            State.setRunning(D, false);
        }
    }
}
Approach 2: Alternatively, a single dispatch thread manages execution state, and schedules tasks in a ThreadPool, waiting if necessary for currently scheduled tasks to complete.
The Objective-C Foundation framework includes classes NSOperationQueue and NSOperation that satisfy some of these requirements. NSOperationQueue represents a queue of NSOperations. The queue runs a configurable maximum number of operations concurrently. Operations have a priority and a set of dependencies; all of the operations that an operation depends on must be completed before the queue will start running the operation. The operations are scheduled to run on a dynamically-sized pool of threads.
What you need requires a somewhat smarter version of NSOperationQueue that applies the constraints you have expressed, but NSOperationQueue and company provide an example of roughly how your problem has been solved in a production framework that resembles your second suggested solution of a dispatch thread running tasks on a thread pool.
Actually this turns out to be simpler than it seemed: a mutex is mainly all that is needed:
IThread(int k) {
    synchronized (u_mutex) {
        if (previousUSet.contains(k)) U(k);
    }
    I(k);
}
DThread(int k) {
    synchronized (u_mutex) {
        D(k);
        previousUSet.add(k);
    }
}
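For comparison, the constraints from the question can also be expressed with standard java.util.concurrent pieces. This is a hedged sketch, not the approach from the answer above: it assumes one single-threaded executor per task class (to keep per-class order), a read-write lock where U is the writer and I and D are readers (so U excludes both, while I and D may overlap), and a per-n signal so that I(n) waits for U(n). All names are made up, and it assumes U(n) is eventually submitted for every I(n); otherwise I(n) would wait forever.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; log.add(...) stands in for the real task body.
final class ConstrainedRunner {
    // One single-threaded executor per class keeps each class in order.
    private final ExecutorService iQueue = Executors.newSingleThreadExecutor();
    private final ExecutorService dQueue = Executors.newSingleThreadExecutor();
    private final ExecutorService uQueue = Executors.newSingleThreadExecutor();
    // U takes the write lock; I and D share the read lock.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Completed once U(n) has finished.
    private final ConcurrentHashMap<Integer, CompletableFuture<Void>> uDone =
            new ConcurrentHashMap<>();
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    private CompletableFuture<Void> uSignal(int n) {
        return uDone.computeIfAbsent(n, k -> new CompletableFuture<>());
    }

    void submitU(int n) {
        uQueue.execute(() -> {
            lock.writeLock().lock();
            try { log.add("U" + n); } finally { lock.writeLock().unlock(); }
            uSignal(n).complete(null);
        });
    }

    void submitD(int n) {
        dQueue.execute(() -> {
            lock.readLock().lock();
            try { log.add("D" + n); } finally { lock.readLock().unlock(); }
        });
    }

    void submitI(int n) {
        iQueue.execute(() -> {
            uSignal(n).join();   // wait for U(n) outside the lock
            lock.readLock().lock();
            try { log.add("I" + n); } finally { lock.readLock().unlock(); }
        });
    }

    void shutdownAndWait() {
        List<ExecutorService> all = Arrays.asList(iQueue, dQueue, uQueue);
        for (ExecutorService e : all) e.shutdown();
        try {
            for (ExecutorService e : all) e.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ConstrainedRunner r = new ConstrainedRunner();
        r.submitI(1); r.submitD(1); r.submitU(1); r.submitI(2); r.submitU(2);
        r.shutdownAndWait();
        System.out.println(r.log);
    }
}
```

Waiting for U(n) before taking the read lock matters: if I(n) held the read lock while waiting, U(n) could never acquire the write lock.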

Resources