I have been experimenting with Kotlin and running blocking CPU tasks on Kotlin coroutines. When the work is blocking, such as a big CPU-intensive computation, we don't really have suspension; instead we need to launch the tasks on different threads and let them run in parallel.
I managed to get the following code working as expected with async + the Default dispatcher, but wondered whether it would also work with withContext, and it did not.
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlin.system.measureTimeMillis

// getThreadName() is the asker's own helper and is not shown in the question
fun cpuBlockingTasks() = runBlocking {
    val time = measureTimeMillis {
        val t1 = cpuTask(id = 1, blockTime = 500)
        val t2 = cpuTask(id = 2, blockTime = 2000)
        println("The answer is ${t1 + t2}")
    }
    println("Time taken: $time")
}

suspend fun cpuTask(id: Int, blockTime: Long): Int = withContext(Dispatchers.Default) {
    println("work $id start ${getThreadName()}")
    val res = doSomeCpuIntensiveTask(blockTime)
    println("work $id end ${getThreadName()}")
    res
}

fun doSomeCpuIntensiveTask(time: Long): Int {
    Thread.sleep(time) // to mimic actual thread blocking / CPU work
    return 1
}
This code completes in >2500 ms and runs sequentially on the same thread. I was expecting it to kick off the first coroutine on one thread, immediately return to the caller, and kick off the second on a different thread, but it did not work like that. Does anyone know why that is, and how it can be fixed without launching an async coroutine in the caller function?
This is the output:
work 1 start ForkJoinPool.commonPool-worker-5 @coroutine#1
work 1 end ForkJoinPool.commonPool-worker-5 @coroutine#1
work 2 start ForkJoinPool.commonPool-worker-5 @coroutine#1
work 2 end ForkJoinPool.commonPool-worker-5 @coroutine#1
The answer is 2
Time taken: 2523
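You are not creating a new coroutine in cpuTask 1 and cpuTask 2; you are just switching context. withContext suspends the calling coroutine until its block has finished, so the second cpuTask call only starts after the first one has returned. It can be easily fixed with async: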
fun cpuBlockingTasks() = runBlocking {
    val time = measureTimeMillis {
        val t1 = async { cpuTask(id = 1, blockTime = 500) }
        val t2 = async { cpuTask(id = 2, blockTime = 2000) }
        println("The answer is ${t1.await() + t2.await()}")
    }
    println("Time taken: $time") // Time taken: 2026
}
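The difference between the two versions can also be sketched outside of coroutines. The following is only an analogy in plain Java with CompletableFuture, not the Kotlin API; the class and method names are made up for illustration. Waiting for each task right after starting it behaves like two withContext calls in a row, while starting both tasks first and waiting afterwards behaves like async followed by await.

```java
import java.util.concurrent.CompletableFuture;

public class SequentialVsConcurrent {
    // Stand-in for the blocking CPU task from the question (hypothetical helper).
    static int cpuTask(long blockMillis) {
        try {
            Thread.sleep(blockMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 1;
    }

    // Analogous to two withContext calls: each task runs on another thread,
    // but the caller waits for the first one before it even starts the second.
    static int sequential(long a, long b) {
        int t1 = CompletableFuture.supplyAsync(() -> cpuTask(a)).join();
        int t2 = CompletableFuture.supplyAsync(() -> cpuTask(b)).join();
        return t1 + t2;
    }

    // Analogous to async + await: both tasks are started before either is awaited.
    static int concurrent(long a, long b) {
        CompletableFuture<Integer> f1 = CompletableFuture.supplyAsync(() -> cpuTask(a));
        CompletableFuture<Integer> f2 = CompletableFuture.supplyAsync(() -> cpuTask(b));
        return f1.join() + f2.join();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int s = sequential(500, 2000);
        long middle = System.nanoTime();
        int c = concurrent(500, 2000);
        long end = System.nanoTime();
        System.out.println("sequential: " + s + " in " + (middle - start) / 1_000_000 + " ms");
        System.out.println("concurrent: " + c + " in " + (end - middle) / 1_000_000 + " ms");
    }
}
```

Both methods return the same sum; only the elapsed time differs, because the sequential variant pays for the two waits one after the other.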
This code blocks; the exceptions that are thrown don't kill the loop:
List(1, 2, 3, 4, 5).par.foreach { i =>
  println("i = " + i)
  if (i == 5) {
    println("Sleeping forever")
    java.lang.Thread.sleep(Long.MaxValue)
  }
  throw new IllegalArgumentException("foo")
} must throwA[IllegalArgumentException]
Is there a way to use .par but make it blow up properly?
I think that with the parallel collections you should expect the exception to surface only once all the threads have actually finished and been joined back into the current thread. I suspect this because the implementation of the foreach method (Cmd-click on foreach) uses a method named executeAndWaitResult.
Here are some other questions that seem somewhat similar; perhaps they help:
interrupt scala parallel collection
How to cancel Future in Scala?
https://immutables.pl/2016/10/08/parallel-futures-and-exceptions/
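This seems to work, though it is verbose:
// needs scala.concurrent.Future and an implicit ExecutionContext in scope
implicit class PimpedListScala[T](l: List[T]) {
  def parForeachThrowOnException(f: T => Unit, sleepMillis: Long): Unit = {
    // @volatile so the polling thread sees a write made by a worker thread
    @volatile var exception: Option[Throwable] = None
    val future = Future(
      l.par.foreach(e =>
        try {
          f(e)
        } catch {
          case t: Throwable =>
            // remember only the first failure
            if (exception.isEmpty) exception = Some(t)
        }
      )
    )
    // poll until a task has failed or all of them have finished
    while (exception.isEmpty && !future.isCompleted) {
      java.lang.Thread.sleep(sleepMillis)
    }
    exception.foreach(throw _)
  }
}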
I've also tried this:
def parForeachThrowOnException(f: T => Unit): Unit =
  Await.result(Future.traverse(l)(a => Future(f(a))), Duration.Inf)
but this behaves unpredictably. In a live experiment it took a full 2 hours for the first thrown exception to propagate up and kill the application.
In my existing Scala code I replaced Thread.sleep(10000) with ZIO.sleep(Duration.fromScala(10.seconds)), with the understanding that it won't block a thread from the thread pool (a performance issue). When the program runs, it does not wait at this line (whereas in the first case, of course, it does). Do I need to add any extra code for the ZIO method to work?
Adding code section from Play+Scala code:
def sendMultipartEmail = Action.async(parse.multipartFormData) { request =>
  .....
  //inside this controller below method is called
  def retryEmailOnFail(pList: ListBuffer[JsObject], content: String) = {
    if (!sendAndGetStatus(pList, content)) {
      println("<--- email sending failed - retry once after a delay")
      ZIO.sleep(Duration.fromScala(10.seconds))
      println("<--- retrying email sending after a delay")
      finalStatus = finalStatus && sendAndGetStatus(pList, content)
    } else {
      finalStatus = finalStatus && true
    }
  }
  .....
}
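As you said, ZIO.sleep will only suspend the fiber that is running, not the operating system thread. But ZIO.sleep(...) does not sleep by itself: it only returns a description of an effect, which does nothing until it is run. In your code the returned value is discarded, so nothing waits.
If you want to start something after sleeping, you should chain it after the sleep and make sure the resulting effect is actually run:
// value 42 will only be computed after waiting for 10s
val io = ZIO.sleep(Duration.fromScala(10.seconds)).map(_ => 42)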
I have the following code:
object KafkaApi {
  private implicit val main: ExecutionContextExecutor = ExecutionContext.global
  private val workers = ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

  def main(args: Array[String]) {
    foo.unsafeRunAsync(_ => ())
    //foo.unsafeRunSync()
    println("Hello")
  }

  def foo: IO[Unit] =
    for {
      _ <- IO {
        println(Thread.currentThread().getName)
      }
      _ <- IO.shift(workers)
      _ <- IO {
        println(Thread.currentThread().getName)
      }
      _ <- IO {
        println(Thread.currentThread().getName)
      }
      _ <- IO {
        println(Thread.currentThread().getName)
      }
      _ <- IO.shift(main)
      _ <- IO {
        println(Thread.currentThread().getName)
      }
      _ <- IO {
        println(Thread.currentThread().getName)
      }
      _ <- IO {
        println(Thread.currentThread().getName)
      }
    } yield ()
}
and the output is:
main
Hello
pool-1-thread-1
pool-1-thread-1
pool-1-thread-1
scala-execution-context-global-14
scala-execution-context-global-14
scala-execution-context-global-14
What is the difference between main and scala-execution-context-global-14?
If these two are different, how do I get back to the main thread?
When running the code above, why does the application never terminate?
This additional question is too big for a comment so here goes my answer.
The thing is that in JVM all Threads are divided into "normal" and "daemon" threads. The important thing here is that
The Java Virtual Machine exits when the only threads running are all daemon threads.
So if you have any running non-daemon Thread, JVM thinks your application is still working even if it actually does nothing (maybe it is just waiting for some input). The "main" thread is obviously a "normal" thread. Threads created by standard ExecutionContext.global are daemon and thus don't stop your app from quitting when the main thread finishes. Threads created by Java's Executors.newCachedThreadPool are non-daemon and thus keep the application alive. There are several possible solutions:
1. Don't use any ExecutionContext other than the global one, i.e. don't use Executors.newCachedThreadPool at all. Depending on your case this might or might not be what you want.
2. Explicitly shut down your custom ExecutorService when all its work is done. Be careful here, because shutdown only stops the pool from accepting new tasks and returns immediately; it doesn't wait for active tasks to finish (awaitTermination does that). So the code should become something like
  private val pool = Executors.newCachedThreadPool
  implicit private val workers = ExecutionContext.fromExecutor(pool)
  // do whatever you want with workers
  // optionally wait for all the work to be done
  pool.shutdown()
3. Use a custom pool that creates daemon threads. For example you could do something like this:
  val workers = ExecutionContext.fromExecutor(Executors.newCachedThreadPool(new ThreadFactory {
    private val defaultDelegate = Executors.defaultThreadFactory()

    override def newThread(r: Runnable): Thread = {
      val t = defaultDelegate.newThread(r)
      // let the default factory do all the work and just override the daemon flag
      t.setDaemon(true)
      t
    }
  }))
IMHO the main trade-off between #2 and #3 is convenience vs correctness. With #3 you don't have to work out at which point all tasks are finished and it is safe to call shutdown, which is convenient. The price is that if you misjudged and your "main" thread quits before all other tasks are finished, you will not know that anything went wrong, because daemon threads are just silently killed. If you go with #2 and make the same mistake, either your app will keep running (if you didn't call shutdown on that code path), or you will see some warning in the log that the pool was shut down while there were still tasks in progress. So if this is just an intermediate step in a long sequence of processing that for some reason requires a custom thread pool, I'd probably go with #3; but if this parallel execution is the main behavior, I'd go with the more explicit #2.
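For comparison, here is a minimal sketch of options #2 and #3 in plain Java rather than Scala, under the assumption that the pool is only needed for a known batch of tasks. The class name, method names and the 10-second timeout are hypothetical choices for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolLifecycle {
    // Option #2: a non-daemon pool that is shut down explicitly.
    // Returns how many of the submitted tasks actually ran.
    static int runAndShutDown(int tasks) {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown(); // no new tasks accepted; returns immediately
        try {
            // awaitTermination is the call that waits for already submitted tasks.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    // Option #3: delegate to the default factory and only flip the daemon flag.
    static ThreadFactory daemonFactory() {
        ThreadFactory delegate = Executors.defaultThreadFactory();
        return runnable -> {
            Thread t = delegate.newThread(runnable);
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) {
        System.out.println("completed tasks: " + runAndShutDown(5));
        // A pool built on daemon threads does not keep the JVM alive,
        // so main may return without calling shutdown on it.
        ExecutorService daemons = Executors.newCachedThreadPool(daemonFactory());
        daemons.execute(() -> { /* background work that may be cut short at exit */ });
    }
}
```

With the daemon pool, any work still running when main returns can be dropped silently, which is exactly the risk described for #3.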
Assume there is a List in which the results of jobs that are computed in a distributed way are stored.
Now I have a main thread that waits until all jobs have finished.
I know the size the List needs to reach before all jobs are finished.
What is the most elegant way in Scala to let the main thread (a while(true) loop) sleep and wake it up when the jobs are finished?
Thanks for your answers.
EDIT: OK, after trying the concept from @Stefan-Kunze without success (I guess I didn't get the point...), here is an example with some code:
The first node:
class PingPlugin extends SmasPlugin
{
  val messages = new ListBuffer[BaseMessage]()
  val sum = 5

  def onStop = true

  def onStart =
  {
    log.info("Ping Plugin created!")
    true
  }

  def handleInit(msg: Init)
  {
    log.info("Init received")
    for( a <- 1 to sum)
    {
      msg.pingTarget ! Ping() // Ping extends BaseMessage
    }
    // block here until all messages are received
    // wait for messages.length == sum
    log.info("handleInit - messages received: %d/%d ".format(messages.length, sum))
  }

  /**
   * This method handles incoming Pong messages
   * @param msg Pong extends BaseMessage
   */
  def handlePong(msg: Pong)
  {
    log.info("Pong received from: " + msg.sender)
    messages += msg
    log.info("handlePong - messages received: %d/%d ".format(messages.length, sum))
  }
}
The second node:
class PongPlugin extends SmasPlugin
{
  def onStop = true

  def onStart =
  {
    log.info("Pong Plugin created!")
    true
  }

  /**
   * This method receives Ping messages and sends a Pong message back after a random time
   * @param msg Ping extends BaseMessage
   */
  def handlePing(msg: Ping)
  {
    log.info("Ping received from: " + msg.sender)
    val sleep: Int = math.round(5000 * Random.nextFloat())
    log.info("sleep: " + sleep)
    Thread.sleep(sleep)
    msg.sender ! Pong()
  }
}
I guess the solution is possible with futures...
Picking up @jilen's approach (this code assumes your results are of a type Result):
// needs scala.concurrent.{Await, Future}, scala.concurrent.duration.Duration
// and an implicit ExecutionContext in scope
// just like lists, futures can be yielded
val tasks: Seq[Future[Result]] = for (i <- 1 to results.size) yield Future {
  // results.size is the number of results you are expecting
  println("Executing task " + i)
  Thread.sleep(i * 1000L)
  val result: Result = ??? // your code goes here
  result
}
// merge all future results into a future of a sequence of results
val aggregated: Future[Seq[Result]] = Future.sequence(tasks)
// block until all results have been computed
val allResults: Seq[Result] = Await.result(aggregated, Duration.Inf)
println("Results: " + allResults)
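The same "start all jobs, then wait once for all of them" idea can be sketched in plain Java with CompletableFuture.allOf, shown here only as an analogy to Future.sequence plus Await.result. The class name, the method name and the squaring job are placeholders; a real job would do the actual work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class WaitForAll {
    // Starts `count` jobs and blocks the caller until every one has finished.
    static List<Integer> runJobs(int count) {
        List<CompletableFuture<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            final int n = i;
            tasks.add(CompletableFuture.supplyAsync(() -> n * n)); // placeholder job
        }
        // allOf completes once all tasks have completed; join() parks the
        // calling thread until then, so no polling loop is needed.
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();

        List<Integer> results = new ArrayList<>();
        for (CompletableFuture<Integer> task : tasks) {
            results.add(task.join()); // already completed, returns without waiting
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println("Results: " + runJobs(5));
    }
}
```

The results come back in submission order, because they are read from the task list rather than in the order the jobs happened to finish.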
It's hard to test the code here, since I don't have the rest of this system, but I'll try. I'm assuming that somewhere underneath all of this is Akka.
First, blocking like this suggests a real design problem. In an actor system, you should send your messages and move on. Your log command should be in handlePong, once the correct number of pongs have returned. Blocking in init hangs the entire actor. You really should never do that.
But, ok, what if you absolutely have to do that? Then a good tool here would be the ask pattern. Something like this (I can't check that this compiles without more of your code):
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
...
// Future.sequence also needs an implicit ExecutionContext in scope
implicit val timeout: Timeout = Timeout(5.seconds)
// Ask each target with a Ping and collect the returned futures.
// ask (?) returns Future[Any], so mapTo narrows it to Future[Pong].
val pendingPongs: Seq[Future[Pong]] =
  for (a <- 1 to sum) yield (msg.pingTarget ? Ping()).mapTo[Pong] // Ping extends BaseMessage
// pendingPongs is a sequence of futures. We want a future of a sequence.
// sequence() does that for us. We then block using Await until the future completes.
val pongs = Await.result(Future.sequence(pendingPongs), 5.seconds)
log.info(s"handleInit - messages received: ${pongs.length}/$sum")
I'm trying to grasp the TPL.
Just for fun I tried to create some Tasks with a random sleep to see how they were processed. I was aiming for a fire-and-forget pattern.
static void Main(string[] args)
{
    Console.WriteLine("Demonstrating a successful transaction");
    Random d = new Random();
    for (int i = 0; i < 10; i++)
    {
        var sleep = d.Next(100, 2000);
        Action<int> succes = (int x) =>
        {
            Thread.Sleep(x);
            Console.WriteLine("sleep={2}, Task={0}, Thread={1}: Begin successful transaction",
                Task.CurrentId, Thread.CurrentThread.ManagedThreadId, x);
        };
        Task t1 = Task.Factory.StartNew(() => succes(sleep));
    }
    Console.ReadLine();
}
But I don't understand why it outputs all lines to the Console ignoring the Sleep(random)
Can someone explain that to me?
Important:
The TPL default TaskScheduler does not guarantee Thread per Task - one thread can be used for processing several tasks.
Calling Thread.Sleep might impact other tasks performance.
You can construct your task with the TaskCreationOptions.LongRunning hint; this way the TaskScheduler can assign a dedicated thread to the task and it will be safe to block on it.
Your code uses the value of i instead of the generated random number. It does not ignore the sleep but rather sleeps between 0 and 10ms each iteration.
Try:
Thread.Sleep(sleep);
The statement
Task t1 = Task.Factory.StartNew(() => succes(sleep));
creates the Task and starts it automatically; the for loop then moves on to the next iteration without waiting for the task to finish. So by the time the second task is created and started, the first one may already have finished. In other words, you are not waiting for the tasks to end.
You should try
Task t1 = Task.Factory.StartNew(() => succes(sleep));
t1.Wait();