When using scala.concurrent.Future, I want to understand the exact threading behaviour of Await.result() at runtime. Does it release the thread that was executing the preceding statements while it waits for the Future's result? Or is the thread kept in a waiting state, i.e. not released for other workloads?
val foo = Future(1)
// Option 1
val fooResult = Await.result(foo, Duration.Inf)
...
I already read the Scala docs about this topic. Does blocking in that context imply that the thread is not released for Await.result()?
Let's consider this simple code with coroutines
import kotlinx.coroutines.*
import java.util.concurrent.Executors

fun main() {
    runBlocking {
        launch(Executors.newFixedThreadPool(10).asCoroutineDispatcher()) {
            var x = 0
            val threads = mutableSetOf<Thread>()
            for (i in 0 until 100000) {
                x++
                threads.add(Thread.currentThread())
                yield()
            }
            println("Result: $x")
            println("Threads: $threads")
        }
    }
}
As far as I understand this is quite legit coroutines code and it actually produces expected results:
Result: 100000
Threads: [Thread[pool-1-thread-1,5,main], Thread[pool-1-thread-2,5,main], Thread[pool-1-thread-3,5,main], Thread[pool-1-thread-4,5,main], Thread[pool-1-thread-5,5,main], Thread[pool-1-thread-6,5,main], Thread[pool-1-thread-7,5,main], Thread[pool-1-thread-8,5,main], Thread[pool-1-thread-9,5,main], Thread[pool-1-thread-10,5,main]]
The question is what makes these modifications of local variables thread-safe (or are they thread-safe at all?). I understand that this loop is actually executed sequentially, but the running thread can change on every iteration. The changes made by the thread in the first iteration should still be visible to the thread that picks up the loop in the second iteration. Which code guarantees this visibility? I tried decompiling this code to Java and digging around the coroutines implementation with a debugger, but did not find a clue.
Your question is completely analogous to the realization that the OS can suspend a thread at any point in its execution and reschedule it to another CPU core. That works not because the code in question is "multicore-safe", but because it is a guarantee of the environment that a single thread behaves according to its program-order semantics.
Kotlin's coroutine execution environment likewise guarantees the safety of your sequential code. You are supposed to program to this guarantee without any worry about how it is maintained.
If you want to descend into the details of "how" out of curiosity, the answer becomes "it depends". Every coroutine dispatcher can choose its own mechanism to achieve it.
As an instructive example, we can focus on the specific dispatcher you use in your posted code: a fixed thread pool executor from the JDK (Executors.newFixedThreadPool). You can submit arbitrary tasks to this executor, and it will execute each one of them on a single (arbitrary) thread, but many tasks submitted together will execute in parallel on different threads.
Furthermore, the executor service provides the guarantee that the code leading up to executor.execute(task) happens-before the code within the task, and the code within the task happens-before another thread's observing its completion (future.get(), future.isDone(), getting an event from the associated CompletionService).
Kotlin's coroutine dispatcher drives the coroutine through its lifecycle of suspension and resumption by relying on these primitives from the executor service, and thus you get the "sequential execution" guarantee for the entire coroutine. A single task submitted to the executor ends whenever the coroutine suspends, and the dispatcher submits a new task when the coroutine is ready to resume (when the user code calls continuation.resume(result)).
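To make the hand-off concrete, here is a minimal sketch in Scala of the same pattern on a plain JDK executor. It is not how kotlinx.coroutines is actually implemented, and the names HandOffDemo, run and step are made up for illustration. Each "iteration" is a separate task that mutates ordinary, non-volatile state and then submits the next task; the only synchronization is the executor's own happens-before guarantee on execute.

```scala
import java.util.concurrent.{CountDownLatch, Executors}
import scala.collection.mutable

// Hypothetical demo object; none of these names come from a library.
object HandOffDemo {
  // Runs `steps` tiny tasks one after another on a 4-thread pool.
  // Returns the final counter and the number of distinct threads used.
  def run(steps: Int): (Int, Int) = {
    val pool = Executors.newFixedThreadPool(4)
    val done = new CountDownLatch(1)
    var x = 0                                // plain, non-volatile state
    val threads = mutable.Set.empty[String]  // not a concurrent collection

    def step(i: Int): Unit =
      if (i == steps) done.countDown()
      else pool.execute { () =>
        // Writes made by the previous task are visible here because
        // submitting a task happens-before that task starts running.
        x += 1
        threads += Thread.currentThread().getName
        step(i + 1)                          // hand off to the next task
      }

    step(0)
    done.await()                             // countDown happens-before await returns
    pool.shutdown()
    (x, threads.size)
  }
}
```

Calling HandOffDemo.run(100000) should return a counter equal to the number of steps, even though the tasks may have run on several different pool threads, because at most one task of the chain is ever in flight.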
This snippet is taken from the Monix documentation.
It is an example of how to run into a deadlock in Scala.
import java.util.concurrent.Executors
import scala.concurrent._
import scala.concurrent.duration.Duration

implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))

def addOne(x: Int) = Future(x + 1)

def multiply(x: Int, y: Int) = Future {
  val a = addOne(x)
  val b = addOne(y)
  val result = for (r1 <- a; r2 <- b) yield r1 * r2
  // This can dead-lock due to the limited size of our thread-pool!
  Await.result(result, Duration.Inf)
}
I understand what the code does, but not how it is executed.
Why is the line Await.result(result, Duration.Inf) causing the deadlock? (Yes, I tested it.)
Isn't it that the outermost Future in the multiply function occupies the whole thread pool (the single thread), and this causes the deadlock (because the addOne Future is blocked forever waiting for a thread)?
Yes, sort of.
When you call val a = addOne(x), you create a new Future that starts waiting for a thread. However, as you noted, the only thread is currently in use by the outermost Future. That wouldn't be a problem without Await, since a queued Future simply runs once the thread becomes free. However, this line:
Await.result(result, Duration.Inf)
causes the outer Future to wait for the result Future, which can't run because the outer Future is still using the only available thread. (And, of course, it also can't run because the a and b Futures can't run, again due to the outer Future.)
Here's a simpler example that also deadlocks without creating so many Futures:
def addTwo(x: Int) = Future {
  Await.result(addOne(x + 1), Duration.Inf)
}
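If you want to observe this without hanging your program, the following sketch may help. It uses a finite timeout instead of Duration.Inf so that the starvation shows up as a TimeoutException, and then shows the non-blocking alternative of composing with flatMap. The object name SingleThreadDeadlock and the method demo are hypothetical, chosen only for this illustration.

```scala
import java.util.concurrent.{Executors, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical demo object for the single-thread starvation above.
object SingleThreadDeadlock {
  // Returns (did the blocking version time out?, result of the composed version)
  def demo(): (Boolean, Int) = {
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    def addOne(x: Int): Future[Int] = Future(x + 1)

    // The outer Future holds the only thread and blocks on the inner one,
    // which is queued behind it and cannot start until the outer one ends.
    val blocked = Future { Await.result(addOne(1), 500.millis) }
    val timedOut =
      try { Await.result(blocked, 5.seconds); false }
      catch { case _: TimeoutException => true }

    // Composing instead of blocking: no task ever waits while holding the thread.
    val composed = addOne(1).flatMap(addOne)
    val value = Await.result(composed, 5.seconds)

    pool.shutdown()
    (timedOut, value)
  }
}
```

With the finite timeout the inner Await gives up after half a second, the outer Future fails with that TimeoutException, and the thread is freed again; with Duration.Inf it would wait forever. The flatMap version completes normally on the same single-thread pool.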
First of all, I would say that this code can produce a deadlock; it is not guaranteed to end up in a deadlock in every case.
What is happening in the code above? We have only a single thread in the thread pool. As soon as we call the multiply function, its body, being a Future, has to run on a pool thread, so the single thread we have is assigned to it.
The addOne function also returns a Future, so its body is scheduled on that same thread, but the caller does not wait for a = addOne(x) to complete and moves on to the next line, b = addOne(y). The value of a is therefore never calculated: its Future is not complete and never will be, because the only thread is busy. The same applies to b = addOne(y): control does not wait for that Future either and moves on to the for comprehension. The for comprehension over Futures is also asynchronous, so it is not evaluated on the spot, and control moves on to the last line, the Await, where it waits for an infinite amount of time for the previous Futures to complete.
The four conditions that must all hold for a deadlock are:
Mutual exclusion
Hold and wait
No preemption
Circular wait
Mutual exclusion: there is only one thread in the pool, so only one task can occupy it at a time.
Hold and wait: the thread does not wait for each inner Future to complete; it moves straight on to the next statement until it reaches the Await call, where it is held while all the incomplete Futures wait for that same thread.
No preemption: once the thread is blocked in Await, it cannot be taken away from the outer Future, which is why the remaining incomplete Futures can never run.
Circular wait: Await is waiting for the incomplete Futures to complete, and those Futures are waiting for the thread that the Await call is holding.
Put simply, control goes straight to the Await statement and starts waiting for Futures that can never complete, because there is only one thread in the thread pool.
Await.result(result, Duration.Inf)
When you use Await, you are waiting for the Future to complete, and you have given it infinite time. So if the Future is never able to complete, the calling thread waits forever.
Let's say we have the action below in our controller. performLogin will be called by many users, once per request.
def performLogin() = {
  Async {
    // API call to the datasource1
    val id = databaseService1.getIdForUser()
    // API call to another data source different from above
    // This process depends on id returned by the call above
    val user = databaseService2.getUserGivenId(id)
    // Very CPU intensive task
    val token = performProcess(user)
    // Very CPU intensive calculations
    val hash = encrypt(user)
    Future.successful(hash)
  }
}
I kind of know what the fork-join-executor does. Basically, from the main thread that receives a request, it spawns multiple worker threads, which in turn divide the work into a few chunks. Eventually the main thread joins those results and returns from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, this thread does the work, and then it goes back to the pool to wait for more work. So no subdividing of the task happens here.
In the code above, parallelism via the fork-join executor is not possible in my opinion, because each call requires something from the previous step. If I were to choose the fork-join executor, how would that benefit me? How would the execution of the code above differ between the fork-join and thread-pool executors?
Thanks
This isn't parallel code, everything inside of your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests - it's event-based, there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.
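As a rough illustration of that last point, the sketch below runs the same chain of dependent steps once on a fork-join pool and once on a fixed thread pool. Everything in it is a stand-in: ExecutorComparison, getId, getUser, encrypt and login are hypothetical names, the step bodies are trivial placeholders for your real service calls, and this is plain scala.concurrent rather than Play's Async.

```scala
import java.util.concurrent.{ExecutorService, Executors, ForkJoinPool}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-ins for the dependent steps in performLogin.
object ExecutorComparison {
  def getId(): Int = 42
  def getUser(id: Int): String = s"user-$id"
  def encrypt(user: String): String = user.reverse

  // Runs the whole chain as ONE task on the given pool.
  def login(pool: ExecutorService): String = {
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val hash = Future {
      val id = getId()          // step 1
      val user = getUser(id)    // step 2 needs the result of step 1
      encrypt(user)             // step 3 needs the result of step 2
    }
    try Await.result(hash, 5.seconds) finally pool.shutdown()
  }
}
```

ExecutorComparison.login(new ForkJoinPool(2)) and ExecutorComparison.login(Executors.newFixedThreadPool(2)) go through the same three steps in the same order on a single worker thread each; the choice of executor only affects how tasks from many concurrent requests are scheduled onto threads.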
I am running a complex piece of software with several actors (Scala actors). Some of them use Scala futures for part of their work, to avoid blocking and to keep processing newly received messages (simplified code):
import scala.sys.process._

def act {
  while (true) {
    receive {
      case (code: String) =>
        val codeMatch = future { match_code(code) }
        for (c <- codeMatch)
          yield callback(code)(JSON.parseJSON(c))
    }
  }
}

// Runs the script and returns its standard output as a String
def match_code(code: String): String =
  s"my_script.sh $code".!!
Looking at jvisualvm and the Eclipse debugger, I noticed that the number of active threads keeps increasing while this system is running. I am afraid I have some kind of thread leak, but I can't work out where the problem is.
Here are some screenshots of both finished and live threads (I hid some live threads that are not related to this problem):
[Screenshot: finished threads]
[Screenshot: live threads]
Edit 1:
In the example shown in the graphs above, I ran the system with only 3 actors of different classes: Actor1 sends messages to Actor2, which sends messages to Actor3.
You are using receive so each actor will use its own thread, and you don't at least in this example provide any way for actors to terminate. So you would expect to have one new thread per actor that was ever started. If that is what you see, then all is working as expected. If you want to have actors cease running, you will have to let them eventually fall out of the while loop or call sys.exit on them or somesuch.
(Also, old-style Scala actors are deprecated in favor of Akka actors in 2.11.)
You also don't (in the code above) have any indication whether the future actually completed. If the futures don't finish, they'll keep tying up threads.
Let's say I'm getting a (potentially big) list of images to download from some URLs. I'm using Scala, so what I would do is :
import scala.actors.Futures._
// Retrieve URLs from somewhere
val urls: List[String] = ...
// Download image (blocking operation)
val fimages: List[Future[...]] = urls.map (url => future { download(url) })
// Do something (display) when complete
fimages.foreach (_.foreach (display _))
I'm a bit new to Scala, so this still looks a little like magic to me :
Is this the right way to do it? Any alternatives if it is not?
If I have 100 images to download, will this create 100 threads at once, or will it use a thread pool?
Will the last instruction (display _) be executed on the main thread, and if not, how can I make sure it is?
Thanks for your advice!
Use Futures in Scala 2.10. They were joint work between the Scala team, the Akka team, and Twitter to reach a more standardized future API and implementation for use across frameworks. We just published a guide at: http://docs.scala-lang.org/overviews/core/futures.html
Beyond being completely non-blocking (by default, though we provide the ability to do managed blocking operations) and composable, Scala's 2.10 futures come with an implicit thread pool to execute your tasks on, as well as some utilities to manage time outs.
import scala.concurrent.{future, blocking, Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Retrieve URLs from somewhere
val urls: List[String] = ...

// Download image (blocking operation)
val imagesFuts: List[Future[...]] = urls.map {
  url => future { blocking { download(url) } }
}

// Do something (display) when complete
val futImages: Future[List[...]] = Future.sequence(imagesFuts)
Await.result(futImages, 10.seconds).foreach(display)
Above, we first import a number of things:
future: API for creating a future.
blocking: API for managed blocking.
Future: Future companion object which contains a number of useful methods for collections of futures.
Await: singleton object used for blocking on a future (transferring its result to the current thread).
ExecutionContext.Implicits.global: the default global thread pool, a ForkJoin pool.
duration._: utilities for managing durations for time outs.
imagesFuts remains largely the same as what you originally did; the only difference is that we wrap the download in blocking, i.e. managed blocking. It notifies the thread pool that the block of code you pass to it contains long-running or blocking operations. This allows the pool to temporarily spawn new workers so that the workers are never all blocked at the same time. This is done to prevent starvation (locking up the thread pool) in blocking applications. Note that the thread pool also knows when the code in a managed blocking block is complete, so it will remove the spare worker thread at that point, which means that the pool will shrink back down to its expected size.
(If you want to absolutely prevent additional threads from ever being created, then you ought to use an AsyncIO library, such as Java's NIO library.)
Then we use the collection methods of the Future companion object to convert imagesFuts from List[Future[...]] to a Future[List[...]].
The Await object is how we can ensure that display is executed on the calling thread: Await.result simply forces the current thread to wait until the future that it is passed is completed. (This uses managed blocking internally.)
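For reference, here is a small self-contained sketch of the same pattern that can be run as-is. It is only an illustration: DownloadAll, fetchAll and the fake download (which sleeps briefly and returns the URL length) are hypothetical placeholders for your real code, and it uses Future { ... } (Future.apply) instead of the lower-case future method, which later Scala versions deprecated.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical demo object; download is a fake blocking operation.
object DownloadAll {
  def download(url: String): Int = {
    Thread.sleep(50)   // pretend to wait on the network
    url.length         // pretend this is the downloaded image
  }

  def fetchAll(urls: List[String]): List[Int] = {
    // One future per URL; blocking marks the body as managed blocking.
    val futs: List[Future[Int]] = urls.map(url => Future { blocking { download(url) } })
    // List[Future[Int]] => Future[List[Int]], keeping the original order.
    val all: Future[List[Int]] = Future.sequence(futs)
    // Block the calling thread until every download has finished.
    Await.result(all, 10.seconds)
  }
}
```

The caller can then do DownloadAll.fetchAll(urls).foreach(display) and be sure that display runs on the thread that called fetchAll.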
val all = Future.traverse(urls){ url =>
  val f = future(download(url)) /*(downloadContext)*/
  f.onComplete(display)(displayContext)
  f
}
Await.result(all, ...)
Use scala.concurrent.Future in 2.10, which is RC now and which uses an implicit ExecutionContext.
The new Future doc is explicit that onComplete (and foreach) may evaluate immediately if the value is available. The old actors Future does the same thing. Depending on what your requirement is for display, you can supply a suitable ExecutionContext (for instance, a single thread executor). If you just want the main thread to wait for loading to complete, traverse gives you a future to await on.
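As a sketch of the "supply a suitable ExecutionContext" option, the following runs the callback on a dedicated single-thread executor and reports which thread executed it. The names CallbackThread, run, displayPool and the thread name "display-thread" are made up for this example; displayContext mirrors the placeholder used in the snippet above.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

// Hypothetical demo: pin a Future callback to one named thread.
object CallbackThread {
  def run(): String = {
    // Single-thread executor whose only thread has a recognisable name.
    val displayPool = Executors.newSingleThreadExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = new Thread(r, "display-thread")
    })
    val displayContext = ExecutionContext.fromExecutor(displayPool)

    val seen = Promise[String]()
    // The computation itself runs on the global pool...
    val f = Future(1)(ExecutionContext.global)
    // ...but the callback is submitted to displayContext.
    f.foreach(_ => seen.success(Thread.currentThread().getName))(displayContext)

    try Await.result(seen.future, 5.seconds) finally displayPool.shutdown()
  }
}
```

With scala.concurrent.Future the callback is handed to whichever ExecutionContext you pass, so CallbackThread.run() reports the name of the display thread rather than a worker of the global pool.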
Yes, seems fine to me, but you may want to investigate more powerful twitter-util or Akka Future APIs (Scala 2.10 will have a new Future library in this style).
It uses a thread pool.
No, it won't. You need to use the standard mechanism of your GUI toolkit for this (SwingUtilities.invokeLater for Swing or Display.asyncExec for SWT). E.g.
fimages.foreach (_.foreach(im => SwingUtilities.invokeLater(new Runnable { def run(): Unit = display(im) })))