Kotlin coroutines multithread dispatcher and thread-safety for local variables - multithreading

Let's consider this simple code with coroutines
import kotlinx.coroutines.*
import java.util.concurrent.Executors
fun main() {
runBlocking {
launch (Executors.newFixedThreadPool(10).asCoroutineDispatcher()) {
var x = 0
val threads = mutableSetOf<Thread>()
for (i in 0 until 100000) {
x++
threads.add(Thread.currentThread())
yield()
}
println("Result: $x")
println("Threads: $threads")
}
}
}
As far as I understand this is quite legit coroutines code and it actually produces expected results:
Result: 100000
Threads: [Thread[pool-1-thread-1,5,main], Thread[pool-1-thread-2,5,main], Thread[pool-1-thread-3,5,main], Thread[pool-1-thread-4,5,main], Thread[pool-1-thread-5,5,main], Thread[pool-1-thread-6,5,main], Thread[pool-1-thread-7,5,main], Thread[pool-1-thread-8,5,main], Thread[pool-1-thread-9,5,main], Thread[pool-1-thread-10,5,main]]
The question is what makes these modifications of local variables thread-safe (or is it thread-safe?). I understand that this loop is actually executed sequentially but it can change the running thread on every iteration. The changes done from thread in first iteration still should be visible to the thread that picked up this loop on second iteration. Which code does guarantee this visibility? I tried to decompile this code to Java and dig around coroutines implementation with debugger but did not find a clue.

Your question is completely analogous to the realization that the OS can suspend a thread at any point in its execution and reschedule it to another CPU core. That works not because the code in question is "multicore-safe", but because it is a guarantee of the environment that a single thread behaves according to its program-order semantics.
Kotlin's coroutine execution environment likewise guarantees the safety of your sequential code. You are supposed to program to this guarantee without any worry about how it is maintained.
If you want to descend into the details of "how" out of curiosity, the answer becomes "it depends". Every coroutine dispatcher can choose its own mechanism to achieve it.
As an instructive example, we can focus on the specific dispatcher you use in your posted code: JDK's fixedThreadPoolExecutor. You can submit arbitrary tasks to this executor, and it will execute each one of them on a single (arbitrary) thread, but many tasks submitted together will execute in parallel on different threads.
Furthermore, the executor service provides the guarantee that the code leading up to executor.execute(task) happens-before the code within the task, and the code within the task happens-before another thread's observing its completion (future.get(), future.isCompleted(), getting an event from the associated CompletionService).
Kotlin's coroutine dispatcher drives the coroutine through its lifecycle of suspension and resumption by relying on these primitives from the executor service, and thus you get the "sequential execution" guarantee for the entire coroutine. A single task submitted to the executor ends whenever the coroutine suspends, and the dispatcher submits a new task when the coroutine is ready to resume (when the user code calls continuation.resume(result)).

Related

Understanding Threads Swift

I sort of understand threads, correct me if I'm wrong.
Is a single thread allocated to a piece of code until that code has completed?
Are the threads prioritised to whichever piece of code is run first?
What is the difference between main queue and thread?
My most important question:
Can threads run at the same time? If so how can I specify which parts of my code should run at a selected thread?
Let me start this way. Unless you are writing a special kind of application (and you will know if you are), forget about threads. Working with threads is complex and tricky. Use dispatch queues… it's simpler and easier.
Dispatch queues run tasks. Tasks are closures (blocks) or functions. When you need to run a task off the main dispatch queue, you call one of the dispatch_ functions, the primary one being dispatch_async(). When you call dispatch_async(), you need to specify which queue to run the task on. To get a queue, you call one of the dispatch_queue_create() or dispatch_get_, the primary one being dispatch_get_global_queue.
NOTE: Swift 3 changed this from a function model to an object model. The dispatch_ functions are instance methods of DispatchQueue. The dispatch_get_ functions are turned into class methods/properties of DispatchQueue
// Swift 3
DispatchQueue.global(qos: .background).async {
var calculation = arc4random()
}
// Swift 2
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
var calculation = arc4random()
}
The trouble here is any and all tasks which update the UI must be run on the main thread. This is usually done by calling dispatch_async() on the main queue (dispatch_get_main_queue()).
// Swift 3
DispatchQueue.global(qos: .background).async {
var calculation = arc4random()
DispatchQueue.main.async {
print("\(calculation)")
}
}
// Swift 2
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
var calculation = arc4random()
dispatch_async(dispatch_get_main_queue()) {
print("\(calculation)")
}
}
The gory details are messy. To keep it simple, dispatch queues manage thread pools. It is up to the dispatch queue to create, run, and eventually dispose of threads. The main queue is a special queue which has only 1 thread. The operating system is tasked with assigning threads to a processor and executing the task running on the thread.
With all that out of the way, now I will answer your questions.
Is a single thread allocated to a piece of code until that code has completed?
A task will run in a single thread.
Are the threads prioritised to whichever piece of code is run first?
Tasks are assigned to a thread. A task will not change which thread it runs on. If a task needs to run in another thread, then it creates a new task and assigns that new task to the other thread.
What is the difference between main queue and thread?
The main queue is a dispatch queue which has 1 thread. This single thread is also known as the main thread.
Can threads run at the same time?
Threads are assigned to execute on processors by the operating system. If your device has multiple processors (they all do now-a-days), then multiple threads are executing at the same time.
If so how can I specify which parts of my code should run at a selected thread?
Break you code into tasks. Dispatch the tasks on a dispatch queue.

Serial Dispatch Queue with Asynchronous Blocks

Is there ever any reason to add blocks to a serial dispatch queue asynchronously as opposed to synchronously?
As I understand it a serial dispatch queue only starts executing the next task in the queue once the preceding task has completed executing. If this is the case, I can't see what you would you gain by submitting some blocks asynchronously - the act of submission may not block the thread (since it returns straight-away), but the task won't be executed until the last task finishes, so it seems to me that you don't really gain anything.
This question has been prompted by the following code - taken from a book chapter on design patterns. To prevent the underlying data array from being modified simultaneously by two separate threads, all modification tasks are added to a serial dispatch queue. But note that returnToPool adds tasks to this queue asynchronously, whereas getFromPool adds its tasks synchronously.
class Pool<T> {
private var data = [T]();
// Create a serial dispath queue
private let arrayQ = dispatch_queue_create("arrayQ", DISPATCH_QUEUE_SERIAL);
private let semaphore:dispatch_semaphore_t;
init(items:[T]) {
data.reserveCapacity(data.count);
for item in items {
data.append(item);
}
semaphore = dispatch_semaphore_create(items.count);
}
func getFromPool() -> T? {
var result:T?;
if (dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER) == 0) {
dispatch_sync(arrayQ, {() in
result = self.data.removeAtIndex(0);
})
}
return result;
}
func returnToPool(item:T) {
dispatch_async(arrayQ, {() in
self.data.append(item);
dispatch_semaphore_signal(self.semaphore);
});
}
}
Because there's no need to make the caller of returnToPool() block. It could perhaps continue on doing other useful work.
The thread which called returnToPool() is presumably not just working with this pool. It presumably has other stuff it could be doing. That stuff could be done simultaneously with the work in the asynchronously-submitted task.
Typical modern computers have multiple CPU cores, so a design like this improves the chances that CPU cores are utilized efficiently and useful work is completed sooner. The question isn't whether tasks submitted to the serial queue operate simultaneously — they can't because of the nature of serial queues — it's whether other work can be done simultaneously.
Yes, there are reasons why you'd add tasks to serial queue asynchronously. It's actually extremely common.
The most common example would be when you're doing something in the background and want to update the UI. You'll often dispatch that UI update asynchronously back to the main queue (which is a serial queue). That way the background thread doesn't have to wait for the main thread to perform its UI update, but rather it can carry on processing in the background.
Another common example is as you've demonstrated, when using a GCD queue to synchronize interaction with some object. If you're dealing with immutable objects, you can dispatch these updates asynchronously to this synchronization queue (i.e. why have the current thread wait, but rather instead let it carry on). You'll do reads synchronously (because you're obviously going to wait until you get the synchronized value back), but writes can be done asynchronously.
(You actually see this latter example frequently implemented with the "reader-writer" pattern and a custom concurrent queue, where reads are performed synchronously on concurrent queue with dispatch_sync, but writes are performed asynchronously with barrier with dispatch_barrier_async. But the idea is equally applicable to serial queues, too.)
The choice of synchronous v asynchronous dispatch has nothing to do with whether the destination queue is serial or concurrent. It's simply a question of whether you have to block the current queue until that other one finishes its task or not.
Regarding your code sample code, that is correct. The getFromPool should dispatch synchronously (because you have to wait for the synchronization queue to actually return the value), but returnToPool can safely dispatch asynchronously. Obviously, I'm wary of seeing code waiting for semaphores if that might be called from the main thread (so make sure you don't call getFromPool from the main thread!), but with that one caveat, this code should achieve the desired purpose, offering reasonably efficient synchronization of this pool object, but with a getFromPool that will block if the pool is empty until something is added to the pool.

Play Framework: thread-pool-executor vs fork-join-executor

Let's say we have a an action below in our controller. At each request performLogin will be called by many users.
def performLogin( ) = {
Async {
// API call to the datasource1
val id = databaseService1.getIdForUser();
// API call to another data source different from above
// This process depends on id returned by the call above
val user = databaseService2.getUserGivenId(id);
// Very CPU intensive task
val token = performProcess(user)
// Very CPU intensive calculations
val hash = encrypt(user)
Future.successful(hash)
}
}
I kind of know what the fork-join-executor does. Basically from the main thread which receives a request, it spans multiple worker threads which in tern will divide the work into few chunks. Eventually main thread will join those result and return from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, this selected thread will do the work, then go back to the thread pool to listen to more work to do. So no sub dividing of the task happening here.
In above code parallelism by fork-join executor is not possible in my opinion. Each call to the different methods/functions requires something from the previous step. If I were to choose the fork-join executor for the threading how would that benefit me? How would above code execution differ among fork-join vs thread-pool executor.
Thanks
This isn't parallel code, everything inside of your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests - it's event-based, there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.

Scala actors, futures and system calls resulting in Thread leaks

I am running a complex software with different actors (scala actors). Some of them have some executions that uses scala futures to avoid locking and keep processing new received messages (simplified code):
def act {
while (true) {
receive {
case (code: String) =>
val codeMatch = future { match_code(code) }
for (c <- codeMatch)
yield callback(code)(JSON.parseJSON(c))
}
}
}
def match_code(code: String) {
val result = s"my_script.sh $code" !!
}
I noticed looking at jvisualvm and Eclipse Debugger that the number of active threads keeps increasing when this system is running. I am afraid I am having some kind of Thread leak, but I can't detect where is the problem.
Here are some screenshots of both finished and live threads (I hided some live threads that are not related to this problem)
Finished Threads
Living threads
Edit 1:
In the above graphs example, I run the system with only 3 actors of different classes: Actor1 sends messages to Actor2 that sends message to Actor3
You are using receive so each actor will use its own thread, and you don't at least in this example provide any way for actors to terminate. So you would expect to have one new thread per actor that was ever started. If that is what you see, then all is working as expected. If you want to have actors cease running, you will have to let them eventually fall out of the while loop or call sys.exit on them or somesuch.
(Also, old-style Scala actors are deprecated in favor of Akka actors in 2.11.)
You also don't (in the code above) have any indication whether the future actually completed. If the futures don't finish, they'll keep tying up threads.

multithreading: how to process data in a vector, while the vector is being populated?

I have a single-threaded linux app which I would like to make parallel. It reads a data file, creates objects, and places them in a vector. Then it calls a compute-intensive method (.5 second+) on each object. I want to call the method in parallel with object creation. While I've looked at qt and tbb, I am open to other options.
I planned to start the thread(s) while the vector was empty. Each one would call makeSolids (below), which has a while loop that would run until interpDone==true and all objects in the vector have been processed. However, I'm a n00b when it comes to threading, and I've been looking for a ready-made solution.
QtConcurrent::map(Iter begin,Iter end,function()) looks very easy, but I can't use it on a vector that's changing in size, can I? And how would I tell it to wait for more data?
I also looked at intel's tbb, but it looked like my main thread would halt if I used parallel_for or parallel_while. That stinks, since their memory manager was recommended (open cascade's mmgt has poor performance when multithreaded).
/**intended to be called by a thread
\param start the first item to get from the vector
\param skip how many to skip over (4 for 4 threads)
*/
void g2m::makeSolids(uint start, uint incr) {
uint curr = start;
while ((!interpDone) || (lineVector.size() > curr)) {
if (lineVector.size() > curr) {
if (lineVector[curr]->isMotion()) {
((canonMotion*)lineVector[curr])->setSolidMode(SWEPT);
((canonMotion*)lineVector[curr])->computeSolid();
}
lineVector[curr]->setDispMode(BEST);
lineVector[curr]->display();
curr += incr;
} else {
uio::sleep(); //wait a little bit for interp
}
}
}
EDIT: To summarize, what's the simplest way to process a vector at the same time that the main thread is populating the vector?
Firstly, to benefit from threading you need to find similarly slow tasks for each thread to do. You said your per-object processing takes .5s+, how long does your file reading / object creation take? It could easily be a tenth or a thousandth of that time, in which case your multithreading approach is going to produce neglegible benefit. If that's the case, (yes, I'll answer your original question soon incase it's not) then think about simultaneously processing multiple objects. Given your processing takes quite a while, the thread creation overhead isn't terribly significant, so you could simply have your main file reading/object creation thread spawn a new thread and direct it at the newly created object. The main thread then continues reading/creating subsequent objects. Once all objects are read/created, and all the processing threads launched, the main thread "joins" (waits for) the worker threads. If this will create too many threads (thousands), then put a limit on how far ahead the main thread is allowed to get: it might read/create 10 objects then join 5, then read/create 10, join 10, read/create 10, join 10 etc. until finished.
Now, if you really want the read/create to be in parallel with the processing, but the processing to be serialised, then you can still use the above approach but join after each object. That's kind of weird if you're designing this with only this approach in mind, but good because you can easily experiment with the object processing parallelism above as well.
Alternatively, you can use a more complex approach that just involves the main thread (that the OS creates when your program starts), and a single worker thread that the main thread must start. They should be coordinated using a mutex (a variable ensuring mutually-exclusive, which means not-concurrent, access to data), and a condition variable which allows the worker thread to efficiently block until the main thread has provided more work. The terms - mutex and condition variable - are the standard terms in the POSIX threading that Linux uses, so should be used in the explanation of the particular libraries you're interested in. Summarily, the worker thread waits until the main read/create thread broadcasts it a wake-up signal indicating another object is ready for processing. You may want to have a counter with index of the last fully created, ready-for-processing object, so the worker thread can maintain it's count of processed objects and move along the ready ones before once again checking the condition variable.
It's hard to tell if you have been thinking about this problem deeply and there is more than you are letting on, or if you are just over thinking it, or if you are just wary of threading.
Reading the file and creating the objects is fast; the one method is slow. The dependency is each consecutive ctor depends on the outcome of the previous ctor - a little odd - but otherwise there are no data integrity issues so there doesn't seem to be anything that needs to be protected by mutexes and such.
Why is this more complicated than something like this (in crude pseudo-code):
while (! eof)
{
readfile;
object O(data);
push_back(O);
pthread_create(...., O, makeSolid);
}
while(x < vector.size())
{
pthread_join();
x++;
}
If you don't want to loop on the joins in your main then spawn off a thread to wait on them by passing a vector of TIDs.
If the number of created objects/threads is insane, use a thread pool. Or put a counter is the creation loop to limit the number of threads that can be created before running ones are joined.
#Caleb: quite -- perhaps I should have emphasized active threads. The GUI thread should always be considered one.

Resources