A simple thread pool with a global shared queue of tasks (functors).
Each worker (thread) picks up one task from the queue and executes it. It won't execute the next task until this one is finished.
Let's imagine a big task that needs to spawn child tasks to produce some data, and then continue with evaluation (for example, to sort a big array before saving it to disk).
Pseudocode of the task:
do some stuff
generate a list of child tasks
threadpool.spawn (child tasks)
wait until they were executed
continue my task
The problem is that the worker will deadlock, because the task is waiting for the child tasks, and the thread pool is waiting for the parent task to end before running the children.
One idea is to run the child tasks inside the spawn code:
threadpool.spawn pseudocode:
threadpool.push (tasks)
while (not all incoming tasks were executed)
t = threadpool.pop()
t.run()
return (and continue executing parent task)
But how can I know, in an efficient way, that all the tasks were executed?
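The helping loop above only needs a shared countdown to know when it may return. Here is a minimal Python sketch of that idea, not a tuned implementation. HelpingPool and spawn_and_wait are names made up for this illustration. While children are busy on other workers, the waiting parent polls with a short timeout, which is simple rather than optimal.

```python
import queue
import threading


class HelpingPool:
    """Fixed-size pool whose tasks can wait for children without idling a worker."""

    def __init__(self, workers):
        self._tasks = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._worker_loop, daemon=True).start()

    def _worker_loop(self):
        while True:
            self._tasks.get()()  # take one task and run it to completion

    def submit(self, fn):
        self._tasks.put(fn)

    def spawn_and_wait(self, children):
        """Queue the children, then run queued tasks until all children are done."""
        lock = threading.Lock()
        remaining = [len(children)]  # shared countdown, guarded by lock

        def wrap(child):
            def run():
                try:
                    child()
                finally:
                    with lock:
                        remaining[0] -= 1
            return run

        for child in children:
            self._tasks.put(wrap(child))

        while True:
            with lock:
                if remaining[0] == 0:
                    return
            try:
                task = self._tasks.get(timeout=0.01)
            except queue.Empty:
                continue  # children are running elsewhere; check the count again
            task()  # help out instead of blocking


def demo():
    pool = HelpingPool(workers=1)  # one worker: a plain blocking wait would deadlock
    results = []
    done = threading.Event()

    def parent():
        data = []
        pool.spawn_and_wait([lambda i=i: data.append(i * i) for i in range(4)])
        results.append(sorted(data))
        done.set()

    pool.submit(parent)
    finished = done.wait(timeout=5)
    return finished, results
```

One caveat of this design: the helper may pick up an unrelated long task from the queue, so the parent can resume later than strictly necessary.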
Another idea is to split the parent task, something like this:
Task pseudocode:
l = generate a list of child tasks
threadpool.push ( l , high priority )
t = create a task to work with generated data
threadpool.push (t , lo priority )
But I find this quite intrusive.
Any opinions?
P.S. Merry Christmas!
You can have a mechanism for the child tasks to signal back to the parent whenever they are done so it can proceed. In Java, Callable tasks submitted to an ExecutorService thread pool return their results as Future objects. Another approach would be to maintain a separate completion signal, something similar to a CountDownLatch, which serves as a common countdown that is decremented every time a child task completes.
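To make the countdown idea concrete, here is a small Python sketch. Python has no built-in CountDownLatch, so the class below is a hand-written stand-in built on threading.Condition, and parallel_sort is a hypothetical example task. It assumes the waiting code runs outside the pool; a parent that blocks on the latch from inside a saturated pool would still deadlock.

```python
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor


class CountDownLatch:
    """Minimal latch: wait() returns once count_down() was called `count` times."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def wait(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= 0, timeout)


def parallel_sort(data, pool, chunks=4):
    """Sort chunks as child tasks, wait on the latch, then merge the results."""
    size = max(1, -(-len(data) // chunks))  # ceiling division
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    latch = CountDownLatch(len(parts))

    def sort_part(part):
        try:
            part.sort()
        finally:
            latch.count_down()  # signal completion even if sorting fails

    for part in parts:
        pool.submit(sort_part, part)
    latch.wait()  # blocks until every child has counted down
    return list(heapq.merge(*parts))
```

Usage: `with ThreadPoolExecutor(max_workers=4) as pool: parallel_sort(values, pool)`.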
Let's consider this simple code with coroutines
import kotlinx.coroutines.*
import java.util.concurrent.Executors

fun main() {
    runBlocking {
        launch(Executors.newFixedThreadPool(10).asCoroutineDispatcher()) {
            var x = 0
            val threads = mutableSetOf<Thread>()
            for (i in 0 until 100000) {
                x++
                threads.add(Thread.currentThread())
                yield()
            }
            println("Result: $x")
            println("Threads: $threads")
        }
    }
}
As far as I understand this is quite legit coroutines code and it actually produces expected results:
Result: 100000
Threads: [Thread[pool-1-thread-1,5,main], Thread[pool-1-thread-2,5,main], Thread[pool-1-thread-3,5,main], Thread[pool-1-thread-4,5,main], Thread[pool-1-thread-5,5,main], Thread[pool-1-thread-6,5,main], Thread[pool-1-thread-7,5,main], Thread[pool-1-thread-8,5,main], Thread[pool-1-thread-9,5,main], Thread[pool-1-thread-10,5,main]]
The question is what makes these modifications of local variables thread-safe (or are they thread-safe at all?). I understand that this loop is actually executed sequentially, but it can change the running thread on every iteration. The changes made by the thread in the first iteration should still be visible to the thread that picks up the loop in the second iteration. Which code guarantees this visibility? I tried to decompile this code to Java and dig around the coroutines implementation with a debugger, but did not find a clue.
Your question is completely analogous to the realization that the OS can suspend a thread at any point in its execution and reschedule it to another CPU core. That works not because the code in question is "multicore-safe", but because it is a guarantee of the environment that a single thread behaves according to its program-order semantics.
Kotlin's coroutine execution environment likewise guarantees the safety of your sequential code. You are supposed to program to this guarantee without any worry about how it is maintained.
If you want to descend into the details of "how" out of curiosity, the answer becomes "it depends". Every coroutine dispatcher can choose its own mechanism to achieve it.
As an instructive example, we can focus on the specific dispatcher you use in your posted code: the JDK's fixed thread pool executor (Executors.newFixedThreadPool). You can submit arbitrary tasks to this executor, and it will execute each one of them on a single (arbitrary) thread, but many tasks submitted together will execute in parallel on different threads.
Furthermore, the executor service provides the guarantee that the code leading up to executor.execute(task) happens-before the code within the task, and the code within the task happens-before another thread's observing its completion (future.get(), future.isDone(), getting an event from the associated CompletionService).
Kotlin's coroutine dispatcher drives the coroutine through its lifecycle of suspension and resumption by relying on these primitives from the executor service, and thus you get the "sequential execution" guarantee for the entire coroutine. A single task submitted to the executor ends whenever the coroutine suspends, and the dispatcher submits a new task when the coroutine is ready to resume (when the user code calls continuation.resume(result)).
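The resubmission scheme described above can be imitated with a Python generator, as a loose analogy only; this is not how kotlinx.coroutines is implemented. In the sketch, every `yield` plays the role of a suspension point, and run_on_pool (a made-up helper) submits one executor task per slice. The local variable x is only ever touched by one thread at a time, and each handoff goes through the executor's internally synchronized queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def counting_coroutine(n, seen_threads):
    """Plain sequential code; each `yield` is a suspension point."""
    x = 0
    for _ in range(n):
        x += 1
        seen_threads.add(threading.current_thread().name)
        yield
    return x


def run_on_pool(coroutine, pool):
    """Resume the coroutine one slice per submitted task until it finishes."""
    finished = threading.Event()
    result = []

    def step():
        try:
            next(coroutine)  # run until the next suspension point
        except StopIteration as stop:
            result.append(stop.value)
            finished.set()
        else:
            pool.submit(step)  # the next slice may land on any pool thread

    pool.submit(step)
    finished.wait()
    return result[0]
```

Usage: `run_on_pool(counting_coroutine(1000, set()), ThreadPoolExecutor(max_workers=4))` returns the final count; how many distinct threads end up in the set depends on scheduling.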
I would like to implement a master task controlling several instances of a worker task. Each worker task has three different phases:
Initialization
Do work
Report results
At the beginning the master task should initialize all worker tasks (concurrently). Each worker task then has s seconds to complete its initialization, but completion within s seconds is not guaranteed.
What efficient possibilities (signaling mechanisms) do I have to let the master task monitor the initialization state of all worker tasks? I thought of giving each worker task access to a worker-specific protected object with a procedure that sets a boolean flag; each worker task would call it after it has successfully completed its initialization.
After the master task has triggered the initialization of all worker tasks, it could remember the current time and enter a loop that periodically polls the workers' initialization states through a function declared in the protected object type. The loop exits when all worker tasks have been initialized or s seconds have passed.
Do I have to use such a polling concept, with a delay statement of an appropriate duration inside the monitor loop? I read about timeouts of entry calls. Could I use such timeouts to avoid the polling?
After a worker task has successfully completed its initialization, it should wait for a signal from the master task to execute one work package. So I think a worker task should have a Do_Work entry, and the master task should call this entry for all worker tasks in a loop, right?
The master task could use an appropriate mechanism to check whether all worker tasks have completed their work packages. After that, the worker tasks should report their results, but in a deterministic way (not concurrently). So if I use a Report_Result entry in the worker tasks to wait for a signal from the master task, calling these entries in a loop in the master task would lead to a non-deterministic order of the reported results. Can these entries also be called in a blocking way (like a normal procedure call)?
You are correct that the master task can call the Do_Work entry for each worker task.
Similarly, the master task can call the Report_Result entry of all worker tasks.
A simple way to accomplish this is to create a task type for the worker tasks, and then an array of the worker tasks.
procedure Master is
   task type Workers is
      entry Do_Work;
      entry Report_Result;
   end Workers;

   Team : array(1..5) of Workers;
begin
   -- Initialization will occur automatically
   -- Signal workers to Do_Work
   for Worker of Team loop
      Worker.Do_Work;
   end loop;

   -- Create a loop to signal all reports
   -- While the workers may finish in a random order, the
   -- reporting will occur in the order of the array indices
   for Worker of Team loop
      Worker.Report_Result;
   end loop;
end Master;
This example is incomplete because it does not define the task body for the Workers task type. The important features of this program are:
Task initialization of the workers in the Team array begins when execution reaches the begin statement in Master.
The Master will wait for each element of Team to accept the entry call to Do_Work.
Each element of Team will wait at the accept statement for Master to call the Do_Work entry.
The master will wait for each element of Team to accept the Report_Result entry.
Each element of Team will wait at its accept for Report_Result for the master to call that entry.
The Ada Rendezvous mechanism neatly coordinates all communication between master and each of the workers.
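For readers who do not know Ada, the same three-phase protocol, including the initialization deadline from the question, can be sketched in Python. This is an analogy under stated assumptions, not a translation of the rendezvous semantics. Worker, run_master and the placeholder "work" are invented for the example, and threading.Event.wait with a timeout stands in for a timed wait, so no polling loop is needed.

```python
import threading
import time


class Worker(threading.Thread):
    def __init__(self, worker_id, init_seconds):
        super().__init__(daemon=True)
        self.worker_id = worker_id
        self.init_seconds = init_seconds
        self.initialized = threading.Event()
        self.do_work = threading.Event()
        self.work_done = threading.Event()
        self.result = None

    def run(self):
        time.sleep(self.init_seconds)       # phase 1: simulated initialization
        self.initialized.set()
        self.do_work.wait()                 # block until the master says go
        self.result = self.worker_id * 10   # phase 2: placeholder work package
        self.work_done.set()


def run_master(init_times, init_timeout):
    workers = [Worker(i, t) for i, t in enumerate(init_times)]
    for w in workers:
        w.start()

    # One shared deadline for all workers; each wait blocks, none of them polls.
    deadline = time.monotonic() + init_timeout
    ready = []
    for w in workers:
        remaining = max(0.0, deadline - time.monotonic())
        if w.initialized.wait(remaining):
            ready.append(w)

    for w in ready:
        w.do_work.set()

    reports = []
    for w in ready:                         # phase 3: collected in array order
        w.work_done.wait()
        reports.append((w.worker_id, w.result))
    return reports
```

Workers that miss the deadline are simply left out here; what should happen to them in a real system is a separate design decision.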
One thing you can do if you really want the workers to signal the manager that they are done, is pass the Manager's access to the workers and provide an entry for them to call. You have to decide how the manager and workers interact when that signal happens.
As an example, I had a manager keep an array of Workers and two lists of accesses to those workers (since they are limited types, you have to use access variables). One list keeps track of all available workers and the other keeps track of the workers currently doing something. As workers finish their work, they signal the manager, which removes them from the busy list and puts them on the available list. When the client requests that the manager do more work, the manager pulls a worker from the available list, places it on the busy list, and starts the worker going. Here is an example compiled with GNAT 7.1.1:
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Containers.Bounded_Doubly_Linked_Lists;

procedure Hello is

   package Tasks is

      type Worker;
      type Worker_Access is access all Worker;

      package Lists is new Ada.Containers.Bounded_Doubly_Linked_Lists
        (Element_Type => Worker_Access);

      task type Manager is

         -- Called by client code
         entry Add_Work;
         entry Stop;

         -- Only called by workers to signal they are
         -- finished
         entry Signal(The_Position : in out Lists.Cursor);

      end Manager;

      task type Worker(Boss : not null access Manager) is
         entry Start(The_Position : Lists.Cursor);
      end Worker;

   end Tasks;

   package body Tasks is

      task body Worker is
         Position : Lists.Cursor := Lists.No_Element;
      begin
         loop
            select
               accept Start(The_Position : Lists.Cursor) do
                  Position := The_Position;
               end Start;

               -- Do stuff HERE
               delay 0.005;

               -- Finished so signal the manager
               Boss.Signal(Position);
               Position := Lists.No_Element;
            or
               terminate;
            end select;
         end loop;
      end Worker;

      Worker_Count : constant := 10;

      task body Manager is

         -- Worker Pool
         Workers : array(1..Worker_Count)
           of aliased Worker(Manager'Unchecked_Access);

         -- Use 2 lists to keep track of who can work and who
         -- is already tasked
         Bored : Lists.List(Worker_Count);
         Busy  : Lists.List(Worker_Count);

         -- Gonna call a couple of times, so use a nested
         -- procedure. This procedure removes a worker
         -- from the Busy list and places it on the Bored
         -- list.
         procedure Handle_Signal(Position : in out Lists.Cursor) is
         begin
            Put_Line("Worker Completed Work");
            Bored.Append(Lists.Element(Position));
            Busy.Delete(Position);
         end Handle_Signal;

         use type Ada.Containers.Count_Type;

      begin

         -- Start off all workers as Bored
         for W of Workers loop
            Bored.Append(W'Unchecked_Access);
         end loop;

         -- Start working
         loop
            select
               when Bored.Length > 0 =>
                  accept Add_Work do
                     -- Take a worker from the Bored list, put it
                     -- on the busy list, and send it off to work.
                     -- It will signal when it is finished
                     Put_Line("Starting Worker");
                     Busy.Append(Bored.First_Element);
                     Bored.Delete_First;
                     Busy.Last_Element.Start(Busy.Last);
                  end Add_Work;
            or
               accept Stop;
               Put_Line("Received Stop Signal");

               -- Wait for all workers to finish
               while Busy.Length > 0 loop
                  accept Signal(The_Position : in out Lists.Cursor) do
                     Handle_Signal(The_Position);
                  end Signal;
               end loop;

               -- Break out of loop
               exit;
            or
               accept Signal(The_Position: in out Lists.Cursor) do
                  Handle_Signal(The_Position);
               end Signal;
            end select;
         end loop;

         -- Work finished!
         Put_Line("Manager is Finished");
      end Manager;

   end Tasks;

   Manager : Tasks.Manager;

begin
   for Count in 1 .. 20 loop
      Manager.Add_Work;
   end loop;
   Manager.Stop;

   -- Wait for task to finish
   loop
      exit when Manager'Terminated;
   end loop;
   Put_Line("Program is Done");
end Hello;
I use cursors so that each worker remembers its position in the busy list and can pass it back to the Manager, which can then move the entry between lists quickly.
Sample Output:
$gnatmake -o hello *.adb
gcc -c hello.adb
gnatbind -x hello.ali
gnatlink hello.ali -o hello
$hello
Starting Worker
Starting Worker
Starting Worker
Starting Worker
Starting Worker
Starting Worker
Starting Worker
Starting Worker
Starting Worker
Starting Worker
Worker Completed Work
Starting Worker
Worker Completed Work
Starting Worker
Worker Completed Work
Starting Worker
Worker Completed Work
Starting Worker
Worker Completed Work
Starting Worker
Worker Completed Work
Starting Worker
Worker Completed Work
Starting Worker
Worker Completed Work
Worker Completed Work
Starting Worker
Worker Completed Work
Starting Worker
Starting Worker
Received Stop Signal
Worker Completed Work
Worker Completed Work
Worker Completed Work
Worker Completed Work
Worker Completed Work
Worker Completed Work
Worker Completed Work
Worker Completed Work
Worker Completed Work
Worker Completed Work
Manager is Finished
Program is Done
Note that you can pretty this up and hide a bunch of things; I just wanted to get a quick example out.
I sort of understand threads, correct me if I'm wrong.
Is a single thread allocated to a piece of code until that code has completed?
Are the threads prioritised to whichever piece of code is run first?
What is the difference between main queue and thread?
My most important question:
Can threads run at the same time? If so how can I specify which parts of my code should run at a selected thread?
Let me start this way. Unless you are writing a special kind of application (and you will know if you are), forget about threads. Working with threads is complex and tricky. Use dispatch queues… it's simpler and easier.
Dispatch queues run tasks. Tasks are closures (blocks) or functions. When you need to run a task off the main dispatch queue, you call one of the dispatch_ functions, the primary one being dispatch_async(). When you call dispatch_async(), you need to specify which queue to run the task on. To get a queue, you call dispatch_queue_create() or one of the dispatch_get_ functions, the primary one being dispatch_get_global_queue().
NOTE: Swift 3 changed this from a function model to an object model. The dispatch_ functions are instance methods of DispatchQueue. The dispatch_get_ functions are turned into class methods/properties of DispatchQueue.
// Swift 3
DispatchQueue.global(qos: .background).async {
    let calculation = arc4random()
}

// Swift 2
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
    let calculation = arc4random()
}
The trouble here is any and all tasks which update the UI must be run on the main thread. This is usually done by calling dispatch_async() on the main queue (dispatch_get_main_queue()).
// Swift 3
DispatchQueue.global(qos: .background).async {
    let calculation = arc4random()
    DispatchQueue.main.async {
        print("\(calculation)")
    }
}

// Swift 2
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
    let calculation = arc4random()
    dispatch_async(dispatch_get_main_queue()) {
        print("\(calculation)")
    }
}
The gory details are messy. To keep it simple, dispatch queues manage thread pools. It is up to the dispatch queue to create, run, and eventually dispose of threads. The main queue is a special queue which has only 1 thread. The operating system is tasked with assigning threads to a processor and executing the task running on the thread.
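The queue-over-thread-pool model is not specific to Apple platforms. As a rough analogy only (this is not Grand Central Dispatch), the Python sketch below uses a four-thread executor in place of a global queue and a single-thread executor in place of the main queue. The names main_queue, global_queue and compute_then_report are invented for the illustration.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the two kinds of queues: a serial one and a concurrent one.
main_queue = ThreadPoolExecutor(max_workers=1, thread_name_prefix="main")
global_queue = ThreadPoolExecutor(max_workers=4, thread_name_prefix="global")


def compute_then_report():
    """Do the work on the pool, then hop to the serial queue to publish it."""

    def background():
        calculation = random.randrange(2**32)
        worker = threading.current_thread().name

        def on_main():
            # Anything that must be serialized (such as UI updates) goes here.
            return calculation, worker, threading.current_thread().name

        return main_queue.submit(on_main)

    # The outer future yields the inner future created on the serial queue.
    return global_queue.submit(background).result().result()
```

Calling compute_then_report() returns the random value together with the names of the two threads involved, which shows the hop from a pool thread to the single serial thread.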
With all that out of the way, now I will answer your questions.
Is a single thread allocated to a piece of code until that code has completed?
A task will run in a single thread.
Are the threads prioritised to whichever piece of code is run first?
Tasks are assigned to a thread. A task will not change which thread it runs on. If a task needs to run in another thread, then it creates a new task and assigns that new task to the other thread.
What is the difference between main queue and thread?
The main queue is a dispatch queue which has 1 thread. This single thread is also known as the main thread.
Can threads run at the same time?
Threads are assigned to execute on processors by the operating system. If your device has multiple processors (they all do nowadays), then multiple threads are executing at the same time.
If so how can I specify which parts of my code should run at a selected thread?
Break your code into tasks. Dispatch the tasks on a dispatch queue.
Let's say we have an action like the one below in our controller. performLogin will be called by many users, once per request.
def performLogin( ) = {
  Async {
    // API call to datasource1
    val id = databaseService1.getIdForUser();

    // API call to another data source different from above
    // This process depends on id returned by the call above
    val user = databaseService2.getUserGivenId(id);

    // Very CPU intensive task
    val token = performProcess(user)

    // Very CPU intensive calculations
    val hash = encrypt(user)

    Future.successful(hash)
  }
}
I kind of know what the fork-join-executor does. Basically, from the main thread which receives a request, it spawns multiple worker threads which in turn divide the work into a few chunks. Eventually the main thread joins those results and returns from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, this selected thread does the work, and then it goes back to the thread pool to wait for more work. So no subdividing of the task happens here.
In the code above, parallelism by the fork-join executor is not possible in my opinion. Each call to the different methods/functions requires something from the previous step. If I were to choose the fork-join executor for the threading, how would that benefit me? How would the execution of the code above differ between the fork-join and thread-pool executors?
Thanks
This isn't parallel code, everything inside of your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests - it's event-based, there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.
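The point that each request is one task, and that its dependent steps stay on one thread whichever executor schedules it, can be checked with a small Python sketch. Python's standard library has no fork-join executor, so ThreadPoolExecutor stands in for "the executor" here. The step bodies are placeholders for the calls in the question, and perform_login and handle_requests are names made up for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def perform_login(request_id):
    """Dependent steps in sequence; each records the thread it ran on."""
    threads = []

    def step(value):
        threads.append(threading.current_thread().name)
        return value

    user_id = step(f"id-{request_id}")    # stands in for getIdForUser
    user = step(f"user-for-{user_id}")    # stands in for getUserGivenId
    token = step(f"token-for-{user}")     # stands in for performProcess
    digest = step(f"hash-of-{user}")      # stands in for encrypt
    return digest, threads


def handle_requests(count, workers=4):
    """One task per request; the pool decides which thread runs each task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(perform_login, range(count)))
```

Different requests may run on different threads in parallel, but the four steps of any single request always report the same thread.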
I am trying to model a system where there are multiple threads producing data, and a single thread consuming the data. The trick is that I don't want a dedicated thread to consume the data because all of the threads live in a pool. Instead, I want one of the producers to empty the queue when there is work, and yield if another producer is already clearing the queue.
The basic idea is that there is a queue of work, and a lock around the processing. Each producer pushes its payload onto the queue, and then attempts to enter the lock. The attempt is non-blocking and returns either true (the lock was acquired), or false (the lock is held by someone else).
If the lock is acquired, that thread processes all of the data in the queue until it is empty (including any new payloads introduced by other producers during processing). Once all of the work has been processed, the thread releases the lock and returns.
The following is C++ code for the algorithm:
void Process(ITask *task) {
    // queue is a thread-safe implementation of a regular queue
    queue.push(task);

    // crit_sec is some handle to a critical-section-like object
    // try_scoped_lock uses RAII to attempt to acquire the lock in the constructor;
    // if the lock was acquired, it will release the lock in the destructor
    try_scoped_lock lock(crit_sec);

    // See if this thread won the lottery. Prize is doing all of the dishes
    if (!lock.Acquired())
        return;

    // This thread got the lock, so it needs to do the work
    ITask *currTask;
    while (queue.try_pop(currTask)) {
        ... execute task ...
    }
}
In general this code works fine, and I have never actually witnessed the behavior I am about to describe below, but that implementation makes me feel uneasy. It stands to reason that a race condition is introduced between when the thread exits the while loop and when it releases the critical section.
The whole algorithm relies on the assumption that if the lock is being held, then a thread is servicing the queue.
I am essentially looking for enlightenment on 2 questions:
Am I correct that there is a race condition as described (bonus for other races)?
Is there a standard pattern for implementing this mechanism that is performant and doesn't introduce race conditions?
Yes, there is a race condition.
Thread A adds a task, gets the lock, processes its own task, then asks the queue for another task. The queue is empty, so try_pop returns false.
Thread B at this point adds a task to the queue. It then attempts to get the lock, and fails, because thread A has the lock. Thread B exits.
Thread A then exits, with the queue non-empty, and nobody processing the task on it.
This will be difficult to find, because that window is relatively narrow. To make it more likely to find, after the while loop introduce a "sleep for 10 seconds". In the calling code, insert a task, wait 5 seconds, then insert a second task. After 10 more seconds, check that both insert tasks are finished, and there is still a task to be processed on the queue.
One way to fix this would be to change try_pop to try_pop_or_unlock, and pass in your lock to it. try_pop_or_unlock then atomically checks for an empty queue, and if so unlocks the lock and returns false.
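A minimal Python sketch of the try_pop_or_unlock idea follows, assuming a single mutex that guards both the queue contents and a "someone is draining" flag (the flag replaces the separate critical section). DrainQueue and its method names are invented for the illustration, and None is used as the "queue was empty" marker, so tasks themselves must never be None.

```python
import threading
from collections import deque


class DrainQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()
        self._draining = False  # plays the role of the try-lock

    def push(self, item):
        with self._lock:
            self._items.append(item)

    def try_acquire(self):
        """Non-blocking attempt to become the thread that drains the queue."""
        with self._lock:
            if self._draining:
                return False
            self._draining = True
            return True

    def try_pop_or_unlock(self):
        """Return the next item, or atomically give up the drainer role."""
        with self._lock:
            if self._items:
                return self._items.popleft()
            self._draining = False  # same critical section as the empty check
            return None


def process(q, task):
    q.push(task)
    if not q.try_acquire():
        return  # another thread is draining and will see our task
    while True:
        current = q.try_pop_or_unlock()
        if current is None:
            return
        current()  # run the task outside the internal lock
```

Because the empty check and the release happen under one lock, a producer that pushes afterwards finds the flag cleared and becomes the drainer itself, while a producer that pushes beforehand has its item seen by the current drainer.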
Another approach is to improve the thread pool. Add a counting semaphore based "consume" task launcher to it.
semaphore_bool bTaskActive;
counting_semaphore counter;
when (counter || !bTaskActive)
    if (bTaskActive)
        return
    bTaskActive = true
    --counter
    launch_task( process_one_off_queue, when_done( [&]{ bTaskActive = false; } ) );
When the counting semaphore is active, or when poked by the finished consume task, it launches a consume task if there is no consume task active.
But that is just off the top of my head.