Scala actors, futures and system calls resulting in thread leaks

I am running a complex piece of software with different actors (Scala actors). Some of them use Scala futures for parts of their processing, to avoid blocking and to keep handling newly received messages (simplified code):
def act {
  while (true) {
    receive {
      case code: String =>
        val codeMatch = future { match_code(code) }
        for (c <- codeMatch)
          yield callback(code)(JSON.parseJSON(c))
    }
  }
}

def match_code(code: String): String = {
  // needs: import scala.sys.process._
  s"my_script.sh $code".!!
}
Looking at jvisualvm and the Eclipse Debugger, I noticed that the number of active threads keeps increasing while this system is running. I am afraid I have some kind of thread leak, but I can't work out where the problem is.
Here are some screenshots of both finished and live threads (I hid some live threads that are not related to this problem):
Finished threads
Live threads
Edit 1:
In the graphs above, I ran the system with only three actors of different classes: Actor1 sends messages to Actor2, which sends messages to Actor3.

You are using receive, so each actor will use its own thread, and you don't (at least in this example) provide any way for the actors to terminate. So you would expect one new thread per actor that was ever started. If that is what you see, then all is working as expected. If you want the actors to cease running, you will have to let them eventually fall out of the while loop, call sys.exit on them, or some such.
(Also, old-style Scala actors are deprecated in favor of Akka actors in 2.11.)
You also don't (in the code above) have any indication whether the future actually completed. If the futures don't finish, they'll keep tying up threads.
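For illustration (this is not from the code above), here is a minimal sketch of the react alternative, with a hypothetical Stop message so the actor can terminate. Unlike receive, react hands its thread back to the shared pool between messages:

import scala.actors.Actor._

case object Stop  // hypothetical shutdown message, not part of the original code

object ReactDemo extends App {
  val worker = actor {
    loop {
      react {
        case Stop         => exit()                      // terminate cleanly
        case code: String => println(s"matching $code")  // stand-in for real work
      }
    }
  }

  worker ! "some-code"
  worker ! Stop  // without an exit path, the actor lives (and holds resources) forever
}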

Related

Kotlin coroutines multithread dispatcher and thread-safety for local variables

Let's consider this simple code with coroutines:
import kotlinx.coroutines.*
import java.util.concurrent.Executors

fun main() {
    runBlocking {
        launch(Executors.newFixedThreadPool(10).asCoroutineDispatcher()) {
            var x = 0
            val threads = mutableSetOf<Thread>()
            for (i in 0 until 100000) {
                x++
                threads.add(Thread.currentThread())
                yield()
            }
            println("Result: $x")
            println("Threads: $threads")
        }
    }
}
As far as I understand, this is quite legitimate coroutine code, and it actually produces the expected results:
Result: 100000
Threads: [Thread[pool-1-thread-1,5,main], Thread[pool-1-thread-2,5,main], Thread[pool-1-thread-3,5,main], Thread[pool-1-thread-4,5,main], Thread[pool-1-thread-5,5,main], Thread[pool-1-thread-6,5,main], Thread[pool-1-thread-7,5,main], Thread[pool-1-thread-8,5,main], Thread[pool-1-thread-9,5,main], Thread[pool-1-thread-10,5,main]]
The question is what makes these modifications of local variables thread-safe (or is it thread-safe at all?). I understand that this loop is actually executed sequentially, but it can change the running thread on every iteration. The changes made by the thread on the first iteration should still be visible to the thread that picks up the loop on the second iteration. Which code guarantees this visibility? I tried decompiling this code to Java and digging around the coroutines implementation with a debugger, but did not find a clue.
Your question is completely analogous to the realization that the OS can suspend a thread at any point in its execution and reschedule it to another CPU core. That works not because the code in question is "multicore-safe", but because it is a guarantee of the environment that a single thread behaves according to its program-order semantics.
Kotlin's coroutine execution environment likewise guarantees the safety of your sequential code. You are supposed to program to this guarantee without any worry about how it is maintained.
If you want to descend into the details of "how" out of curiosity, the answer becomes "it depends". Every coroutine dispatcher can choose its own mechanism to achieve it.
As an instructive example, we can focus on the specific dispatcher you use in your posted code: the executor service returned by the JDK's Executors.newFixedThreadPool. You can submit arbitrary tasks to this executor, and it will execute each one of them on a single (arbitrary) thread, but many tasks submitted together will execute in parallel on different threads.
Furthermore, the executor service provides the guarantee that the code leading up to executor.execute(task) happens-before the code within the task, and the code within the task happens-before another thread's observing its completion (future.get(), future.isDone(), getting an event from the associated CompletionService).
Kotlin's coroutine dispatcher drives the coroutine through its lifecycle of suspension and resumption by relying on these primitives from the executor service, and thus you get the "sequential execution" guarantee for the entire coroutine. A single task submitted to the executor ends whenever the coroutine suspends, and the dispatcher submits a new task when the coroutine is ready to resume (when the user code calls continuation.resume(result)).
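To make the mechanism concrete, here is a toy model in Scala (a sketch only, not the actual kotlinx.coroutines implementation): each "resumption" is submitted as a fresh task, and the executor's happens-before guarantee alone makes the unsynchronized counter visible across pool threads:

import java.util.concurrent.Executors

object HandoffDemo extends App {
  val pool = Executors.newFixedThreadPool(10)
  var x = 0  // deliberately unsynchronized, like the local variable in the coroutine

  // Each step is a separate task; "resuming" means submitting the next step.
  def step(i: Int): Unit = pool.execute { () =>
    x += 1                       // this write precedes the next execute() call...
    if (i < 100000) step(i + 1)  // ...and execute() happens-before the next task runs
    else { println(s"Result: $x"); pool.shutdown() }
  }

  step(1)
}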

Serial Dispatch Queue with Asynchronous Blocks

Is there ever any reason to add blocks to a serial dispatch queue asynchronously as opposed to synchronously?
As I understand it, a serial dispatch queue only starts executing the next task in the queue once the preceding task has completed. If this is the case, I can't see what you would gain by submitting some blocks asynchronously: the act of submission may not block the thread (since it returns straight away), but the task won't be executed until the last task finishes, so it seems to me that you don't really gain anything.
This question has been prompted by the following code - taken from a book chapter on design patterns. To prevent the underlying data array from being modified simultaneously by two separate threads, all modification tasks are added to a serial dispatch queue. But note that returnToPool adds tasks to this queue asynchronously, whereas getFromPool adds its tasks synchronously.
class Pool<T> {
    private var data = [T]();
    // Create a serial dispatch queue
    private let arrayQ = dispatch_queue_create("arrayQ", DISPATCH_QUEUE_SERIAL);
    private let semaphore: dispatch_semaphore_t;

    init(items: [T]) {
        data.reserveCapacity(items.count);
        for item in items {
            data.append(item);
        }
        semaphore = dispatch_semaphore_create(items.count);
    }

    func getFromPool() -> T? {
        var result: T?;
        if (dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER) == 0) {
            dispatch_sync(arrayQ, {() in
                result = self.data.removeAtIndex(0);
            })
        }
        return result;
    }

    func returnToPool(item: T) {
        dispatch_async(arrayQ, {() in
            self.data.append(item);
            dispatch_semaphore_signal(self.semaphore);
        });
    }
}
Because there's no need to make the caller of returnToPool() block. It could perhaps continue doing other useful work.
The thread which called returnToPool() is presumably not just working with this pool. It presumably has other stuff it could be doing. That stuff could be done simultaneously with the work in the asynchronously-submitted task.
Typical modern computers have multiple CPU cores, so a design like this improves the chances that CPU cores are utilized efficiently and useful work is completed sooner. The question isn't whether tasks submitted to the serial queue operate simultaneously — they can't because of the nature of serial queues — it's whether other work can be done simultaneously.
Yes, there are reasons why you'd add tasks to a serial queue asynchronously. It's actually extremely common.
The most common example would be when you're doing something in the background and want to update the UI. You'll often dispatch that UI update asynchronously back to the main queue (which is a serial queue). That way the background thread doesn't have to wait for the main thread to perform its UI update, but rather it can carry on processing in the background.
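The same idea exists outside GCD; as a rough JVM analogue (a hedged Scala sketch, with Swing's event dispatch thread playing the role of the main queue):

import javax.swing.{JFrame, JLabel, SwingUtilities}

object AsyncUiDemo extends App {
  val label = new JLabel("working...")
  val frame = new JFrame("demo")
  frame.getContentPane.add(label)
  frame.setSize(240, 80)
  frame.setVisible(true)

  new Thread(() => {
    val result = (1 to 10000).sum        // stand-in for slow background work
    SwingUtilities.invokeLater { () =>   // async hand-off to the "main queue"
      label.setText(s"done: $result")    // UI touched only on the EDT
    }
    // the background thread continues immediately; it never waits on the EDT
  }).start()
}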
Another common example is, as you've demonstrated, using a GCD queue to synchronize interaction with some object. If you're dealing with a mutable object, you can dispatch updates asynchronously to this synchronization queue (i.e., why make the current thread wait? let it carry on instead). You'll do reads synchronously (because you obviously have to wait until you get the synchronized value back), but writes can be done asynchronously.
(You actually see this latter example frequently implemented with the "reader-writer" pattern and a custom concurrent queue, where reads are performed synchronously on the concurrent queue with dispatch_sync, but writes are performed asynchronously with a barrier via dispatch_barrier_async. But the idea is equally applicable to serial queues, too.)
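For comparison, the JVM expresses the same reader-writer shape with java.util.concurrent.locks.ReentrantReadWriteLock; a minimal Scala sketch (with one caveat: here the writer blocks, whereas dispatch_barrier_async returns immediately):

import java.util.concurrent.locks.ReentrantReadWriteLock

class SynchronizedStore[K, V] {
  private val lock = new ReentrantReadWriteLock()
  private var map = Map.empty[K, V]

  def get(key: K): Option[V] = {
    lock.readLock().lock()    // many readers may hold the read lock at once
    try map.get(key)
    finally lock.readLock().unlock()
  }

  def put(key: K, value: V): Unit = {
    lock.writeLock().lock()   // a writer excludes readers and other writers
    try map += (key -> value)
    finally lock.writeLock().unlock()
  }
}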
The choice of synchronous vs. asynchronous dispatch has nothing to do with whether the destination queue is serial or concurrent. It's simply a question of whether the current thread has to block until the other queue finishes the task.
Regarding your sample code, that is correct. getFromPool should dispatch synchronously (because you have to wait for the synchronization queue to actually return the value), but returnToPool can safely dispatch asynchronously. Obviously, I'm wary of seeing code wait on semaphores if it might be called from the main thread (so make sure you don't call getFromPool from the main thread!), but with that one caveat, this code should achieve the desired purpose: reasonably efficient synchronization of the pool object, with a getFromPool that will block while the pool is empty until something is added.

Play Framework: thread-pool-executor vs fork-join-executor

Let's say we have the action below in our controller. performLogin will be called on each request, by many users.
def performLogin() = {
  Async {
    // API call to datasource1
    val id = databaseService1.getIdForUser();
    // API call to another data source different from above.
    // This process depends on the id returned by the call above.
    val user = databaseService2.getUserGivenId(id);
    // Very CPU-intensive task
    val token = performProcess(user)
    // Very CPU-intensive calculations
    val hash = encrypt(user)
    Future.successful(hash)
  }
}
I kind of know what the fork-join-executor does. Basically, from the main thread which receives a request, it spawns multiple worker threads, which in turn divide the work into a few chunks. Eventually the main thread joins those results and returns from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, the selected thread does the work, then goes back to the thread pool to wait for more work. So no subdividing of the task happens here.
In the above code, parallelism via the fork-join executor is not possible, in my opinion: each call to the different methods/functions requires something from the previous step. If I were to choose the fork-join executor, how would that benefit me? How would the above code's execution differ between the fork-join and thread-pool executors?
Thanks
This isn't parallel code; everything inside of your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests - it's event-based; there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.
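As a side note (not part of the answer above), the usual way to keep request threads free in such a pipeline is to chain Futures; here is a hedged sketch with hypothetical Future-returning stand-ins for the two services in the question:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LoginDemo extends App {
  // Hypothetical async stand-ins; names, types and bodies are assumptions.
  def getIdForUser(): Future[Int] = Future { 42 }
  def getUserGivenId(id: Int): Future[String] = Future { s"user-$id" }
  def performProcess(user: String): String = user.reverse       // fake CPU work
  def encrypt(user: String): String = user.hashCode.toString    // fake CPU work

  // The steps still run sequentially (each depends on the previous one),
  // but no thread sits blocked while a database call is in flight.
  def performLogin(): Future[String] =
    getIdForUser().flatMap { id =>
      getUserGivenId(id).map { user =>
        val token = performProcess(user)
        encrypt(user)
      }
    }

  println(Await.result(performLogin(), 5.seconds))  // blocking is for the demo only
}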

QThread execution freezes my GUI

I'm new to multithreaded programming. I wrote this simple multithreaded program with Qt, but when I run it, it freezes my GUI, and when I click inside my window, the system reports that the program is not responding.
Here is my widget class. My thread counts up an integer and emits it whenever the number is divisible by 1000. In my widget I simply catch this number with the signal-slot mechanism and show it in a label and a progress bar.
Widget::Widget(QWidget *parent) :
    QWidget(parent),
    ui(new Ui::Widget)
{
    ui->setupUi(this);
    MyThread *th = new MyThread;
    connect(th, SIGNAL(num(int)), this, SLOT(setNum(int)));
    th->start();
}

void Widget::setNum(int n)
{
    ui->label->setNum(n);
    ui->progressBar->setValue(n % 101);
}
and here is my thread run() function :
void MyThread::run()
{
    for (int i = 0; i < 10000000; i++) {
        if (i % 1000 == 0)
            emit num(i);
    }
}
thanks!
The problem is your thread code producing an event storm. The loop counts very fast -- so fast that the fact that you emit a signal only every 1000 iterations is pretty much immaterial. On modern CPUs, doing 1000 integer divisions takes on the order of 10 microseconds, IIRC. If the loop were the only limiting factor, you'd be emitting signals at a peak rate of about 100,000 per second. This is not the case, because the performance is limited by other factors, which we shall discuss below.
Let's understand what happens when you emit signals in a different thread from where the receiver QObject lives. The signals are packaged in a QMetaCallEvent and posted to the event queue of the receiving thread. An event loop running in the receiving thread -- here, the GUI thread -- acts on those events using an instance of QAbstractEventDispatcher. Each QMetaCallEvent results in a call to the connected slot.
The access to the event queue of the receiving GUI thread is serialized by a QMutex. On Qt 4.8 and newer, the QMutex implementation got a nice speedup, so the fact that each signal emission results in locking of the queue mutex is not likely to be a problem. Alas, the events need to be allocated on the heap in the worker thread, and then deallocated in the GUI thread. Many heap allocators perform quite poorly when this happens in quick succession if the threads happen to execute on different cores.
The biggest problem comes in the GUI thread. There seems to be a bunch of hidden O(n^2) complexity algorithms! The event loop has to process 10,000 events. Those events will most likely be delivered very quickly and end up in a contiguous block in the event queue. The event loop will have to deal with all of them before it can process further events. A lot of expensive operations happen when you invoke your slot. Not only is the QMetaCallEvent deallocated from the heap, but the label schedules an update() (repaint), and this internally posts a compressible event to the event queue. Compressible event posting has to, in the worst case, iterate over the entire event queue. That's one potential O(n^2) complexity action. Another such action, probably more important in practice, is the progress bar's setValue internally calling QApplication::processEvents(). This can recursively call your slot to deliver the subsequent signal from the event queue. You're doing way more work than you think you are, and this locks up the GUI thread.
Instrument your slot and see if it's called recursively. A quick-and-dirty way of doing it is:
void Widget::setNum(int n)
{
    static int level = 0, maxLevel = 0;
    level++;
    maxLevel = qMax(level, maxLevel);
    ui->label->setNum(n);
    ui->progressBar->setValue(n % 101);
    if (level > 1 && level == maxLevel-1) {
        qDebug("setNum recursed up to level %d", maxLevel);
    }
    level--;
}
What is freezing your GUI thread is not QThread's execution but the huge amount of work you make the GUI thread do, even if your code looks innocuous.
Side Note on processEvents and Run-to-Completion Code
I think it was a very bad idea to have QProgressBar::setValue invoke processEvents(). It only encourages the broken way people code things (continuously running code instead of short run-to-completion code). Since the processEvents() call can recurse into the caller, setValue becomes persona non grata, and possibly quite dangerous.
If one wants to code in a continuous style yet keep the run-to-completion semantics, there are ways of dealing with that in C++. One is just by leveraging the preprocessor; for example code, see my other answer.
Another way is to use expression templates to get the C++ compiler to generate the code you want. You may want to leverage a template library here -- Boost Spirit has a decent starting point of an implementation that can be reused even though you're not writing a parser.
The Windows Workflow Foundation also tackles the problem of how to write sequential-style code that runs as short run-to-completion fragments. They resort to specifying the flow of control in XML. There's apparently no direct way of reusing standard C# syntax; they only provide it as a data structure, a la JSON. It'd be simple enough to implement both XML- and code-based WF in Qt, if one wanted to. All that in spite of .NET and C# providing ample support for programmatic generation of code...
The way you implemented your thread, it does not have its own event loop (because it does not call exec()). I'm not sure if your code within run() is actually executed within your thread or within the GUI thread.
Usually you should not subclass QThread. You probably did so because you read the Qt Documentation which unfortunately still recommends subclassing QThread - even though the developers long ago wrote a blog entry stating that you should not subclass QThread. Unfortunately, they still haven't updated the documentation appropriately.
I recommend reading "You're doing it wrong" on Qt Blog and then use the answer by "Kari" as an example of how to set up a basic multi-threaded system.
But when I run this program it freezes my GUI and when I click inside my window,
it responds that your program is not responding.
Yes, because (IMO) you're doing so much work in the thread that it exhausts the CPU. Generally, the "program is not responding" message pops up when a process shows no progress in handling its application event queue. In your case, that is what happens.
So you should find a way to divide the work. Just for the sake of example: have the thread run in chunks of 100 iterations and repeat until it completes 10,000,000.
Also, you should have a look at QCoreApplication::processEvents() when you're performing a lengthy operation.

How to set up a Scala actor to exclusively use a separate thread to run?

As far as I understand, Scala manages a thread pool to run actors, sharing threads among them. Can I set up a particular actor to run in a separate thread exclusively, never sharing it with another actor?
It sounds like you are using Scala (not Akka) actors. In that case, if you use the receive or receiveWithin style of message handling, each actor will get its own thread. Using the react style of message handling shares a thread pool among actors.
When I say the receive "style", I mean in a loop, for example:
val timerActor = actor {
  while (true) {
    receiveWithin(60 * 1000) {
      case Stop => self.exit()
      case TIMEOUT =>
        destination ! Tick
    }
  }
}
In this case timerActor does not share its thread with any other actor. receiveWithin will block until either the actor receives a Stop message or 60 seconds pass; if 60 seconds pass, the TIMEOUT case is executed.
If you want to learn the gritty details about Scala actors, check out the paper Actors That Unify Threads and Events.
Akka also supports thread-based actors in addition to event-based actors.
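As an aside (not from the original answer), in Akka the way to dedicate a thread to a single actor is a pinned dispatcher; here is a minimal sketch, assuming classic Akka actors and a made-up dispatcher name:

import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

// A PinnedDispatcher gives every actor configured with it its own thread.
class TimerActor extends Actor {
  def receive = {
    case msg => println(s"got $msg on ${Thread.currentThread.getName}")
  }
}

object PinnedDemo extends App {
  val config = ConfigFactory.parseString("""
    my-pinned-dispatcher {
      type = PinnedDispatcher
      executor = "thread-pool-executor"
    }
  """)

  val system = ActorSystem("demo", config.withFallback(ConfigFactory.load()))
  val timer = system.actorOf(
    Props[TimerActor].withDispatcher("my-pinned-dispatcher"), "timer")

  timer ! "tick"  // always handled on the same dedicated thread
}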
