ProcessPoolExecutor gets stuck, ThreadPool Executor does not

ProcessPoolExecutor gets stuck, ThreadPool Executor does not - python-3.x

I have an application that lets me select whether to use threads or processes:
def _get_future(self, workers):
if self.config == "threadpool":
self.logger.debug("using thread pools")
executor = ThreadPoolExecutor(max_workers=workers)
else:
self.logger.debug("using process pools")
executor = ProcessPoolExecutor(max_workers=workers)
return executor
Later I execute the code:
self.executor = self._get_future()
for component in components:
self.logger.debug("submitting {} to future ".format(component))
self.future_components.append(self.executor.submit
(self._send_component, component))
# Wait for all tasks to finish
while self.future_components:
self.future_components.pop().result()
When I use processes, my Applications gets stuck. The _send_component method is never called. When I use threads all works fine.

The problem is the imperative approach, this is a use case for a functional approach.
self._send_component is a member function of a class. Separate processes mean no joint memory to share variables.
The solution was to rewrite the code so that _send_component is a static method.

Related

Kotlin coroutines multithread dispatcher and thread-safety for local variables

Let's consider this simple code with coroutines
import kotlinx.coroutines.*
import java.util.concurrent.Executors
fun main() {
runBlocking {
launch (Executors.newFixedThreadPool(10).asCoroutineDispatcher()) {
var x = 0
val threads = mutableSetOf<Thread>()
for (i in 0 until 100000) {
x++
threads.add(Thread.currentThread())
yield()
}
println("Result: $x")
println("Threads: $threads")
}
}
}
As far as I understand this is quite legit coroutines code and it actually produces expected results:
Result: 100000
Threads: [Thread[pool-1-thread-1,5,main], Thread[pool-1-thread-2,5,main], Thread[pool-1-thread-3,5,main], Thread[pool-1-thread-4,5,main], Thread[pool-1-thread-5,5,main], Thread[pool-1-thread-6,5,main], Thread[pool-1-thread-7,5,main], Thread[pool-1-thread-8,5,main], Thread[pool-1-thread-9,5,main], Thread[pool-1-thread-10,5,main]]
The question is what makes these modifications of local variables thread-safe (or is it thread-safe?). I understand that this loop is actually executed sequentially but it can change the running thread on every iteration. The changes done from thread in first iteration still should be visible to the thread that picked up this loop on second iteration. Which code does guarantee this visibility? I tried to decompile this code to Java and dig around coroutines implementation with debugger but did not find a clue.

Your question is completely analogous to the realization that the OS can suspend a thread at any point in its execution and reschedule it to another CPU core. That works not because the code in question is "multicore-safe", but because it is a guarantee of the environment that a single thread behaves according to its program-order semantics.
Kotlin's coroutine execution environment likewise guarantees the safety of your sequential code. You are supposed to program to this guarantee without any worry about how it is maintained.
If you want to descend into the details of "how" out of curiosity, the answer becomes "it depends". Every coroutine dispatcher can choose its own mechanism to achieve it.
As an instructive example, we can focus on the specific dispatcher you use in your posted code: JDK's fixedThreadPoolExecutor. You can submit arbitrary tasks to this executor, and it will execute each one of them on a single (arbitrary) thread, but many tasks submitted together will execute in parallel on different threads.
Furthermore, the executor service provides the guarantee that the code leading up to executor.execute(task) happens-before the code within the task, and the code within the task happens-before another thread's observing its completion (future.get(), future.isCompleted(), getting an event from the associated CompletionService).
Kotlin's coroutine dispatcher drives the coroutine through its lifecycle of suspension and resumption by relying on these primitives from the executor service, and thus you get the "sequential execution" guarantee for the entire coroutine. A single task submitted to the executor ends whenever the coroutine suspends, and the dispatcher submits a new task when the coroutine is ready to resume (when the user code calls continuation.resume(result)).

Is at a good idea to use ThreadPoolExecutor with one worker?

I have a simple rest service which allows you to create task. When a client requests a task - it returns a unique task number and starts executing in a separate thread. The easiest way to implement it
class Executor:
def __init__(self, max_workers=1):
self.executor = ThreadPoolExecutor(max_workers)
def execute(self, body, task_number):
# some logic
pass
def some_rest_method(request):
body = json.loads(request.body)
task_id = generate_task_id()
Executor(max_workers=1).execute(body)
return Response({'taskId': task_id})
Is it a good idea to create each time ThreadPoolExecutor with one (!) workers if i know than one request - is one new task (new thread). Perhaps it is worth putting them in the queue somehow? Maybe the best option is to create a regular stream every time?

Is it a good idea to create each time ThreadPoolExecutor...
No. That completely defeats the purpose of a thread pool. The reason for using a thread pool is so that you don't create and destroy a new thread for every request. Creating and destroying threads is expensive. The idea of a thread pool is that it keeps the "worker thread(s)" alive and re-uses it/them for each next request.
...with just one thread
There's a good use-case for a single-threaded executor, though it probably does not apply to your problem. The use-case is, you need a sequence of tasks to be performed "in the background," but you also need them to be performed sequentially. A single-thread executor will perform the tasks, one after another, in the same order that they were submitted.
Perhaps it is worth putting them in the queue somehow?
You already are putting them in a queue. Every thread pool has a queue of pending tasks. When you submit a task (i.e., executor.execute(...)) that puts the task into the queue.
what's the best way...in my case?
The bones of a simplistic server look something like this (pseudo-code):
POOL = ThreadPoolExecutor(...with however many threads seem appropriate...)
def service():
socket = create_a_socket_that_listens_on_whatever_port()
while True:
client_connection = socket.accept()
POOL.submit(request_handler, connection=connection)
def request_handler(connection):
request = receive_request_from(connection)
reply = generate_reply_based_on(request)
send_reply_to(reply, connection)
connection.close()
def main():
initialize_stuff()
service()
Of course, there are many details that I have left out. I can't design it for you. Especially not in Python. I've written servers like this in other languages, but I'm pretty new to Python.

multiple concurrent threads with futures.threadpool

I am trying to run multiple threads alongside my main thread. Running one thread individually works fine with both threading.Thread and concurrent.futures.ThreadPoolExecutor.
Running two separate threads does not work at all though. One of the threads just runs the entire time, locking up both other threads. There are no "shared" resources that get locked afaik,they have nothing to do with each other (except calling the next thread), so i don't understand why this won't work
My code looks like this:
with concurrent.futures.ThreadPoolExecutor() as executor:
future = executor.submit(function())
result = future.result()
And the function running inside the thread also calls :
function():
with concurrent.futures.ThreadPoolExecutor() as executor:
inner_result = (executor.submit(inner_function,"value")).result()
I've also tried running this function with:
t = Thread(target=function..., getting the same result.
Is there something i am missing to running multiple concurrent threads in python?

The issue was passing a result instead of the function itself to the executor.
this: executor.submit(function())
should be: executor.submit(function)

Terminating a QThread that wraps a function (i.e. can't poll a "wants_to_end" flag)

I'm trying to create a thread for a GUI that wraps a long-running function. My problem is thus phrased in terms of PyQt and QThreads, but I imagine the same concept could apply to standard python threads too, and would appreciate any suggestions generally.
Typically, to allow a thread to be exited while running, I understand that including a "wants_to_end" flag that is periodically checked within the thread is a good practice - e.g.:
Pseudocode (in my thread):
def run(self):
i = 0
while (not self.wants_to_end) and (i < 100):
function_step(i) # where this is some long-running function that includes many streps
i += 1
However, as my GUI is to wrap a pre-written long-running function, I cannot simply insert such a "wants_to_end" flag poll into the long running code.
Is there another way to forcibly terminate my worker thread from my main GUI (i.e. enabling me to include a button in the GUI to stop the processing)?
My simple example case is:
class Worker(QObject):
finished = pyqtSignal()
def __init__(self, parent=None, **kwargs):
super().__init__(parent)
self.kwargs = kwargs
#pyqtSlot()
def run(self):
result = SomeLongComplicatedProcess(**self.kwargs)
self.finished.emit(result)
with usage within my MainWindow GUI:
self.thread = QThread()
self.worker = Worker(arg_a=1, arg_b=2)
self.worker.finished.connect(self.doSomethingInGUI)
self.worker.moveToThread(self.thread)
self.thread.started.connect(self.worker.run)
self.thread.start()

If the long-running function blocks, the only way to forcibly stop the thread is via its terminate() method (it may also be necessary to call wait() as well). However, there is no guarantee that this will always work, and the docs also state the following:
Warning: This function is dangerous and its use is discouraged. The
thread can be terminated at any point in its code path. Threads can be
terminated while modifying data. There is no chance for the thread to
clean up after itself, unlock any held mutexes, etc. In short, use
this function only if absolutely necessary.
A much cleaner solution is to use a separate process, rather than a separate thread. In python, this could mean using the multiprocessing module. But if you aren't familiar with that, it might be simpler to run the function as a script via QProcess (which provides signals that should allow easier integration with your GUI). You can then simply kill() the worker process whenever necessary. However, if that solution is somehow unsatisfactory, there are many other IPC approaches that might better suit your requirements.

Play Framework: thread-pool-executor vs fork-join-executor

Let's say we have a an action below in our controller. At each request performLogin will be called by many users.
def performLogin( ) = {
Async {
// API call to the datasource1
val id = databaseService1.getIdForUser();
// API call to another data source different from above
// This process depends on id returned by the call above
val user = databaseService2.getUserGivenId(id);
// Very CPU intensive task
val token = performProcess(user)
// Very CPU intensive calculations
val hash = encrypt(user)
Future.successful(hash)
}
}
I kind of know what the fork-join-executor does. Basically from the main thread which receives a request, it spans multiple worker threads which in tern will divide the work into few chunks. Eventually main thread will join those result and return from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, this selected thread will do the work, then go back to the thread pool to listen to more work to do. So no sub dividing of the task happening here.
In above code parallelism by fork-join executor is not possible in my opinion. Each call to the different methods/functions requires something from the previous step. If I were to choose the fork-join executor for the threading how would that benefit me? How would above code execution differ among fork-join vs thread-pool executor.
Thanks

This isn't parallel code, everything inside of your Async call will run in one thread. In fact, Play! never spawns new threads in response to requests - it's event-based, there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string