multiple concurrent threads with futures.threadpool

I am trying to run multiple threads alongside my main thread. Running one thread individually works fine with both threading.Thread and concurrent.futures.ThreadPoolExecutor.
Running two separate threads does not work at all, though. One of the threads just runs the entire time, locking up both other threads. There are no "shared" resources that get locked, as far as I know; the threads have nothing to do with each other (except calling the next thread), so I don't understand why this won't work.
My code looks like this:
with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(function())
    result = future.result()
The function running inside the thread also creates its own executor:
def function():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        inner_result = executor.submit(inner_function, "value").result()
I've also tried running this function with:
t = Thread(target=function..., getting the same result.
Is there something I am missing about running multiple concurrent threads in Python?

The issue was passing a result instead of the function itself to the executor.
this: executor.submit(function())
should be: executor.submit(function)
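A minimal sketch of the difference, assuming a hypothetical worker function (not from the question) that reports which thread it runs on. With submit(worker) the pool calls the function on a worker thread; with submit(worker()) the call happens in the calling thread first, and only its return value reaches the executor, which then fails when it tries to call that value.

```python
import concurrent.futures
import threading


def worker():
    """Hypothetical task: return the name of the thread it runs on."""
    return threading.current_thread().name


def compare_submit_styles():
    caller_name = threading.current_thread().name
    with concurrent.futures.ThreadPoolExecutor(thread_name_prefix="pool") as executor:
        # Correct: pass the callable itself; the pool calls it on a worker thread.
        ran_on = executor.submit(worker).result()

        # Buggy pattern, split into two steps to make it visible:
        # worker() executes here, in the calling thread ...
        called_in = worker()
        # ... and the executor receives a str, which it cannot call.
        bad_future = executor.submit(called_in)
        error = bad_future.exception()
    return caller_name, ran_on, called_in, error


if __name__ == "__main__":
    caller_name, ran_on, called_in, error = compare_submit_styles()
    print("submit(worker) ran on:", ran_on)
    print("worker() ran on:", called_in, "(caller is", caller_name + ")")
    print("submit(worker()) failed with:", type(error).__name__)
```

If the function blocks for a long time, the buggy form blocks the calling thread for just as long, which matches the "one thread locks up the others" symptom described in the question.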

Related

Are system calls run on the same thread?

The multi-threaded approach to I/O-bound problems in Python works by releasing the GIL. Suppose Thread1 takes 10 seconds to read a file; during those 10 seconds it does not need the GIL, so Thread2 can execute code. Thread1 and Thread2 are effectively running in parallel because Thread1 is performing system calls and can execute independently of Thread2; however, Thread1 is still executing code.
Now, suppose we have a setup using asyncio or any asynchronous programming code. When we do something such as,
file_content = await ten_second_long_file_read()
While the await is pending, system calls are made to read the contents of the file; when they are done, an event is sent back and code execution can continue later. While we are awaiting, other code can be run.
My confusion comes from the fact that asynchronous programming is primarily single-threaded. With the multi-threaded approach, when T1 is reading from a file it is still executing code; it has simply released the GIL so that the work happens in parallel with another thread. With asynchronous programming, however, how does a single thread perform other tasks while we are awaiting, as well as read the data? I understand the multi-threaded idea, but not the asynchronous one, because it is still performing the system calls in a single thread. With asynchronous programming there is nowhere to release the GIL to, considering there is only one thread. Is asyncio secretly using threads?

The number of file handles is independent of the GIL and of threads. The POSIX select documentation gives an idea of the distinct mechanism around file handles.
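As a rough sketch of that mechanism, the standard selectors module (a wrapper around select and its platform equivalents, and the layer asyncio's selector event loop is built on) lets one thread wait on several handles at once. The function name wait_on_many and the use of socket pairs as stand-ins for slow I/O sources are assumptions for illustration only.

```python
import selectors
import socket
import threading


def wait_on_many(n=3):
    """Wait for n readable sockets in a single thread via the OS readiness API."""
    sel = selectors.DefaultSelector()
    writers = []
    for i in range(n):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # Ask the OS to tell us when this handle has data to read.
        sel.register(reader, selectors.EVENT_READ, data=i)
        writers.append(writer)

    # Simulate data arriving; normally a peer or the kernel does this.
    for i, writer in enumerate(writers):
        writer.send(f"msg {i}".encode())

    received = {}
    while len(received) < n:
        # One call blocks until any of the registered handles is ready.
        for key, _events in sel.select(timeout=1):
            received[key.data] = key.fileobj.recv(1024).decode()
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return received


if __name__ == "__main__":
    print(wait_on_many())
    print("threads:", threading.active_count())
```

No extra thread is started here: the waiting is delegated to the operating system, and the single Python thread only runs when a handle is reported ready.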
To illustrate I created three files, 1.txt etc. These are just:
1
one
Obviously opening them for reading is fine, but not for writing. To make a ten-second read I just held the file handle open for ten seconds: reading the first line, waiting 10 seconds, then reading the second line.
asyncio version
import asyncio
from threading import active_count

do = ['1.txt', '2.txt', '3.txt']

async def ten_second_long_file_read():
    while do:
        doing = do.pop()
        with open(doing, 'r') as f:
            print(f.readline().strip())
            await asyncio.sleep(10)
            print(f"threads {active_count()}")
            print(f.readline().strip())

async def main():
    await asyncio.gather(asyncio.create_task(ten_second_long_file_read()),
                         asyncio.create_task(ten_second_long_file_read()))

asyncio.run(main())
This produces a very predictable output and as expected, one thread only.
3
2
threads 1
three
1
threads 1
two
threads 1
one
threading - changes
Remove async of course. Swap asyncio.sleep(10) for time.sleep(10). The main change is the calling function.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as e:
    e.submit(ten_second_long_file_read)
    e.submit(ten_second_long_file_read)
Also a fairly predictable output, however you cannot rely on this.
3
2
threads 3
three
threads 3
two
1
threads 2
one
Running the same threaded version in debug mode, the output is a bit random; on one run on my computer it was:
23
threads 3threads 3
twothree
1
threads 2
one
This highlights a difference with threads: the running thread is pre-emptively switched, which creates a whole bundle of complexity under the heading of thread safety. This pre-emption issue does not arise in asyncio, as there is a single thread and tasks only switch at await points.
multi-processing
Similar to the threaded code; however, the if __name__ == '__main__' guard is required, and each worker process gets its own snapshot of the context.
def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as e:
        e.submit(ten_second_long_file_read)
        e.submit(ten_second_long_file_read)

if __name__ == '__main__':  # required for executor
    main()
There are two big differences. First, there is no shared view of the do list, so everything is done twice: one process doesn't know what the other has done. Second, more CPU power is available, but more work is required to manage the load.
Three processes are required for this, so the overhead is large; however, each process has only one thread.
3
3
threads 1
threads 1
three
three
2
2
threads 1
threads 1
two
two
1
1
threads 1
threads 1
one
one

Do functions run with loop.run_in_executor need asyncio.Lock() or threading.Lock()?

I copied the following code for my project and it has worked quite well for me, but I don't really understand how the following code runs my blocking_function:
@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(ThreadPoolExecutor(), blocking_function)
Here on_message is called every time I receive a message. If I receive multiple messages, they are processed asynchronously.
blocking_function is a synchronous function that I don't want to run while another blocking_function is running. Within blocking_function, should I use threading.Lock() or asyncio.Lock()?

As pointed out by dirn in the comment, in blocking_function you cannot use an asyncio.Lock because it's just not async. (The opposite also applies: you cannot lock a threading.Lock from an async function because attempting to do so would block the event loop.) If you need to guard data accessed by other instances of blocking_function, you should use a threading.Lock.
but I don't really understand how the following code runs my blocking_function
It hands off blocking_function to the thread pool you created to run it. The thread pool queues and runs the function (which happens "in the background" from your perspective), and run_in_executor arranges for the event loop to be notified when the function is done, handing off its return value as the result of the await expression.
Note that you should use None as the first argument of run_in_executor. If you use ThreadPoolExecutor(), you create a whole new thread pool for each message, and you never dispose of it. A thread pool is normally meant to be created once, and reuse a fixed number ("pool") of threads for subsequent work. None tells asyncio to use the thread pool it creates for this purpose.
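The two points above can be combined in a small sketch: None as the executor argument, and a threading.Lock inside the blocking function. The counters _active and _peak, the delay parameter, and the helper names handle_messages and run_demo are hypothetical and exist only to make the serialization observable.

```python
import asyncio
import threading
import time

_lock = threading.Lock()  # guards the critical section across worker threads
_active = 0               # how many calls are inside the critical section now
_peak = 0                 # highest value _active ever reached


def blocking_function(delay=0.05):
    """Synchronous work that must not overlap with another call."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
        time.sleep(delay)  # stand-in for the real blocking work
        _active -= 1
    return threading.current_thread().name


async def handle_messages(count=3):
    loop = asyncio.get_running_loop()
    # None selects the event loop's default thread pool.
    calls = [loop.run_in_executor(None, blocking_function) for _ in range(count)]
    return await asyncio.gather(*calls)


def run_demo(count=3):
    names = asyncio.run(handle_messages(count))
    return names, _peak


if __name__ == "__main__":
    names, peak = run_demo()
    print("worker threads used:", sorted(set(names)))
    print("peak concurrent calls inside the lock:", peak)
```

Several worker threads may pick up the calls, but the lock keeps at most one of them inside the guarded section, while the event loop itself is never blocked.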

It seems you can easily achieve your objective by ensuring that a single thread is used.
A simple solution would be to ensure that all calls to blocking_function are run on a single thread. This can be achieved by creating a ThreadPoolExecutor with one worker outside of the async function. Every subsequent call to the blocking function will then run on that single thread:
thread_pool = ThreadPoolExecutor(max_workers=1)

@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(thread_pool, blocking_function)
Don't forget to shut down the thread pool afterwards, for example with thread_pool.shutdown().
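A self-contained sketch of this single-worker approach, including the shutdown step. The log list, the labels, and the helper names on_messages and run_demo are assumptions added so that the ordering can be inspected; they are not part of the original code.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_function(log, label):
    """Synchronous work; records when it starts and ends."""
    log.append(f"start {label}")
    time.sleep(0.02)  # stand-in for the real blocking work
    log.append(f"end {label}")


async def on_messages(pool, log, labels):
    loop = asyncio.get_running_loop()
    # All calls are queued on the same one-thread pool.
    await asyncio.gather(
        *(loop.run_in_executor(pool, blocking_function, log, label) for label in labels)
    )


def run_demo():
    log = []
    pool = ThreadPoolExecutor(max_workers=1)  # created once, outside the handler
    try:
        asyncio.run(on_messages(pool, log, ["a", "b", "c"]))
    finally:
        pool.shutdown(wait=True)  # release the worker thread when done
    return log


if __name__ == "__main__":
    for entry in run_demo():
        print(entry)
```

Because only one worker exists, each call finishes before the next one starts, so no explicit lock is needed for this particular function.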

Wait till all tasks are run in Celery python

I am using celery in python for asynchronous tasks. I want to capture the result of it after all the tasks assigned to all workers are done.
For that, I am using the .get() method, but the problem with get() is that all the tasks are being assigned to a single worker which is synchronous, whereas I want the tasks to be distributed to all the workers available.
Below is my snippet.
for url in urls:
    res = good_bad_urls.delay(url[1])
    res.get()
return JsonResponse(some_data)
Is there any other method in celery to wait until all the tasks run asynchronously?

but the problem with get() is that all the tasks are being assigned to a single worker which is synchronous
Well, not exactly. The task distribution works just the same (even if it can seem to do otherwise), and the tasks themselves are still async. The difference is that result.get() IS a blocking call - so in your case, it waits for the current task to finish before it launches the next one.
But anyway: the solution here is to use a Group. In your case, it should look something like
from celery import group

jobs = group([good_bad_urls.s(url[1]) for url in urls])
async_res = jobs.apply_async()
result = async_res.get()
The get() call will now wait for all tasks to be finished, but they will be launched in parallel.
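The same effect can be reproduced without Celery or a broker. The sketch below is only an analogy using the standard library's ThreadPoolExecutor, not Celery code; PeakCounter and both function names are hypothetical. It compares blocking on each result inside the loop with launching everything first and collecting afterwards, and records how many tasks were ever running at the same moment.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Tracks how many tasks were running at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def task(self, url):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.1)  # stand-in for the real work
        with self._lock:
            self.active -= 1
        return url


def get_inside_loop(urls):
    counter = PeakCounter()
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        # Waiting for each result before submitting the next task.
        results = [pool.submit(counter.task, url).result() for url in urls]
    return results, counter.peak


def submit_then_collect(urls):
    counter = PeakCounter()
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        # Launch everything first, then wait for all results.
        futures = [pool.submit(counter.task, url) for url in urls]
        results = [future.result() for future in futures]
    return results, counter.peak


if __name__ == "__main__":
    urls = ["a", "b", "c"]
    print("get inside loop, peak running:", get_inside_loop(urls)[1])
    print("submit then collect, peak running:", submit_then_collect(urls)[1])
```

In the first variant the workers are all available, yet only one task is ever in flight, which is why it can look as if a single worker is doing everything.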

ProcessPoolExecutor gets stuck, ThreadPoolExecutor does not

I have an application that lets me select whether to use threads or processes:
def _get_future(self, workers):
    if self.config == "threadpool":
        self.logger.debug("using thread pools")
        executor = ThreadPoolExecutor(max_workers=workers)
    else:
        self.logger.debug("using process pools")
        executor = ProcessPoolExecutor(max_workers=workers)
    return executor
Later I execute the code:
self.executor = self._get_future()
for component in components:
    self.logger.debug("submitting {} to future ".format(component))
    self.future_components.append(
        self.executor.submit(self._send_component, component))
# Wait for all tasks to finish
while self.future_components:
    self.future_components.pop().result()
When I use processes, my application gets stuck: the _send_component method is never called. When I use threads, everything works fine.

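The problem is the stateful, object-bound approach; this is a use case for a functional approach.
self._send_component is a bound method of a class instance. Separate processes have no joint memory to share variables, so the callable and its arguments have to be pickled and sent to the worker process, and a bound method brings the whole instance (self) along with it.
The solution was to rewrite the code so that _send_component is a static method.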
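A small sketch of why a bound method can be a problem for a process pool, using pickle directly rather than starting processes. The Sender class, its _lock attribute (a stand-in for unpicklable state such as an executor or an open connection), and the can_pickle helper are all hypothetical; whether this was the exact failure in the original application is an assumption.

```python
import pickle
import threading


class Sender:
    def __init__(self):
        # Stand-in for state that cannot be pickled (executor, connection, ...).
        self._lock = threading.Lock()

    def send_bound(self, component):
        """Bound method: pickling it also pickles self."""
        return f"sent {component}"

    @staticmethod
    def send_static(component):
        """Static method: carries no instance state."""
        return f"sent {component}"


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    sender = Sender()
    print("bound method picklable:", can_pickle(sender.send_bound))
    print("static method picklable:", can_pickle(Sender.send_static))
```

The bound method fails because the instance holds a lock; the static method is pickled by reference to its qualified name, so no instance state travels to the worker process.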

ListenableFuture running on different thread

I am using Futures.transform and I want my ListenableFuture to run on a separate thread. Is it possible to do that? I see ListenableFuture has a sameThreadExecutor option; is there an option to run on a different thread?
Details:
I have a single thread that reads data from the network using an async mechanism. Depending on the requests it gets, it has to dispatch each request to another thread so that it can go back to listening for more requests. I am trying to use Futures.transform to do this.

You need an Executor that is not the sameThreadExecutor. The most common I've seen is the cached thread pool option, but if you're doing something simple with only one other thread, you might try this:
final ListeningExecutorService executor =
    MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());
If you're looking to run a single task in another thread, calling executor.submit() will then run your Callable on another thread. If instead you're trying to do the transformation of the future in another thread, you can pass this executor into Futures.transform.
