Resume async task before all tasks are started - python-3.x

In the example code below, all asyncio tasks are started first. Only afterwards are the tasks resumed, as their I/O operations finish.
The output looks like this: the 6 result messages appear only after all 6 start messages.
-- Starting
-- Starting
-- Starting
-- Starting
-- Starting
-- Starting
28337 size for
28337 size for
1938204 size for
1938204 size for
38697 size for
38697 size for
FINISHED with 6 results from 6 tasks.
But what I would expect, and what would speed things up in my case, is something like this:
-- Starting
-- Starting
-- Starting
1938204 size for
-- Starting
28337 size for
28337 size for
-- Starting
38697 size for
-- Starting
28337 size for
28337 size for
1938204 size for
38697 size for
FINISHED with 6 results from 6 tasks.
In my real world code I have hundreds of download tasks like this. It is usual that some of the downloads are finished before all of them are started.
Is there a way to handle this with asyncio?
Here is a minimal working example:
#!/usr/bin/env python3
import random
import urllib.request
import asyncio
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor()
loop = asyncio.get_event_loop()
urls = ['',
async def parse_one_url(u):
    print('-- Starting {}...'.format(u))
    r = await loop.run_in_executor(executor,
                                   urllib.request.urlopen, u)
    r = '{} size for {}'.format(len(r.read()), u)
    print(r)
    return r
async def do_async_parsing():
    tasks = [
        asyncio.ensure_future(parse_one_url(u))
        for u in urls
    ]
    completed, pending = await asyncio.wait(tasks)
    results = [task.result() for task in completed]
    print('FINISHED with {} results from {} tasks.'
          .format(len(results), len(tasks)))
if __name__ == '__main__':
    # blow up the urls
    urls = urls * 2
    loop.run_until_complete(do_async_parsing())
Side question: Isn't asyncio useless in my case? Wouldn't it be easier to use multiple threads only?

Well, you did create all the downloads upfront and instructed asyncio to launch them all using asyncio.wait. Just starting to execute a coroutine is almost free, so there is no reason for this part to be limited in any way. However, the tasks actually submitted to the ThreadPoolExecutor are capped to the number of workers in the pool. The default is 5 times the number of CPUs on Python 3.5 to 3.7 and min(32, os.cpu_count() + 4) from Python 3.8 on, and it is configurable through max_workers. If the number of URLs exceeds the number of workers, you should get the desired behavior. (But to actually observe it, you need to move the logging prints into the function managed by the executor.)
Note that the synchronous call to r.read() must also reside inside the function run by the executor, otherwise it will block the entire event loop. The corrected portion of the code would look like this:
def urlopen(u):
    print('-- Starting {}...'.format(u))
    r = urllib.request.urlopen(u)  # blocking call
    content = r.read()  # another blocking call
    print('{} size for {}'.format(len(content), u))
async def parse_one_url(u):
    await loop.run_in_executor(executor, urlopen, u)
The above is, however, not idiomatic use of asyncio. Normally the idea is that you don't use threads at all, but call natively async code, for example using aiohttp. Then you get the benefits of asyncio, such as working cancellation and scalability to a large number of tasks. In that setup you would limit the number of concurrent tasks by trivially wrapping the retrieval in an asyncio.Semaphore.
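To illustrate the semaphore idea, here is a minimal sketch. It assumes a hypothetical fetch coroutine in which asyncio.sleep stands in for a real async HTTP call (with aiohttp the body would await the request instead). MAX_CONCURRENT, fetch, fetch_all and the placeholder URL names are made up for the example.

```python
import asyncio

MAX_CONCURRENT = 3  # assumed limit, chosen only for illustration


async def fetch(url, semaphore, active):
    """Hypothetical stand-in for a natively async download."""
    async with semaphore:  # at most MAX_CONCURRENT bodies run at once
        active["now"] += 1
        active["peak"] = max(active["peak"], active["now"])
        await asyncio.sleep(0.01)  # placeholder for awaiting the network
        active["now"] -= 1
        return "{} size for {}".format(len(url), url)


async def fetch_all(urls):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    active = {"now": 0, "peak": 0}  # bookkeeping to observe the limit
    results = await asyncio.gather(*(fetch(u, semaphore, active) for u in urls))
    return results, active["peak"]


if __name__ == "__main__":
    urls = ["url-{}".format(i) for i in range(10)]  # placeholder names
    results, peak = asyncio.run(fetch_all(urls))
    print("FINISHED with {} results, peak concurrency {}".format(len(results), peak))
```

All coroutines are created upfront, which is cheap, but only MAX_CONCURRENT of them are inside the semaphore-guarded section at any moment.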
If your whole actual logic consists of synchronous calls, you don't need asyncio at all; you can directly submit futures to the executor and use concurrent.futures synchronization functions like wait() and as_completed to wait for them to finish.
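A sketch of that thread-only variant might look as follows. The download function is a hypothetical stand-in that sleeps instead of calling urllib.request.urlopen, and the job names and delays are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def download(job):
    """Hypothetical blocking download; `job` is a (name, seconds) pair."""
    name, seconds = job
    time.sleep(seconds)  # placeholder for urllib.request.urlopen(...).read()
    return name


def run_all(jobs, max_workers=2):
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(download, job) for job in jobs]
        for future in as_completed(futures):  # yields futures as they finish
            finished.append(future.result())
    return finished


if __name__ == "__main__":
    jobs = [("slow", 0.3), ("fast", 0.05), ("medium", 0.15)]
    print(run_all(jobs))
```

Because as_completed hands back futures in completion order, results can be processed while later downloads are still waiting for a free worker.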


Python multiprocessing taking the brakes off OSX

I have a program that randomly selects 13 cards from a full pack and analyses the hands for shape, point count and some other features important to the game of bridge. The program will select and analyse 10**7 hands in about 5 minutes. Checking the Activity Monitor shows that during execution the CPU (which is a 6-core processor) is devoting about 9% of its time to the program and is idle ~90% of the time. So it looks like a prime candidate for multiprocessing, and I created a multiprocessing version using a Queue to pass information from each process back to the main program. Having navigated the problems of IDLE not working with multiprocessing (I now run it using PyCharm) and of a join on a process before it has finished freezing the program, I got it to work.
However, it doesn't matter how many processes I use (5, 10, 25 or 50); the result is always the same. The CPU devotes about 18% of its time to the program and is idle ~75% of the time, and the execution time slightly more than doubles, to a bit over 10 minutes.
Can anyone explain how I can get the processes to take up more of the CPU time and how I can get the execution time to reflect this? Below are the relevant sections of the program:
import random
import collections
import datetime
import time
from math import log10
from multiprocessing import Process, Queue
NUM_OF_HANDS = 10**6
def analyse_hands(numofhands, q):
    # code removed as not relevant to the problem
    q.put((distribution, points, notrumps))
if __name__ == '__main__':
    processlist = []
    q = Queue()
    handsperprocess = NUM_OF_HANDS // NUM_OF_PROCESSES
    # Set up the processes and get them to do their stuff
    start_time = time.time()
    for _ in range(NUM_OF_PROCESSES):
        p = Process(target=analyse_hands, args=((handsperprocess, q)))
        processlist.append(p)
        p.start()
    # Allow q to get a few items
    while not q.empty():
        while not q.empty():
            # code removed as not relevant to the problem
        # Allow q to be refreshed so allowing all processes to finish before
        # doing a join. It seems that doing a join before a process is
        # finished will cause the program to lock
        counter['empty'] += 1
    for p in processlist:
        p.join()
    while not q.empty():
        # This is never executed as all the processes have finished and q
        # emptied before the join command above.
        # code removed as not relevant to the problem
    finish_time = time.time()
I have no answer to the reason why IDLE will not run a multiprocessor start instruction correctly but I believe the answer to the doubling of the execution times lies in the type of problem I am dealing with. Perhaps others can comment but it seems to me that the overhead involved with adding and removing items to and from the Queue is quite high so that performance improvements will be best achieved when the amount of data being passed via the Queue is small compared with the amount of processing required to obtain that data.
In my program I am creating and passing 10**7 items of data and I suppose it is the overhead of passing this number of items via the Queue that kills any performance improvement from getting the data via separate Processes. By using a map it seems all 10^7 items of data will need to be stored in the map before any further processing can be done. This might improve performance depending on the overhead of using the map and dealing with that amount of data but for the time being I will stick with my original vanilla, single processed code.
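One way to keep the Queue traffic small, sketched here under assumptions, is to let every process aggregate its own hands and put a single summary on the Queue. This is not the original program: the analysis is reduced to counting suit shapes as a stand-in for the removed code, and worker, the seed parameter and the constants are illustrative.

```python
import random
from collections import Counter
from multiprocessing import Process, Queue

NUM_OF_HANDS = 20_000  # small illustrative workload
NUM_OF_PROCESSES = 4   # assumed worker count


def analyse_hands(numofhands, seed=None):
    """Deal `numofhands` 13-card hands and tally their suit shapes locally."""
    rng = random.Random(seed)
    deck = [suit for suit in "SHDC" for _ in range(13)]  # only suits are tracked
    shapes = Counter()
    for _ in range(numofhands):
        hand = Counter(rng.sample(deck, 13))
        shape = tuple(sorted((hand[s] for s in "SHDC"), reverse=True))
        shapes[shape] += 1
    return shapes


def worker(numofhands, q):
    # One put per process: the Queue carries a single small summary.
    q.put(analyse_hands(numofhands))


if __name__ == "__main__":
    q = Queue()
    processes = [
        Process(target=worker, args=(NUM_OF_HANDS // NUM_OF_PROCESSES, q))
        for _ in range(NUM_OF_PROCESSES)
    ]
    for p in processes:
        p.start()
    total = Counter()
    for _ in processes:  # collect every result before joining
        total.update(q.get())
    for p in processes:
        p.join()
    print(total.most_common(3))
```

Reading one result per process with q.get() before calling join also avoids the lock-up mentioned in the question, since a process that still has data buffered for the Queue may not exit until that data is consumed.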

Python Async Functionality

I'm trying to figure out how the async functionality works in Python. I have watched countless videos but I guess I'm not 'getting it'. My code looks as follows:
def run_watchers():
    loop = asyncio.new_event_loop()
    loop.run_until_complete(watcher_helper())
async def watcher_helper():
    watchers = Watcher.objects.all()
    for watcher in watchers:
        print("Running watcher : " + str(
        await watcher_helper2(watcher)
async def watcher_helper2(watcher):
    for i in range(1,1000000):
        x = i * 1000 / 2000
    print("Calculation done")
What makes sense to me is to have three functions. One to start the loop, second to iterate through the different options to execute and third to do the work.
I am expecting the following output:
Running watcher : 1
Running watcher : 2
Calculation done
Calculation done
however I am getting:
Running watcher : 1
Calculation done
Running watcher : 2
Calculation done
which obviously shows the calculations are not done in parallel. Any idea what I am doing wrong?
asyncio only speeds up I/O-bound work, typically network operations (sending and receiving data over the internet). While you wait for data from the network (which may take a long time), your program is usually idle. asyncio lets you use this idle time for other useful work: for example, starting another network request in the meantime.
asyncio cannot speed up a CPU-bound job (which is what watcher_helper2 does in your example). While you multiply numbers there is simply no idle time that could be used to do something else. Note also that awaiting each watcher_helper2 call inside the for loop runs the calls one after another; coroutines only overlap when they are scheduled together (for example with asyncio.gather) and actually wait on I/O.
Read also this answer for more detailed explanation.
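For comparison, here is a minimal sketch of a case where asyncio does produce the expected ordering. It assumes the work is I/O-like (asyncio.sleep stands in for a network wait) and replaces the Watcher queryset with plain numbers. report, run_one and the log list are helpers invented for the example.

```python
import asyncio

log = []  # collected messages, so the ordering can be inspected


def report(message):
    log.append(message)
    print(message)


async def watcher_helper2(watcher):
    await asyncio.sleep(0.05)  # stand-in for waiting on network I/O
    report("Calculation done")


async def run_one(watcher):
    report("Running watcher : {}".format(watcher))
    await watcher_helper2(watcher)


async def watcher_helper(watchers):
    # gather schedules all coroutines at once instead of awaiting them in turn
    await asyncio.gather(*(run_one(w) for w in watchers))


def run_watchers(watchers):
    asyncio.run(watcher_helper(watchers))


if __name__ == "__main__":
    run_watchers([1, 2])
```

Both "Running watcher" lines are printed before either "Calculation done" because each coroutine yields to the event loop at its await. With the CPU-bound loop from the question there is no such await, so the work would still run sequentially.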

Python 3: create new process when another one finishes

I have an array of data to process and a handler that runs for a long time (1-2 minutes) and needs a lot of memory for its calculations.
raw = ['a', 'b', 'c']
def handler(r):
    # do something long
    pass
Since handler requires a lot of memory, I want to execute it in separate subprocess and kill it after execution to release memory. Something like the following snippet:
from multiprocessing import Process
for r in raw:
    process = Process(target=handler, args=(r,))
    process.start()
The problem is that this approach immediately starts len(raw) processes, which is not good.
Also, the subprocesses do not need to exchange any data. They just need to run one after another.
Therefore it would be great to run a few processes at the same time and start a new one whenever a running one finishes.
How could it be implemented (if it's even possible)?
To run your processes sequentially, just join each process within the loop:
from multiprocessing import Process
for r in raw:
    process = Process(target=handler, args=(r,))
    process.start()
    process.join()
That way you can be sure that only one process is running at a time (no concurrency).
That's the simplest way. To run more than one process but limit the number of processes running at the same time, you can use a multiprocessing.Pool object and apply_async.
I've built a simple example which computes the square of the argument and simulates heavy processing:
from multiprocessing import Pool
import time
def target(r):
    time.sleep(5)
    return r * r
raw = [1,2,3,4,5]
if __name__ == '__main__':
    with Pool(3) as p: # 3 processes at a time
        reslist = [p.apply_async(target, (r,)) for r in raw]
        for result in reslist:
            print(result.get())
Running this I get:
<5 seconds wait, time to compute the results>
1
4
9
<5 seconds wait, 3 processes max can run at the same time>
16
25
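Since the question also asks for the memory to be released after each run, a variation worth considering is the Pool's maxtasksperchild argument, which makes a worker process exit after a given number of tasks. The following is only a sketch: handler is a hypothetical stand-in for the memory-hungry job, and the pool size of 2 is arbitrary.

```python
import os
from multiprocessing import Pool


def handler(r):
    """Stand-in for the long, memory-hungry job; reports the worker's PID."""
    data = [r] * 100_000  # placeholder allocation, gone once the worker exits
    return r, os.getpid(), len(data)


if __name__ == "__main__":
    raw = ["a", "b", "c", "d", "e"]
    # At most 2 workers at a time; each worker exits after a single task
    # and is replaced by a fresh process.
    with Pool(processes=2, maxtasksperchild=1) as pool:
        for r, pid, size in pool.imap_unordered(handler, raw):
            print("{} handled by process {}".format(r, pid))
```

Each item should be reported with a different process ID, which shows that every task ran in its own short-lived process while no more than two ran at once.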

How can I make a generator prepare the next value in advance?

I have a generator that loops through a large list of elements and yields the ones that meet certain conditions. It can take a while to process a single element. Once I yield that element, once again it takes a while to process it in my main function.
This means that when I loop through the generator, I have to wait for the generator to find an element that meets all the conditions, then for my main function to process it, then rinse and repeat. I'd like to speed things up by having the next value available as soon as I need it.
def generate(a, b):
    for stack in some_function(a, b):
        # Check for multiple conditions. This
        # takes a while.
        # I'd like to run this code in the
        # background while I process the
        # previous element down below.
        yield stack
for stack in generate(foo, bar):
    # Process the stack. This can take
    # a while too.
How can I get the generator to prepare the next value so that it's ready when next is called? Is this possible out of the box? I looked into coroutines and concurrency already, but they don't seem relevant to my problem.
This is the solution I came up with:
from queue import Queue
from threading import Thread
def generate(a, b, queue):
    for stack in some_function(a, b):
        # Check for multiple conditions.
        queue.put(stack)
queue = Queue()
thread = Thread(target=generate, args=(foo, bar, queue))
thread.start()
while thread.is_alive() or not queue.empty():
    stack = queue.get()
    # Process the stack.
If stacks are processed faster than they're added to the queue, the while loop still runs because the thread is still alive. If the thread is dead, then the loop runs as long as the queue is not empty. This is obviously a workaround because generate is no longer a generator, but it does the trick.
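A possible refinement, sketched here as an assumption rather than as part of the solution above, is to wrap any iterable in a prefetching generator that uses a sentinel to mark the end of the stream. That keeps generate a real generator and avoids the case where the thread finishes right after the is_alive() check, leaving queue.get() blocked on an empty queue. prefetch, _DONE and the toy generate below are illustrative names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(iterable, maxsize=1):
    """Yield items from `iterable` while a thread prepares the next ones."""
    buffer = queue.Queue(maxsize=maxsize)

    def produce():
        try:
            for item in iterable:
                buffer.put(item)  # blocks while the buffer is full
        finally:
            buffer.put(_DONE)  # always signal completion, even after an error

    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


def generate(limit):
    """Hypothetical slow generator standing in for the one in the question."""
    for n in range(limit):
        if n % 2 == 0:  # placeholder for the expensive condition checks
            yield n


if __name__ == "__main__":
    for stack in prefetch(generate(10)):
        print(stack)
```

This sketch does not forward exceptions from the producer to the consumer, and a consumer that stops early leaves the daemon thread blocked until the program exits. Also, because of the GIL, the overlap only helps when one side spends its time waiting (I/O, or C code that releases the GIL); for pure-Python CPU work a process-based variant would be needed.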

Should I use coroutines or another scheduling object here?

I currently have code in the form of a generator which calls an IO-bound task. The generator actually calls sub-generators as well, so a more general solution would be appreciated.
Something like the following:
def processed_values(list_of_io_tasks):
    for task in list_of_io_tasks:
        value = slow_io_call(task)
        yield postprocess(value) # in real version, would iterate over
                                 # processed_values2(value) here
I have complete control over slow_io_call, and I don't care in which order I get the items from processed_values. Is there something like coroutines I can use to get the yielded results in the fastest order by turning slow_io_call into an asynchronous function and using whichever call returns fastest? I expect list_of_io_tasks to be at least thousands of entries long. I've never done any parallel work other than with explicit threading, and in particular I've never used the various forms of lightweight threading which are available.
I need to use the standard CPython implementation, and I'm running on Linux.
Sounds like you are in search of multiprocessing.Pool(), specifically the Pool.imap_unordered() method.
Here is a port of your function to use imap_unordered() to parallelize calls to slow_io_call().
def processed_values(list_of_io_tasks):
    pool = multiprocessing.Pool(4) # num workers
    results = pool.imap_unordered(slow_io_call, list_of_io_tasks)
    while True:
        try:
            value = results.next(9999999) # large time-out
        except StopIteration:
            return
        yield value
Note that you could also iterate over results directly (i.e. for item in results: yield item) without a while True loop; however, calling results.next() with a time-out value works around this multiprocessing keyboard interrupt bug and allows you to kill the main process and all subprocesses with Ctrl-C. Also note that results.next() raises StopIteration when it has no more items to return. The function catches it and returns, because since Python 3.7 (PEP 479) a StopIteration that escapes a generator body is converted into a RuntimeError instead of silently ending the generator.
To use threads in place of processes, replace
import multiprocessing
with
import multiprocessing.dummy as multiprocessing
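Put together, a thread-backed version that also applies postprocess could be sketched as follows. slow_io_call and postprocess are hypothetical stand-ins (a sleep and an upper-casing step), and the task tuples are invented for the example.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def slow_io_call(task):
    """Hypothetical I/O call; `task` is a (name, seconds) pair."""
    name, seconds = task
    time.sleep(seconds)  # placeholder for the real blocking I/O
    return name


def postprocess(value):
    return value.upper()  # placeholder post-processing


def processed_values(list_of_io_tasks, workers=4):
    with Pool(workers) as pool:
        # Results arrive in completion order, not submission order.
        for value in pool.imap_unordered(slow_io_call, list_of_io_tasks):
            yield postprocess(value)


if __name__ == "__main__":
    tasks = [("slow", 0.3), ("fast", 0.05), ("medium", 0.15)]
    print(list(processed_values(tasks)))
```

Iterating over the imap_unordered result directly keeps the function short. The time-out workaround for Ctrl-C described above is left out of this sketch.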
