Python - nested loops and asyncio - python-3.x

I am still quite new to asyncio and struggling a bit with how to deal with loops within loops:
import asyncio
import concurrent.futures
import logging
import sys
import time

sub_dict = {
    1: ['one', 'commodore', 'apple', 'linux', 'windows'],
    2: ['two', 'commodore', 'apple', 'linux', 'windows'],
    3: ['three', 'commodore', 'apple', 'linux', 'windows'],
    4: ['four', 'commodore', 'apple', 'linux', 'windows'],
    5: ['five', 'commodore', 'apple', 'linux', 'windows'],
    6: ['six', 'commodore', 'apple', 'linux', 'windows'],
    7: ['seven', 'commodore', 'apple', 'linux', 'windows'],
    8: ['eight', 'commodore', 'apple', 'linux', 'windows']
}

def blocks(key, value):
    for v in value:
        log = logging.getLogger('blocks({} {})'.format(key, v))
        log.info('running')
        log.info('done')
        time.sleep(5)
    return key, v

async def run_blocking_tasks(executor, sub_dict2):
    log = logging.getLogger('run_blocking_tasks')
    log.info('starting')
    log.info('creating executor tasks')
    loop = asyncio.get_event_loop()
    blocking_tasks = [
        loop.run_in_executor(executor, blocks, key, value)
        for key, value in sub_dict2.items()
    ]
    log.info('waiting for executor tasks')
    completed, pending = await asyncio.wait(blocking_tasks)
    results = [t.result() for t in completed]
    log.info('results: {!r}'.format(results))
    log.info('exiting')

def new_func():
    logging.basicConfig(
        level=logging.INFO,
        format='%(threadName)10s %(name)18s: %(message)s',
        stream=sys.stderr,
    )
    executor = concurrent.futures.ThreadPoolExecutor(
        max_workers=8,
    )
    event_loop = asyncio.get_event_loop()
    event_loop.run_until_complete(
        run_blocking_tasks(executor, sub_dict)
    )
    event_loop.close()

new_func()
Here you can see that all the value items for each element are assigned to the same thread. For example, all values of element '1' are on thread zero.
I know enough to understand that this is because my for v in value loop is not plugged into asyncio properly.
My desired output is that, if I assigned five workers, each value item for element '1' would be on its own thread, numbered 0-4, giving five threads in total. This would then repeat for elements 2 through 8.
Or should I assign 40 threads (8 dictionary elements * 5 value items per element), so that there is one unique thread for each value item?
Hope that makes sense...

Something about asking a question on SO always seems to trigger extra layers of IQ in me. The answer is as so, should anyone be interested:
def blocks(key, v):
    #for v in value:
    log = logging.getLogger('blocks({} {})'.format(key,v))
    log.info('running')
    log.info('done')
    time.sleep(30)
    return v

async def run_blocking_tasks(executor, sub_dict2):
    log = logging.getLogger('run_blocking_tasks')
    log.info('starting')
    log.info('creating executor tasks')
    for key, value in sub_dict2.items():
        loop = asyncio.get_event_loop()
        blocking_tasks = [
            loop.run_in_executor(executor, blocks, key, v)
            for v in value
        ]
        log.info('waiting for executor tasks')
        completed, pending = await asyncio.wait(blocking_tasks)
        results = [t.result() for t in completed]
        log.info('results: {!r}'.format(results))
    log.info('exiting')

def new_func():
    logging.basicConfig(
        level=logging.INFO,
        format='%(threadName)10s %(name)18s: %(message)s',
        stream=sys.stderr,
    )
    sub_dict2 = dict(list(sub_dict.items())[0:8])
    executor = concurrent.futures.ThreadPoolExecutor(
        max_workers=5,
    )
    event_loop = asyncio.get_event_loop()
    event_loop.run_until_complete(
        run_blocking_tasks(executor, sub_dict2)
    )
    event_loop.close()

new_func()
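For comparison, here is a minimal sketch of the "one executor job per value item" idea that submits every (key, v) pair up front instead of waiting key by key. It is not the code from the answer above: the two-key sub_dict, the short sleep and the max_workers value are placeholders chosen for illustration, and asyncio.run plus asyncio.gather are used in place of the explicit event loop handling.

```python
import asyncio
import concurrent.futures
import threading
import time

# Hypothetical, shortened input so the sketch finishes quickly.
sub_dict = {1: ['one', 'apple'], 2: ['two', 'linux']}

def blocks(key, v):
    time.sleep(0.05)  # stand-in for the real blocking work
    return key, v, threading.current_thread().name

async def run_blocking_tasks(executor, data):
    loop = asyncio.get_running_loop()
    # One executor job per (key, v) pair, all submitted before awaiting.
    futures = [
        loop.run_in_executor(executor, blocks, key, v)
        for key, values in data.items()
        for v in values
    ]
    # gather returns the results in submission order.
    return await asyncio.gather(*futures)

def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        return asyncio.run(run_blocking_tasks(executor, sub_dict))

results = main()
for key, v, thread_name in results:
    print(key, v, thread_name)
```

With this layout the pool size only limits how many items run at once; it does not need to equal the total number of items.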
EDIT:
Here is a rough layout of my present concurrent.futures code, as per the comment thread below. It is a layout of the logical order rather than the full code, as the real code has a few lengthy functions for the preliminary steps...
#some code here that chunks a bigger dictionary using slicing in a for loop.
#sub_dict a 20 element subset of a bigger dictionary
#slices are parameterised in real code
sub_dict = dict(list(fin_dict.items())[0:20])
#set 20 workers
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    #submit to the executor an enumerator/counter and my sub dict
    future_to_pair = {executor.submit(function_name, v, i): (i, v) for i, v in enumerate(sub_dict.items(), 1)}
    #await results
    for future in concurrent.futures.as_completed(future_to_pair):
        pair = future_to_pair[future]
        data = future.result()

#function that is being called by concurrent.futures
#am happy for all the v's in value to be on a single thread
#i want each key to be on an individual thread
#this will process 20 keys simultaneously, but wait for the slowest one before clearing
def function_name(sub_dict, i):
    for key, value in sub_dict:
        for v in value:
            # using subprocess, execute some stuff
            # dictionary loops provide parameters for the executables.

I think you have missed a key concept: waiting. An async def function without an await is perfectly legal but rather pointless. The purpose of asyncio programming is to handle situations where your program must wait for something, and the program has something useful it could be doing in the meantime. Otherwise it has very little utility.* It's also not easy to find simple examples that illustrate how useful this is.
Python offers several types of concurrency: processes, which use multiple CPU cores; threads, which use multiple lines of execution within a single process; and asyncio tasks, which use multiple units of execution within a thread. These can be combined in various ways, and have different characteristics.
Threads allow your program to block in one place while waiting for a resource, but continue to execute in another place. But synchronizing between threads is often tricky because scheduling CPU time between threads is pre-emptive. It's not under your direct control.
Tasks also allow your program to stop in one place and continue in another, but switching between tasks is cooperative. It's under your control. When a task encounters an "await" expression, it stops there and allows another task to run. That task continues until it bumps into an await expression, and so on. If this solves a problem you have, great. It's a fantastic tool.
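As a small illustration of that cooperative hand-off, the sketch below runs two tasks that each record a step and then await. The worker names and the shared log list are made up for the example; asyncio.sleep(0) is used only as the simplest way to yield control to the event loop.

```python
import asyncio

async def worker(name, steps, log):
    for step in range(steps):
        log.append(f"{name}{step}")
        # Reaching an await lets the event loop switch to another task.
        await asyncio.sleep(0)

async def main():
    log = []
    await asyncio.gather(worker("a", 2, log), worker("b", 2, log))
    return log

order = asyncio.run(main())
print(order)
```

The two workers take turns: each one runs until its next await and then the other continues. Without the await inside the loop, the first worker would finish all its steps before the second one started.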
It seems, based on reading a number of SO questions, that programmers sometimes get the impression that asyncio will make their programs run faster by sending them off to some sort of Never-Never Land where they execute without consuming any CPU cycles, and the result will come floating back on the breeze. Sorry, that won't happen. The main use case is what I described: you have to wait for something but you've got something else to do.
*Remark added for completeness: I have used the cross-threading capabilities of asyncio as means of coordinating between threads. For example, create an event loop in Thread B, and cause it to execute a function on demand by Thread A using the "call_soon_threadsafe" method, or the "run_coroutine_threadsafe" method. This is a handy capability even if it doesn't require the use of an await expression.
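A minimal sketch of that cross-thread pattern, assuming a background thread that does nothing but run an event loop. The helper name start_loop_in_thread and the double coroutine are invented for the example; run_coroutine_threadsafe and call_soon_threadsafe are the asyncio calls mentioned in the remark.

```python
import asyncio
import threading

def start_loop_in_thread():
    """Create an event loop and run it forever in a background thread ("Thread B")."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread

async def double(x):
    await asyncio.sleep(0.01)
    return 2 * x

loop, thread = start_loop_in_thread()

# "Thread A" (here the main thread) hands a coroutine to the loop in Thread B
# and gets back a concurrent.futures.Future it can block on.
future = asyncio.run_coroutine_threadsafe(double(21), loop)
answer = future.result(timeout=5)
print(answer)

# Ask the loop to stop from outside its own thread, then clean up.
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
loop.close()
```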

Related

Best way to keep creating threads on variable list argument

I am listening to an event every minute that returns a list; it could be empty, contain 1 element, or more. For each element in that list, I'd like to run a function that monitors an event on that element every minute for 10 minutes.
For that I wrote this script:
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client

client = Client()

def handle_event(event):
    for i in range(10):
        client.get_info(event)
        sleep(60)

async def main():
    while True:
        entires = client.get_new_entry()
        if len(entires) > 0:
            with ThreadPoolExecutor(max_workers=len(entires)) as executor:
                executor.map(handle_event, entires)
        await asyncio.sleep(60)

if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
However, instead of continuing to check for new entries, it blocks while the previous entries are still being monitored.
Any idea how I could do that, please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's an async def function, so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio

def handle_event(event):
    for i in range(10):
        print("Still here", event)
        sleep(2)

async def process_entires(counter, entires):
    print("Counter", counter, "Entires", entires)
    x = [counter * 10 + a for a in entires]
    with ThreadPoolExecutor(max_workers=len(entires)) as executor:
        futs = []
        for z in x:
            futs.append(executor.submit(handle_event, z))
        await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))

async def main():
    counter = 0
    while True:
        entires = [0, 1, 2, 3, 4][:random.randrange(5)]
        if len(entires) > 0:
            counter += 1
            asyncio.create_task(process_entires(counter, entires))
        await asyncio.sleep(3)

if __name__ == "__main__":
    asyncio.run(main())
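Another of the possible ways around the blocking, sketched here under assumptions rather than taken from the answer: on Python 3.9 or later, asyncio.to_thread can run each blocking handle_event call in the event loop's default thread pool, so no executor context manager is needed. The bounded number of rounds, the short delays and the generated entries are placeholders so the sketch terminates; the tasks are kept in a list so they stay referenced until they finish.

```python
import asyncio
import time

def handle_event(event, repeats=3, delay=0.01):
    # Blocking stand-in for the real per-entry monitoring loop.
    for _ in range(repeats):
        time.sleep(delay)
    return event

async def main(rounds=3):
    tasks = []
    for counter in range(rounds):
        # Hypothetical new entries for this round.
        entries = [counter * 10 + n for n in range(2)]
        for entry in entries:
            # Each blocking call runs in a worker thread; the loop is not blocked.
            tasks.append(asyncio.create_task(asyncio.to_thread(handle_event, entry)))
        # The main loop keeps polling while older entries are still monitored.
        await asyncio.sleep(0.01)
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)
```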

How can I use multithreading (or multiprocessing?) for faster data upload?

I have a list of issues (jira issues):
listOfKeys = [id1,id2,id3,id4,id5...id30000]
I want to get the worklogs of these issues. For this I used the jira-python library and this code:
listOfWorklogs=pd.DataFrame() # I used the pandas (pd) lib
lst={} #dictionary for help, where the worklogs will be stored
for i in range(len(listOfKeys)):
    worklogs=jira.worklogs(listOfKeys[i]) #getting list of worklogs
    if(len(worklogs)) == 0:
        i+=1
    else:
        for j in range(len(worklogs)):
            lst = {
                'self': worklogs[j].self,
                'author': worklogs[j].author,
                'started': worklogs[j].started,
                'created': worklogs[j].created,
                'updated': worklogs[j].updated,
                'timespent': worklogs[j].timeSpentSeconds
            }
            listOfWorklogs = listOfWorklogs.append(lst, ignore_index=True)
########### Below there is the recording to the .xlsx file ################
So I simply go into the worklogs of each issue in a simple loop, which is equivalent to requesting the link
https://jira.mycompany.com/rest/api/2/issue/issueid/worklogs and retrieving the information from it.
The problem is that there are more than 30,000 such issues,
and the loop is very slow (approximately 3 seconds per issue).
Can I somehow start multiple loops / processes / threads in parallel to speed up the process of getting worklogs (maybe without the jira-python library)?
I recycled a piece of code I had made and adapted it to your code. I hope it helps:
from multiprocessing import Manager, Process, cpu_count

def insert_into_list(worklog, queue):
    lst = {
        'self': worklog.self,
        'author': worklog.author,
        'started': worklog.started,
        'created': worklog.created,
        'updated': worklog.updated,
        'timespent': worklog.timeSpentSeconds
    }
    queue.put(lst)
    return

# Number of cpus in the pc
num_cpus = cpu_count()
# Manager and queue to hold the results
manager = Manager()
# The queue has controlled insertion, so processes don't step on each other
queue = manager.Queue()
listOfWorklogs=pd.DataFrame()
lst={}
for i in range(len(listOfKeys)):
    worklogs=jira.worklogs(listOfKeys[i]) #getting list of worklogs
    if(len(worklogs)) == 0:
        i+=1
    else:
        # Start again from the first worklog of each issue
        index = 0
        # This loop replaces your "for j in range(len(worklogs))" loop
        while index < len(worklogs):
            processes = []
            elements = min(num_cpus, len(worklogs) - index)
            # Create a process for each cpu
            for i in range(elements):
                process = Process(target=insert_into_list, args=(worklogs[i+index], queue))
                processes.append(process)
            # Run the processes
            for i in range(elements):
                processes[i].start()
            # Wait for them to finish
            for i in range(elements):
                processes[i].join(timeout=10)
            index += num_cpus
# Dump the queue into the dataframe
while queue.qsize() != 0:
    listOfWorklogs = listOfWorklogs.append(queue.get(), ignore_index=True)
This should work and reduce the time by a factor of a little less than the number of CPUs in your machine. You can try changing that number manually for better performance. In any case, I find it very strange that it takes about 3 seconds per operation.
PS: I couldn't try the code because I have no examples, so it probably has some bugs.
I have some troubles:
1) The indents in the code where the first "for" loop appears and the first "if" instruction begins (this instruction and everything below should be included in the loop, right?)
for i in range(len(listOfKeys)-99):
    worklogs=jira.worklogs(listOfKeys[i]) #getting list of worklogs
    if(len(worklogs)) == 0:
        ....
2) cmd, conda prompt and Spyder did not allow your code to run, for this reason:
Python Multiprocessing error: AttributeError: module '__main__' has no attribute '__spec__'
After some research on Google, I had to set __spec__ = None a bit higher in the code (but I'm not sure if this is correct), and this error disappeared.
By the way, the code ran in Jupyter Notebook without this error, but listOfWorklogs is empty, which is not right.
3) When I corrected the indents and set __spec__ = None, a new error occurred in this place:
processes[i].start()
The error looks like this:
"PicklingError: Can't pickle : attribute lookup PropertyHolder on jira.resources failed"
If I remove the parentheses from the start and join methods, the code runs, but I still have no entries in listOfWorklogs.
I ask again for your help!
How about thinking about it not from a technical standpoint but a logical one? You know your code works, but at a rate of 3 seconds per issue, 30,000 issues would take 25 hours to complete. If you have the ability to split up the Jira issues that are passed into the script (maybe by date or issue key, etc.), you could create multiple .py files with basically the same code and pass each one a different list of Jira tickets. So you could run, say, 4 of them at the same time, and you would reduce your time to 6.25 hours each.
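Since the time per issue is spent waiting on an HTTP request, a thread pool inside a single script is one way to get the same kind of split without separate .py files. The sketch below is an assumption-laden illustration, not tested against Jira: fetch_worklogs is a hypothetical stand-in for jira.worklogs(key) that returns plain dictionaries, and the issue keys and max_workers value are made up. Threads also avoid pickling the jira resource objects, which is what the PicklingError above complains about.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_worklogs(issue_key):
    """Hypothetical stand-in for jira.worklogs(issue_key); returns plain dicts."""
    time.sleep(0.01)  # simulates waiting for the HTTP response
    return [{'issue': issue_key, 'timespent': 60 * n} for n in range(1, 3)]

def collect_worklogs(issue_keys, max_workers=8):
    rows = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as issue_keys.
        for worklogs in executor.map(fetch_worklogs, issue_keys):
            rows.extend(worklogs)
    return rows

rows = collect_worklogs(['ID-%d' % n for n in range(20)])
print(len(rows))
```

Collecting the rows in a list and building the DataFrame once at the end (for example with pd.DataFrame(rows)) also avoids growing the DataFrame row by row.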

How to reuse a multiprocessing pool?

At the bottom is the code I have now. It seems to work fine. However, I don't completely understand it. I thought that without .join(), I'd risk the code going on to the next for-loop iteration before the pool finishes executing. Wouldn't we need those 3 commented-out lines?
On the other hand, if I were to go with the .close() and .join() way, is there any way to 'reopen' that closed pool instead of calling Pool(6) every time?
import multiprocessing as mp
import random as rdm
from statistics import stdev, mean
import time

def mesh_subset(population, n_chosen=5):
    chosen = rdm.choices(population, k=n_chosen)
    return mean(chosen)

if __name__ == '__main__':
    population = [x for x in range(20)]
    N_iteration = 10
    start_time = time.time()
    pool = mp.Pool(6)
    for i in range(N_iteration):
        print([round(x,2) for x in population])
        print(stdev(population))
        # pool = mp.Pool(6)
        population = pool.map(mesh_subset, [population]*len(population))
        # pool.close()
        # pool.join()
    print('run time:', time.time() - start_time)
A pool of workers is a relatively costly thing to set up, so it should be done (if possible) only once, usually at the beginning of the script.
The pool.map command blocks until all the tasks are completed. After all, it returns a list of the results. It couldn't do that unless mesh_subset has been called on all the inputs and has returned a result for each. In contrast, methods like pool.apply_async do not block. apply_async returns an ApplyResult object with a get method which blocks until it obtains a result from a worker process.
pool.close sets the worker handler's state to CLOSE. This causes the handler to signal the workers to terminate.
The pool.join blocks until all the worker processes have been terminated.
So you don't need to call -- in fact you shouldn't call -- pool.close and pool.join until you are finished with the pool. Once the workers have been sent the signal to terminate (by pool.close), there is no way to "reopen" them. You would need to start a new pool instead.
In your situation, since you do want the loop to wait until all the tasks are completed, there would be no advantage to using pool.apply_async instead of pool.map. But if you were to use pool.apply_async, you could obtain the same result as before by calling get instead of resorting to closing and restarting the pool:
# you could do this, but using pool.map is simpler
for i in range(N_iteration):
    apply_results = [pool.apply_async(mesh_subset, [population]) for i in range(len(population))]
    # the call to result.get() blocks until its worker process (running
    # mesh_subset) returns a value
    population = [result.get() for result in apply_results]
When the loops complete, len(population) is unchanged.
If you did NOT want each loop to block until all the tasks are completed, you could use apply_async's callback feature:
N_pop = len(population)
result = []
for i in range(N_iteration):
    for i in range(N_pop):
        pool.apply_async(mesh_subset, [population],
                         callback=result.append)
pool.close()
pool.join()
print(result)
Now, when any mesh_subset returns a return_value, result.append(return_value) is called. The calls to apply_async do not block, so N_iteration * N_pop tasks are pushed into the pool's task queue all at once. But since the pool has 6 workers, at most 6 calls to mesh_subset are running at any given time. As the workers complete the tasks, result.append(return_value) is called for whichever task finishes first (the callback itself runs in the main process, not in the worker). So the values in result are unordered. This is different from pool.map, which returns a list whose return values are in the same order as its corresponding list of arguments.
Barring an exception, result will eventually contain N_iteration * N_pop return values once all the tasks complete. Above, pool.close() and pool.join() were used to wait for all the tasks to complete.
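The pool lifecycle described above (create once, reuse across iterations, close and join only at the end) can be sketched in a few lines. To keep the sketch runnable without an if __name__ == '__main__' guard, it uses multiprocessing.pool.ThreadPool, which offers the same map, apply_async, close and join methods as mp.Pool but runs the workers as threads; that swap, the deterministic mesh_subset and the pool size are assumptions made for the example.

```python
from multiprocessing.pool import ThreadPool
from statistics import mean

def mesh_subset(population):
    # Deterministic stand-in for the random sampling in the question.
    return mean(population)

pool = ThreadPool(4)  # created once, reused below
population = list(range(20))

# pool.map blocks until every task is done, so the loop needs no join().
for _ in range(3):
    population = pool.map(mesh_subset, [population] * len(population))

# apply_async does not block; callbacks collect results as tasks finish.
results = []
for _ in range(5):
    pool.apply_async(mesh_subset, [population], callback=results.append)

# Only now, when no more work will be submitted, close and join the pool.
pool.close()
pool.join()
print(population[0], len(results))
```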

Where to set the locks in apply pandas with multithreading?

I am trying to asynchronously read and write from a pandas df with an apply function. For this purpose I am using the multiprocessing.dummy package. Since I am reading and writing simultaneously (multithreaded) on my df, I am using multiprocessing.Lock() so that no more than one thread can edit the df at any given time. However, I am a bit confused about where I should add lock.acquire() and lock.release() around an apply function in pandas. I have tried doing it as shown below; however, it seems that this way the entire process becomes synchronous, which defeats the whole purpose of multithreading.
self._lock.acquire()
to_df[col_name] = to_df.apply(lambda row: getattr(Object(row['col_1'],
row['col_2'],
row['col_3']),
someattribute), axis=1)
self._lock.release()
Note: In my case I have to use getattr. someattribute is simply a @property in Object. Object takes 3 arguments, which come from columns col_1, col_2 and col_3 of my df.
There are two possible solutions: 1) locks, 2) queues. The code below is just a skeleton; it may contain typos/errors and cannot be used as is.
First, locks only where they are actually needed:
def method_to_process_url(df):
    lock.acquire()
    url = df.loc[some_idx, some_col]
    lock.release()
    info = process_url(url)
    lock.acquire()
    # add info to df
    lock.release()
Second, queues instead of locks:
def method_to_process_url(url_queue, info_queue):
    while True:
        url = url_queue.get()
        info = process_url(url)
        info_queue.put(info)

url_queue = queue.Queue()
# add all urls to process to the url_queue
info_queue = queue.Queue()
# working_thread_1
threading.Thread(
    target=method_to_process_url,
    kwargs={'url_queue': url_queue, 'info_queue': info_queue},
    daemon=True).start()
# more working threads
counter = 0
while counter < amount_of_urls:
    info = info_queue.get()
    # add info to df
    counter += 1
In the second case you may even start a separate thread for every url without url_queue (reasonable if the number of urls is on the order of thousands or less). counter is a simple way to stop the program when all urls are processed.
I would use the second approach if you ask me. It is more flexible in my opinion.
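A runnable version of the queue approach might look like the sketch below. The process_url body, the URL list, the number of workers and the None sentinel used to stop the workers are all assumptions added for the example; a plain dict stands in for the DataFrame. Only the main thread writes to the shared result, so no lock is needed.

```python
import queue
import threading

def process_url(url):
    """Hypothetical stand-in for the real per-URL work."""
    return url.upper()

def worker(url_queue, info_queue):
    while True:
        url = url_queue.get()
        if url is None:  # sentinel: no more work for this worker
            break
        info_queue.put((url, process_url(url)))

urls = ['a', 'b', 'c', 'd']
url_queue, info_queue = queue.Queue(), queue.Queue()
for url in urls:
    url_queue.put(url)

threads = [threading.Thread(target=worker, args=(url_queue, info_queue))
           for _ in range(2)]
for t in threads:
    url_queue.put(None)  # one sentinel per worker, queued after the real work
    t.start()

# Only the main thread touches the shared result (the df in the question).
collected = {}
for _ in urls:
    url, info = info_queue.get()
    collected[url] = info

for t in threads:
    t.join()
print(collected)
```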

Python Multiprocessing Queue Slow

I have a problem with python multiprocessing Queues.
I'm doing some heavy computation on some data. I have created a few processes to lower the calculation time, and the data has been split evenly before being sent to the processes. This decreases the calculation time nicely, but when I want to return data from the processes through a multiprocessing.Queue it takes ages, and the whole thing is slower than calculating in the main thread.
processes = []
proc = 8
for i in range(proc):
    processes.append(multiprocessing.Process(target=self.calculateTriangles, args=(inData[i],outData,timer)))
for p in processes:
    p.start()
results = []
for i in range(proc):
    results.append(outData.get())
print("killing threads")
print(datetime.datetime.now() - timer)
for p in processes:
    p.join()
print("Finish Threads")
print(datetime.datetime.now() - timer)
All of the threads print their finish time when they are done. Here is example output of this code:
0:00:00.017873 CalcDone
0:00:01.692940 CalcDone
0:00:01.777674 CalcDone
0:00:01.780019 CalcDone
0:00:01.796739 CalcDone
0:00:01.831723 CalcDone
0:00:01.842356 CalcDone
0:00:01.868633 CalcDone
0:00:05.497160 killing threads
60968 calculated triangles
As you can see, everything is quite simple until this code:
for i in range(proc):
    results.append(outData.get())
print("killing threads")
print(datetime.datetime.now() - timer)
Here are some observations I have made on my computer and on a slower one:
https://docs.google.com/spreadsheets/d/1_8LovX0eSgvNW63-xh8L9-uylAVlzY4VSPUQ1yP2F9A/edit?usp=sharing . On the slower one there isn't any improvement, as you can see.
Why does it take so much time to get items from the queue when the processes are finished? Is there a way to speed this up?
So I have solved it myself. The calculations are fast, but copying objects from one process to another takes ages. I just made a method that clears all unnecessary fields in the objects. Also, using pipes is faster than multiprocessing queues. This took the time on my slower computer down from 29 seconds to 15 seconds.
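The cost being described comes from the fact that objects sent through a multiprocessing queue or pipe are pickled on one side and unpickled on the other. The sketch below only measures the pickled size, without starting any processes; the Triangle class, its scratch attribute and the result_only method are hypothetical names used to illustrate "sending only the necessary fields".

```python
import pickle

class Triangle:
    def __init__(self, vertices):
        self.vertices = vertices
        # Hypothetical working data that the receiving process does not need.
        self.scratch = list(range(10_000))

    def result_only(self):
        """Return just the data worth sending back to the parent process."""
        return self.vertices

triangle = Triangle(((0, 0), (1, 0), (0, 1)))

full_size = len(pickle.dumps(triangle))                # whole object
slim_size = len(pickle.dumps(triangle.result_only()))  # needed fields only
print(full_size, slim_size)
```

The smaller payload is what a queue.put or a pipe send would have to serialize and copy, so trimming it reduces the transfer time roughly in proportion.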
This time is mainly spent on putting each object into the Queue and spiking up the Semaphore count. If you are able to bulk insert all the data into the Queue at once, you cut this down to 1/10 of the previous time.
I've dynamically assigned a new method to Queue, based on the old one. Go to the multiprocessing module for your Python version:
/usr/lib/pythonx.x/multiprocessing/queues.py
Copy the "put" method of the class to your project, e.g. for Python 3.7:
def put(self, obj, block=True, timeout=None):
    assert not self._closed, "Queue {0!r} has been closed".format(self)
    if not self._sem.acquire(block, timeout):
        raise Full
    with self._notempty:
        if self._thread is None:
            self._start_thread()
        self._buffer.append(obj)
        self._notempty.notify()
Then modify it:
def put_bla(self, obj, block=True, timeout=None):
    assert not self._closed, "Queue {0!r} has been closed".format(self)
    for el in obj:
        if not self._sem.acquire(block, timeout):  # spike the semaphore count once per element
            raise Full
    with self._notempty:
        if self._thread is None:
            self._start_thread()
        self._buffer += obj  # adding the whole collections.deque object
        self._notempty.notify()
The last step is to add the new method to the class. multiprocessing.Queue is a DefaultContext method which returns a Queue object, so it is easier to inject the method directly into the class of the created object:
from collections import deque
from multiprocessing import Queue
queue = Queue()
queue.__class__.put_bulk = put_bla # injecting new method
items = (500, 400, 450, 350) * count # (500, 400, 450, 350, 500, 400...)
queue.put_bulk(deque(items))
Unfortunately, multiprocessing.Pool was always faster by 10%, so just stick with that if you don't require everlasting workers to process your tasks. It is based on multiprocessing.SimpleQueue, which is based on multiprocessing.Pipe, and I have no idea why it is faster, because my SimpleQueue solution wasn't, and it is not bulk-injectable :) Crack that and you'll have the fastest worker ever :)

Resources