Why no call_at_threadsafe and call_later_threadsafe? - python-3.x

I'm using Python 3.5.2 in Windows 32bits and aware that asyncio call_at is not threadsafe, hence following code won't print 'bomb' unless I uncomment the line loop._write_to_self().
import asyncio
import threading
def bomb(loop):
loop.call_later(1, print, 'bomb')
print('submitted')
# loop._write_to_self()
if __name__ == '__main__':
loop = asyncio.get_event_loop()
threading.Timer(2, bomb, args=(loop,)).start()
loop.run_forever()
However I couldn't find a piece of information about why call_at_threadsafe and call_later_threadsafe is implemented. Is the reason ever exists?

Simply use loop.call_soon_threadsafe to schedule loop.call_later:
loop.call_soon_threadsafe(loop.call_later, 1, print, 'bomb')

Related

torch.distributed.barrier() added on all processes not working

import torch
import os
torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
if local_rank >0:
torch.distributed.barrier()
print(f"Entered process {local_rank}")
if local_rank ==0:
torch.distributed.barrier()
The above code gets hanged forever but if I remove both torch.distributed.barrier() then both print statements get executed. Am I missing something here?
On the command line I execute the process using torchrun --nnodes=1 --nproc_per_node 2 test.py where test.py is the name of the script
tried the above code with and without the torch.distributed.barrier()
With the barrier() statements expecting the statement to print for one gpu and exit -- not as expected
Without the barrier() statements expecting both to print -- as expected
Am I missing something here?
It is better to put your multiprocessing initialization code inside the if __name__ == "__main__": to avoid endless process generation and re-design the control flow to fit your purpose:
if __name__ == "__main__":
import torch
import os
torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
if local_rank > 0:
torch.distributed.barrier()
else:
print(f"Entered process {local_rank}")
torch.distributed.barrier()

How to confirm multiprocessing library is being used?

I am trying to use multiprocessing for the below code. The code seems to run a bit faster than the for loop inside the function.
How can I confirm I using the library and not the just the for loop?
from multiprocessing import Pool
from multiprocessing import cpu_count
import requests
import pandas as pd
data= pd.read_csv('~/Downloads/50kNAE000.txt.1' ,sep="\t", header=None)
data = data[0].str.strip("0 ")
lst = []
def request(x):
for i,v in x.items():
print(i)
file = requests.get(v)
lst.append(file.text)
#time.sleep(1)
if __name__ == "__main__":
pool = Pool(cpu_count())
results = pool.map(request(data))
pool.close() # 'TERM'
pool.join() # 'KILL'
Multiprocessing has overhead. It has to start the process and transfer function data via interprocess mechanism. Just running a single function in another process vs. running that same function normally is always going to be slower. The advantage is actually doing parallelism with significant work in the functions that makes the overhead minimal.
You can call multiprocessing.current_process().name to see the process name change.

How to transfer data between two separate scripts in Multiprocessing?

I am using multiprocessing to run two python scripts in parallel. p1.y continually updates a certain variable and the latest value of the variable will be displayed by p2.py after every 2seconds. The code for multiprocessing of the two scripts are given below:
import os
from multiprocessing import Process
def script1():
os.system("p1.py")
def script2():
os.system("p2.py")
if __name__ == '__main__':
p = Process(target=script1)
q = Process(target=script2)
p.start()
q.start()
p.join()
q.join()
I am unable to transfer the value of the variable being updated by p1.py to p2.py. How should I approach the problem in a very simple way?

Why python asyncio process in a thread seems unstable on Linux?

I try to run a python3 asynchronous external command from a Qt Application. Before I was using a multiprocessing thread to do it without freezing the Qt Application. But now, I would like to do it with a QThread to be able to pickle and give a QtWindows as argument for some other functions (not presented here). I did it and test it with success on my Windows OS, but I tried the application on my Linux OS, I get the following error :RuntimeError: Cannot add child handler, the child watcher does not have a loop attached
From that point I tried to isolate the problem, and I obtain the minimal (as possible as I could) example below that replicates the problem.
Of course, as I mentioned before, if I replace QThreadPool by a list of multiprocessing.thread this example is working well. I also realized something that astonished me: if I uncomment the line rc = subp([sys.executable,"./HelloWorld.py"]) in the last part of the example, it works also. I couldn't explain myself why.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
## IMPORTS ##
from functools import partial
from PyQt5 import QtCore
from PyQt5.QtCore import QThreadPool, QRunnable, QCoreApplication
import sys
import asyncio.subprocess
# Global variables
Qpool = QtCore.QThreadPool()
def subp(cmd_list):
""" """
if sys.platform.startswith('linux'):
new_loop = asyncio.new_event_loop()
asyncio.set_event_loop(new_loop)
elif sys.platform.startswith('win'):
new_loop = asyncio.ProactorEventLoop() # for subprocess' pipes on Windows
asyncio.set_event_loop(new_loop)
else :
print('[ERROR] OS not available for encodage... EXIT')
sys.exit(2)
rc, stdout, stderr= new_loop.run_until_complete(get_subp(cmd_list) )
new_loop.close()
if rc!=0 :
print('Exit not zero ({}): {}'.format(rc, sys.exc_info()[0]) )#, exc_info=True)
return rc, stdout, stderr
async def get_subp(cmd_list):
""" """
print('subp: '+' '.join(cmd_list) )
# Create the subprocess, redirect the standard output into a pipe
create = asyncio.create_subprocess_exec(*cmd_list, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE) #
proc = await create
# read child's stdout/stderr concurrently (capture and display)
try:
stdout, stderr = await asyncio.gather(
read_stream_and_display(proc.stdout),
read_stream_and_display(proc.stderr))
except Exception:
proc.kill()
raise
finally:
rc = await proc.wait()
print(" [Exit {}] ".format(rc)+' '.join(cmd_list))
return rc, stdout, stderr
async def read_stream_and_display(stream):
""" """
async for line in stream:
print(line, flush=True)
class Qrun_from_job(QtCore.QRunnable):
def __init__(self, job, arg):
super(Qrun_from_job, self).__init__()
self.job=job
self.arg=arg
def run(self):
code = partial(self.job)
code()
def ThdSomething(job,arg):
testRunnable = Qrun_from_job(job,arg)
Qpool.start(testRunnable)
def testThatThing():
rc = subp([sys.executable,"./HelloWorld.py"])
if __name__=='__main__':
app = QCoreApplication([])
# rc = subp([sys.executable,"./HelloWorld.py"])
ThdSomething(testThatThing,'tests')
sys.exit(app.exec_())
with the HelloWorld.py file:
#!/usr/bin/env python3
import sys
if __name__=='__main__':
print('HelloWorld')
sys.exit(0)
Therefore I have two questions: How to make this example working properly with QThread ? And why a previous call of an asynchronous task (with a call of subp function) change the stability of the example on Linux ?
EDIT
Following advices of #user4815162342, I tried with a run_coroutine_threadsafe with the code below. But it is not working and returns the same error ie RuntimeError: Cannot add child handler, the child watcher does not have a loop attached. I also tried to change the threading command by its equivalent in the module mutliprocessing ; and with the last one, the command subp is never launched.
The code :
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
## IMPORTS ##
import sys
import asyncio.subprocess
import threading
import multiprocessing
# at top-level
loop = asyncio.new_event_loop()
def spin_loop():
asyncio.set_event_loop(loop)
loop.run_forever()
def subp(cmd_list):
# submit the task to asyncio
fut = asyncio.run_coroutine_threadsafe(get_subp(cmd_list), loop)
# wait for the task to finish
rc, stdout, stderr = fut.result()
return rc, stdout, stderr
async def get_subp(cmd_list):
""" """
print('subp: '+' '.join(cmd_list) )
# Create the subprocess, redirect the standard output into a pipe
proc = await asyncio.create_subprocess_exec(*cmd_list, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE) #
# read child's stdout/stderr concurrently (capture and display)
try:
stdout, stderr = await asyncio.gather(
read_stream_and_display(proc.stdout),
read_stream_and_display(proc.stderr))
except Exception:
proc.kill()
raise
finally:
rc = await proc.wait()
print(" [Exit {}] ".format(rc)+' '.join(cmd_list))
return rc, stdout, stderr
async def read_stream_and_display(stream):
""" """
async for line in stream:
print(line, flush=True)
if __name__=='__main__':
threading.Thread(target=spin_loop, daemon=True).start()
# multiprocessing.Process(target=spin_loop, daemon=True).start()
print('thread passed')
rc = subp([sys.executable,"./HelloWorld.py"])
print('end')
sys.exit(0)
As a general design principle, it's unnecessary and wasteful to create new event loops only to run a single subroutine. Instead, create an event loop, run it in a separate thread, and use it for all your asyncio needs by submitting tasks to it using asyncio.run_coroutine_threadsafe.
For example:
# at top-level
loop = asyncio.new_event_loop()
def spin_loop():
asyncio.set_event_loop(loop)
loop.run_forever()
asyncio.get_child_watcher().attach_loop(loop)
threading.Thread(target=spin_loop, daemon=True).start()
# ... the rest of your code ...
With this in place, you can easily execute any asyncio code from any thread whatsoever using the following:
def subp(cmd_list):
# submit the task to asyncio
fut = asyncio.run_coroutine_threadsafe(get_subp(cmd_list), loop)
# wait for the task to finish
rc, stdout, stderr = fut.result()
return rc, stdout, stderr
Note that you can use add_done_callback to be notified when the future returned by asyncio.run_coroutine_threadsafe finishes, so you might not need a thread in the first place.
Note that all interaction with the event loop should go either through the afore-mentioned run_coroutine_threadsafe (when submitting coroutines) or through loop.call_soon_threadsafe when you need the event loop to call an ordinary function. For example, to stop the event loop, you would invoke loop.call_soon_threadsafe(loop.stop).
I suspect that what you are doing is simply unsupported - according to the documentation:
To handle signals and to execute subprocesses, the event loop must be run in the main thread.
As you are trying to execute a subprocess, I do not think running a new event loop in another thread works.
Thing is, Qt already has an event loop, and what you really need is to convince asyncio to use it. That means that you need an event loop implementation that provides the "event loop interface for asyncio" implemented on top of "Qt's event loop".
I believe that asyncqt provides such an implementation. You may want to try to use QEventLoop(app) in place of asyncio.new_event_loop().

Python multiprocessing script partial output

I am following the principles laid down in this post to safely output the results which will eventually be written to a file. Unfortunately, the code only print 1 and 2, and not 3 to 6.
import os
import argparse
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep
def feed(queue, parlist):
for par in parlist:
queue.put(par)
print("Queue size", queue.qsize())
def calc(queueIn, queueOut):
while True:
try:
par=queueIn.get(block=False)
res=doCalculation(par)
queueOut.put((res))
queueIn.task_done()
except:
break
def doCalculation(par):
return par
def write(queue):
while True:
try:
par=queue.get(block=False)
print("response:",par)
except:
break
if __name__ == "__main__":
nthreads = 2
workerQueue = Queue()
writerQueue = Queue()
considerperiod=[1,2,3,4,5,6]
feedProc = Process(target=feed, args=(workerQueue, considerperiod))
calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
writProc = Process(target=write, args=(writerQueue,))
feedProc.start()
feedProc.join()
for p in calcProc:
p.start()
for p in calcProc:
p.join()
writProc.start()
writProc.join()
On running the code it prints,
$ python3 tst.py
Queue size 6
response: 1
response: 2
Also, is it possible to ensure that the write function always outputs 1,2,3,4,5,6 i.e. in the same order in which the data is fed into the feed queue?
The error is somehow with the task_done() call. If you remove that one, then it works, don't ask me why (IMO that's a bug). But the way it works then is that the queueIn.get(block=False) call throws an exception because the queue is empty. This might be just enough for your use case, a better way though would be to use sentinels (as suggested in the multiprocessing docs, see last example). Here's a little rewrite so your program uses sentinels:
import os
import argparse
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep
def feed(queue, parlist, nthreads):
for par in parlist:
queue.put(par)
for i in range(nthreads):
queue.put(None)
print("Queue size", queue.qsize())
def calc(queueIn, queueOut):
while True:
par=queueIn.get()
if par is None:
break
res=doCalculation(par)
queueOut.put((res))
def doCalculation(par):
return par
def write(queue):
while not queue.empty():
par=queue.get()
print("response:",par)
if __name__ == "__main__":
nthreads = 2
workerQueue = Queue()
writerQueue = Queue()
considerperiod=[1,2,3,4,5,6]
feedProc = Process(target=feed, args=(workerQueue, considerperiod, nthreads))
calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
writProc = Process(target=write, args=(writerQueue,))
feedProc.start()
feedProc.join()
for p in calcProc:
p.start()
for p in calcProc:
p.join()
writProc.start()
writProc.join()
A few things to note:
the sentinel is putting a None into the queue. Note that you need one sentinel for every worker process.
for the write function you don't need to do the sentinel handling as there's only one process and you don't need to handle concurrency (if you would do the empty() and then get() thingie in your calc function you would run into a problem if e.g. there's only one item left in the queue and both workers check empty() at the same time and then both want to do get() and then one of them is locked forever)
you don't need to put feed and write into processes, just put them into your main function as you don't want to run it in parallel anyway.
how can I have the same order in output as in input? [...] I guess multiprocessing.map can do this
Yes map keeps the order. Rewriting your program into something simpler (as you don't need the workerQueue and writerQueue and adding random sleeps to prove that the output is still in order:
from multiprocessing import Pool
import time
import random
def calc(val):
time.sleep(random.random())
return val
if __name__ == "__main__":
considerperiod=[1,2,3,4,5,6]
with Pool(processes=2) as pool:
print(pool.map(calc, considerperiod))

Resources