Pool issue when called from parent pool (Python 3.x)

test1.py/myfunc1() does some work in parallel.
If I call myfunc1() from test2.py, it works fine (currently commented out).
If I create another pool in test2.py and call myfunc1() from it, I get an unreported error in test1.py on the "pool = mp.Pool(5)" line:
result = {type} <class 'AssertionError'> args = {getset_descriptor} <attribute 'args' of 'BaseException' objects>
How do I fix this issue?
test1.py
import time
import multiprocessing as mp

def worker(a):
    print("Worker: " + str(a))
    time.sleep(5)
    return a

def mycallback(val):
    print("Callback: " + str(val))

def myfunc1(n=3):
    print("start myfunc1")
    slist = range(n)
    pool = mp.Pool(5)
    [pool.apply_async(worker, args=(s,), callback=mycallback) for s in slist]
    pool.close()
    pool.join()

if __name__ == "__main__":
    myfunc1()
test2.py
from pythonProjectTEST.test1 import myfunc1
import multiprocessing as mp

def mycallback(val):
    print("CallbackMaster: " + str(val))

if __name__ == "__main__":
    # This works
    #myfunc1(5)

    # This does not
    slist = range(6)
    pool = mp.Pool(3)
    [pool.apply_async(myfunc1, args=(s,), callback=mycallback) for s in slist]
    pool.close()
    pool.join()

A daemonic process is not allowed to spawn children of its own. Note how the main of test2 spawns pool processes to call myfunc1, and myfunc1 then tries to spawn its own pool of processes to call worker. I suspect this restriction exists to reduce the chances of fork bombs or deadlocks. If you really want to do this, there are workarounds: see Python Process Pool non-daemonic?. However, I would avoid it if possible.
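For reference, the workaround from that linked question is usually a Pool subclass whose workers are not marked daemonic. A rough sketch, assuming a reasonably recent Python 3 (class names here are illustrative, and details vary between Python versions):

import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # Always report the process as non-daemonic so it may have children.
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

class NestablePool(multiprocessing.pool.Pool):
    # Subclass multiprocessing.pool.Pool (the class), not multiprocessing.Pool
    # (a factory function), and hand it the non-daemonic context.
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super().__init__(*args, **kwargs)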
To debug an issue like this, it is often convenient to add an error callback. For example, the following code gives you a helpful error message "Error: daemonic processes are not allowed to have children":
def errorcallback(val):
    print("Error: %s" % str(val))
...
[pool.apply_async(myfunc1, args=(s,), callback=mycallback, error_callback=errorcallback) for s in slist]
The method apply_async will normally swallow errors silently unless you specify an error_callback (see the documentation here). The arguments in square brackets are optional, but you can pass them one by one using the names shown:
apply_async(func[, args[, kwds[, callback[, error_callback]]]])
"If error_callback is specified then it should be a callable which accepts a single argument. If the target function fails, then the error_callback is called with the exception instance."

Related

Python Asyncio and Multithreading

I have created a greatly simplified version of an application below that intends to use Python's asyncio and threading modules. The general structure is as follows:
import asyncio
import threading

class Node:
    def __init__(self, loop):
        self.loop = loop
        self.tasks = set()

    async def computation(self, x):
        print("Node: computation called with input ", x)
        await asyncio.sleep(1)

    def schedule_computation(self, x):
        print("Node: schedule_computation called with input ", x)
        task = self.loop.create_task(self.computation(x))
        self.tasks.add(task)

class Router:
    def __init__(self, loop):
        self.loop = loop
        self.nodes = {}

    def register_node(self, id):
        self.nodes[id] = Node(self.loop)

    def schedule_computation(self, node_id, x):
        print("Router: schedule_computation called with input ", x)
        self.nodes[node_id].schedule_computation(x)

class Client:
    def __init__(self, router):
        self.router = router
        self.counter = 0

    def run(self):
        while True:
            if self.counter == 1000000:
                self.router.schedule_computation(1, 5)
            self.counter += 1

def main():
    loop = asyncio.get_event_loop()

    # construct Router instance and register a node
    router = Router(loop)
    router.register_node(1)

    # construct Client instance
    client = Client(router)
    client_thread = threading.Thread(target=client.run)
    client_thread.start()

    loop.run_forever()

main()
In practice the Node.computation method is doing some network I/O and thus I'd like to perform that work asynchronously. The Client.run method is synchronous and blocking, and I'd like to give this function its own thread to execute in (in fact I'd like the ability to run this method in a separate process if possible).
Upon executing this application we get the following output:
Router: schedule_computation called with input 5
Node: schedule_computation called with input 5
However, I expect that "Node: computation called with input 5" should print as well because the Node.schedule_computation method creates a task to run on loop. In summary, why does it seem that Node.computation is never scheduled?
Use loop.call_soon_threadsafe
In general, asyncio isn't thread safe:
"Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there's a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used."
https://docs.python.org/3/library/asyncio-dev.html#concurrency-and-multithreading
In Router.schedule_computation, replace the direct call with:
self.loop.call_soon_threadsafe(self.nodes[node_id].schedule_computation, x)
Node.computation runs on the main thread
Not sure if you are aware, but even though you can use call_soon_threadsafe to initiate a coroutine from another thread, the coroutine always runs in the thread the loop was created in. If you want to run coroutines on another thread, your background thread will need its own event loop as well.
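Applied to the Router from the question, the suggested fix might look like this sketch; note the coroutine itself still executes on the thread that owns the loop:

class Router:
    def __init__(self, loop):
        self.loop = loop
        self.nodes = {}

    def register_node(self, id):
        self.nodes[id] = Node(self.loop)

    def schedule_computation(self, node_id, x):
        print("Router: schedule_computation called with input ", x)
        # This is called from the client thread, so hand the work over to
        # the loop's own thread instead of touching asyncio objects directly.
        self.loop.call_soon_threadsafe(
            self.nodes[node_id].schedule_computation, x)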

Launching parallel tasks: Subprocess output triggers function asynchronously

The example I will describe here is purely conceptual, so I'm not interested in solving this exact problem.
What I need to accomplish is to asynchronously run a function based on the continuous output of a subprocess command, in this case the Windows command ping yahoo.com -t. Based on the time value in the replies I want to trigger the startme function. Inside that function there will be more processing, including database and/or network calls, so essentially I/O-bound work.
My best bet would be that I should use threading, but for some reason I can't get this to work as intended. Here is what I have tried so far.
First of all, I tried the old way of using threads:
import subprocess
import re
import asyncio
import time
import threading

def startme(mytime: int):
    print(f"Mytime {mytime} was started!")
    time.sleep(mytime)  # including more long operations here such as database calls and even some time.sleep() - if possible
    print(f"Mytime {mytime} finished!")

myproc = subprocess.Popen(['ping', 'yahoo.com', '-t'], shell=True, stdout=subprocess.PIPE)

def main():
    while True:
        output = myproc.stdout.readline()
        if myproc.poll() is not None:
            break
        myoutput = output.strip().decode(encoding="UTF-8")
        print(myoutput)
        mytime = re.findall("(?<=time\=)(.*)(?=ms\s)", myoutput)
        try:
            mytime = int(mytime[0])
            if mytime < 197:
                # startme(int(mytime[0]))
                p1 = threading.Thread(target=startme(mytime), daemon=True)
                # p1 = threading.Thread(target=startme(mytime))  # tried with and without the daemon
                p1.start()
                # p1.join()
        except:
            pass

main()
But right after startme() fires for the first time, the pings stop showing; the loop waits for the time.sleep() inside startme() to finish.
I did manage to get this working using concurrent.futures' ThreadPoolExecutor, but when I tried to replace the time.sleep() with an actual database query I found that my startme() function never completes: no "Mytime xxx finished!" message is ever shown, and no database entry is made.
import sqlite3
import subprocess
import re
import asyncio
import time
# import threading
# import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor

conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute(
    '''CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, u1, u2, u3, u4)''')

def startme(mytime: int):
    print(f"Mytime {mytime} was started!")
    # time.sleep(mytime)  # including more long operations here such as database calls and even some time.sleep() - if possible
    c.execute("INSERT INTO mytable VALUES (null, ?, ?, ?, ?)", (1, 2, 3, mytime))
    conn.commit()
    print(f"Mytime {mytime} finished!")

myproc = subprocess.Popen(['ping', 'yahoo.com', '-t'], shell=True, stdout=subprocess.PIPE)

def main():
    while True:
        output = myproc.stdout.readline()
        myoutput = output.strip().decode(encoding="UTF-8")
        print(myoutput)
        mytime = re.findall("(?<=time\=)(.*)(?=ms\s)", myoutput)
        try:
            mytime = int(mytime[0])
            if mytime < 197:
                print(f"The time {mytime} is low enough to call startme()")
                executor = ThreadPoolExecutor()
                # executor = ProcessPoolExecutor()  # I did try using processes even if it's not a CPU-related issue
                executor.submit(startme, mytime)
        except:
            pass

main()
I did try asyncio, but I soon realized it was not the right fit here, though I'm wondering if I should try aiosqlite.
I also thought about using asyncio.create_subprocess_shell and running both as parallel subprocesses, but I can't think of a way to wait for a certain string from the ping command that would trigger the second script.
Please note that I don't really need a return value from the startme() function, and the ping command example is conceptually derived from the output of mitmproxy's mitmdump command.
The first code wasn't working because I made a silly mistake when creating the thread: p1 = threading.Thread(target=startme(mytime)) calls the function right away instead of passing the callable and its arguments separately, as in p1 = threading.Thread(target=startme, args=(mytime,)).
The reason I could not get the SQL insert statement to work in my second code was this error:
SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 10688 and this is thread id 17964
which I didn't see until I wrapped my SQL statement in a try/except and printed the exception. So I needed to make the SQLite database connection inside my startme() function.
The other asyncio ideas were a dead end and cannot be applied to the issue here.
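Putting both fixes together, the relevant pieces might look roughly like this sketch (table layout and threshold taken from the code above):

import sqlite3
import threading

def startme(mytime: int):
    print(f"Mytime {mytime} was started!")
    # Open the connection inside the worker thread: sqlite3 objects may
    # only be used in the thread that created them.
    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    c.execute("INSERT INTO mytable VALUES (null, ?, ?, ?, ?)", (1, 2, 3, mytime))
    conn.commit()
    conn.close()
    print(f"Mytime {mytime} finished!")

# ... then, inside the reading loop:
if mytime < 197:
    # Pass the callable and its arguments separately instead of calling it here.
    p1 = threading.Thread(target=startme, args=(mytime,), daemon=True)
    p1.start()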

Cause python to exit if any thread has an exception

I have a Python 3 program that starts a second thread (besides the main thread) for handling some events asynchronously. Ideally, my program works without a flaw and never has an unhandled exception. But stuff happens. When/if there is an exception, I want the whole interpreter to exit with an error code as if it had been a single thread. Is that possible?
Right now, if an exception occurs on the spawned thread, it prints out the usual error information, but doesn't exit. The main thread just keeps going.
Example
import threading
import time

def countdown(initial):
    while True:
        print(initial[0])
        initial = initial[1:]
        time.sleep(1)

if __name__ == '__main__':
    helper = threading.Thread(target=countdown, args=['failsoon'])
    helper.start()

    time.sleep(0.5)
    #countdown('THISWILLTAKELONGERTOFAILBECAUSEITSMOREDATA')
    countdown('FAST')
The countdown will eventually fail to access [0] on the string because it has been emptied, causing an IndexError: string index out of range. The goal is that whether the main or the helper thread dies first, the whole program dies altogether, but the stack trace info is still output.
Solutions Tried
After some digging, my thought was to use sys.excepthook. I added the following:
import os
import signal
import sys
import traceback

def killAll(etype, value, tb):
    print('KILL ALL')
    traceback.print_exception(etype, value, tb)
    os.kill(os.getpid(), signal.SIGKILL)

sys.excepthook = killAll
This works if the main thread is the one that dies first. But in the other case it does not. This seems to be a known issue (https://bugs.python.org/issue1230540). I will try some of the workarounds there.
While the example shows a main thread and a helper thread which I created, I'm interested in the general case where I may be running someone else's library that launches a thread.
Well, you could simply raise an error in your thread and have the main thread handle and report that error. From there you could even terminate the program.
For example on your worker thread:
try:
    self.result = self.do_something_dangerous()
except Exception as e:
    import sys
    self.exc_info = sys.exc_info()
and on main thread:
if self.exc_info:
    raise self.exc_info[1].with_traceback(self.exc_info[2])
return self.result
So to give you a more complete picture, your code might look like this:
import threading

class ExcThread(threading.Thread):
    def excRun(self):
        pass  # Where your core program will run

    def run(self):
        self.exc = None
        try:
            # Possibly throws an exception
            self.excRun()
        except:
            import sys
            self.exc = sys.exc_info()
            # Save details of the exception thrown; DON'T rethrow,
            # just complete the function such as storing
            # variables or states as needed

    def join(self):
        threading.Thread.join(self)
        if self.exc:
            msg = "Thread '%s' threw an exception: %s" % (self.getName(), self.exc[1])
            new_exc = Exception(msg)
            raise new_exc.with_traceback(self.exc[2])
(I added an extra line to keep track of which thread is causing the error in case you have multiple threads, it's also good practice to name them)
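A minimal usage sketch (the failing excRun body here is hypothetical): subclass ExcThread, override excRun, and the worker's exception is re-raised in the caller when it joins:

class MyWorker(ExcThread):
    def excRun(self):
        raise RuntimeError("something went wrong")  # hypothetical failing work

t = MyWorker(name="worker-1")
t.start()
t.join()  # raises: Thread 'worker-1' threw an exception: something went wrong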
My solution ended up being a happy marriage between the solution posted here and the SIGKILL solution piece from above. I added the following killall.py submodule to my package:
import threading
import sys
import traceback
import os
import signal

def sendKillSignal(etype, value, tb):
    print('KILL ALL')
    traceback.print_exception(etype, value, tb)
    os.kill(os.getpid(), signal.SIGKILL)

original_init = threading.Thread.__init__

def patched_init(self, *args, **kwargs):
    print("thread init'ed")
    original_init(self, *args, **kwargs)
    original_run = self.run

    def patched_run(*args, **kw):
        try:
            original_run(*args, **kw)
        except:
            sys.excepthook(*sys.exc_info())

    self.run = patched_run

def install():
    sys.excepthook = sendKillSignal
    threading.Thread.__init__ = patched_init
I then run install() right away, before any other threads are launched (whether of my own creation or from other imported libraries).
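Usage might then look roughly like this (the entry-point script and import path are hypothetical and depend on your package layout):

# main.py (hypothetical entry point)
import killall
killall.install()  # patch sys.excepthook and threading.Thread.__init__ first

import threading

def boom():
    raise RuntimeError("unhandled in helper thread")

threading.Thread(target=boom).start()
# The hook prints the traceback, then the whole process receives SIGKILL.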
Just wanted to share my simple solution.
In my case I wanted the exception to display as normal but then immediately stop the program. I was able to accomplish this by starting a timer thread with a small delay to call os._exit before raising the exception.
import os
import threading

def raise_and_exit(args):
    threading.Timer(0.01, os._exit, args=(1,)).start()
    raise args[0]

threading.excepthook = raise_and_exit
Python 3.8 added threading.excepthook which makes it possible to handle this more cleanly.
I wrote the package "unhandled_exit" to do just that. It basically adds os._exit(1) after the default handler, which means you get the normal backtrace before the process exits.
Package is published to pypi here: https://pypi.org/project/unhandled_exit/
Code is here: https://github.com/rfjakob/unhandled_exit/blob/master/unhandled_exit/__init__.py
Usage is simply:
import unhandled_exit
unhandled_exit.activate()

QSlider stuck though still emitting sliderMoved

In my PyQt4-based program, QSliders (with signals sliderMoved and sliderReleased connected to callables) sometimes "freeze", i.e. they don't move anymore when trying to drag them with the mouse, even though sliderMoved and sliderReleased are still emitted.
This behaviour happens seemingly randomly, sometimes after running the program for hours -- making it more or less impossible to reproduce and test.
Any help to solve this issue would be welcome.
EDIT: This is with PyQt 4.10.4 on Python 3.4 and Windows 7.
After some debugging I am pretty sure that this was due to calling a GUI slot from a separate thread, which (I knew) is forbidden. Fixing this to use a proper signal-slot approach seems to have fixed the issue.
After calling the patch function defined below, all slot calls are wrapped by a wrapper that checks that they are called only from the GUI thread -- a warning is printed otherwise. This is how I found the culprit.
import functools
import sys
import threading
import traceback

from PyQt4.QtCore import QMetaMethod
from PyQt4.QtGui import QWidget

SLOT_CACHE = {}

def patch():
    """Check for calls to widget slots outside of the main thread."""
    qwidget_getattribute = QWidget.__getattribute__

    def getattribute(obj, name):
        attr = qwidget_getattribute(obj, name)
        if type(obj) not in SLOT_CACHE:
            meta = qwidget_getattribute(obj, "metaObject")()
            SLOT_CACHE[type(obj)] = [
                method.signature().split("(", 1)[0]
                for method in map(meta.method, range(meta.methodCount()))
                if method.methodType() == QMetaMethod.Slot]
        if (isinstance(attr, type(print)) and  # Wrap builtin functions only.
                attr.__name__ in SLOT_CACHE[type(obj)]):
            @functools.wraps(
                attr, assigned=functools.WRAPPER_ASSIGNMENTS + ("__self__",))
            def wrapper(*args, **kwargs):
                if threading.current_thread() is not threading.main_thread():
                    print("{}.{} was called out of main thread:".format(
                        type(obj), name), file=sys.stderr)
                    traceback.print_stack()
                return attr(*args, **kwargs)
            return wrapper
        else:
            return attr

    QWidget.__getattribute__ = getattribute
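A hypothetical way to use it: call patch() once, before any widgets are created, and watch stderr for the warnings:

import sys
from PyQt4.QtGui import QApplication

patch()  # install the __getattribute__ wrapper first
app = QApplication(sys.argv)
# ... build the GUI as usual; any widget slot invoked from a non-main
# thread now prints a warning plus a stack trace to stderr.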

How to terminate a Python3 thread correctly while it's reading a stream

I'm using a thread to read Strings from a stream (/dev/tty1) while processing other things in the main loop. I would like the Thread to terminate together with the main program when pressing CTRL-C.
from threading import Thread

class myReader(Thread):
    def run(self):
        with open('/dev/tty1', encoding='ascii') as myStream:
            for myString in myStream:
                print(myString)

    def quit(self):
        pass  # stop reading, close stream, terminate the thread

myReader = myReader()
myReader.start()

while(True):
    try:
        pass  # do lots of stuff
    except KeyboardInterrupt:
        myReader.quit()
        raise
The usual solution - a boolean variable inside the run() loop - doesn't work here. What's the recommended way to deal with this?
I can just set the Daemon flag, but then I won't be able to use a quit() method which might prove valuable later (to do some clean-up). Any ideas?
AFAIK, there is no built-in mechanism for that in Python 3 (just as in Python 2). Have you tried the proven Python 2 approach with PyThreadState_SetAsyncExc, documented here and here, or the alternative tracing approach here?
Here's a slightly modified version of the PyThreadState_SetAsyncExc approach from above:
import threading
import inspect
import ctypes

def _async_raise(tid, exctype):
    """raises the exception, performs cleanup if needed"""
    if not inspect.isclass(exctype):
        exctype = type(exctype)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # "if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

def stop_thread(thread):
    _async_raise(thread.ident, SystemExit)
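Applied to the reader from the question, usage might look like the sketch below. Note that the exception is only delivered when the target thread next executes Python bytecode, so a thread blocked inside the read may not stop until more data arrives:

reader = myReader()
reader.start()
try:
    while True:
        pass  # do lots of stuff
except KeyboardInterrupt:
    stop_thread(reader)  # asynchronously raises SystemExit in the reader
    reader.join()
    raise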
Make your thread a daemon thread. When all non-daemon threads have exited, the program exits. So when Ctrl-C is passed to your program and the main thread exits, there's no need to explicitly kill the reader.
myReader = myReader()
myReader.daemon = True
myReader.start()
