Getting `BrokenProcessPool` error in a `concurrent.futures` example - python-3.x

The example I am running is mentioned in this PyMOTW3 link. I am reproducing the code here:
from concurrent import futures
import os

def task(n):
    return (n, os.getpid())

ex = futures.ProcessPoolExecutor(max_workers=2)
results = ex.map(task, range(5, 0, -1))

for n, pid in results:
    print('ran task {} in process {}'.format(n, pid))
As per the source, I am supposed to get the following output:
ran task 5 in process 40854
ran task 4 in process 40854
ran task 3 in process 40854
ran task 2 in process 40854
ran task 1 in process 40854
Instead, I'm getting a long error message with the following concluding line:
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I am using a Windows machine and running Python 3.9. All other examples are otherwise running fine. What is going wrong here?

I've finally been able to resolve the issue. It seems to be Windows specific. Following a related Stack Overflow post, I used the if __name__ == "__main__" idiom. The modified code is:
from concurrent import futures
import os

def task(n):
    return (n, os.getpid())

def main():
    ex = futures.ProcessPoolExecutor(max_workers=2)
    results = ex.map(task, range(5, 0, -1))
    for n, pid in results:
        print('ran task {} in process {}'.format(n, pid))

if __name__ == '__main__':
    main()
It worked, although I'm still not sure why.
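Update for anyone else who hits this - my best guess, and I'm happy to be corrected: on Windows the process pool uses the "spawn" start method, which starts each worker by re-importing the main module. Without the __main__ guard, every worker re-executes the module-level ProcessPoolExecutor(...) and ex.map(...) lines on import, which breaks the pool. A minimal sketch that makes the re-import visible (the file name spawn_demo.py is just my placeholder):

# spawn_demo.py
import os
from concurrent import futures

print('importing module in process', os.getpid())  # printed once per spawned process

def task(n):
    return (n, os.getpid())

if __name__ == '__main__':
    # Only the original process reaches this block; the spawned workers
    # stop after the module-level import above.
    with futures.ProcessPoolExecutor(max_workers=2) as ex:
        for n, pid in ex.map(task, range(5, 0, -1)):
            print('ran task {} in process {}'.format(n, pid))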

Related

How to use ThreadPoolExecutor inside a gunicorn process?

I am running a FastAPI app with gunicorn with the following config:
bind = 0.0.0.0:8080
worker_class = "uvicorn.workers.UvicornWorker"
workers = 3
loglevel = ServerConfig.LOG_LEVEL.lower()
max_requests = 1500
max_requests_jitter = 300
timeout = 120
Inside this app, I am doing some task (not very long running) every 0.5 seconds (through a job scheduler) and doing some processing on the data.
In that job scheduler, I am calling the perform method (see the code below):
from concurrent.futures import ThreadPoolExecutor

class BaseQueueConsumer:
    def __init__(self, threads: int):
        self._threads = threads
        self._executor = ThreadPoolExecutor(max_workers=1)

    def perform(self, param1, param2, param3) -> None:
        futures = []
        for _ in range(self._threads):
            futures.append(
                self._executor.submit(
                    BaseQueueConsumer.consume, param1, param2, param3
                )
            )
        for future in futures:
            future.done()

    @staticmethod
    def consume(param1, param2, param3) -> None:
        # Doing some work here
        ...
The problem is that whenever this app is under high load, I get the following error:
cannot schedule new futures after shutdown
My guess is that the gunicorn worker restarts every 1500 requests (max_requests) and the tasks that were already submitted are causing this issue.
What I am not able to understand is that whatever threads the gunicorn process starts via the ThreadPoolExecutor should also end when the process is terminated, but that is not the case.
Can someone explain this behaviour and suggest a possible solution for gracefully ending the gunicorn process without these ThreadPoolExecutor tasks causing errors?
I am using Python 3.8 and gunicorn 0.15.0.
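A minimal workaround sketch I am considering (an assumption on my part, not a confirmed fix): recreate the executor if submitting raises a RuntimeError because the previous pool was already shut down, e.g. after gunicorn recycled the worker at max_requests.

from concurrent.futures import ThreadPoolExecutor

class BaseQueueConsumer:
    def __init__(self, threads: int):
        self._threads = threads
        self._executor = ThreadPoolExecutor(max_workers=1)

    def _submit(self, fn, *args):
        try:
            return self._executor.submit(fn, *args)
        except RuntimeError:
            # "cannot schedule new futures after shutdown": the old pool is
            # gone (e.g. the worker is being recycled), so build a new one
            # and retry once.
            self._executor = ThreadPoolExecutor(max_workers=1)
            return self._executor.submit(fn, *args)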

How to kill a QProcess instance using os.kill()?

Problem
Hey, recently while using PyQt6's QProcess, I tried to use os.kill() to kill a QProcess instance. (The reason I want to use os.kill() instead of QProcess().kill() is that I want to send a CTRL_C_EVENT signal when killing the process.) Even though I use the correct pid (acquired by calling QProcess().processId()), the signal seems to be sent to all processes unexpectedly.
Code
Here's my code:
from PyQt6.QtCore import QProcess
import os
import time
import signal
process_a = QProcess()
process_a.start("python", ['./test.py'])
pid_a = process_a.processId()
print(f"pid_a = {pid_a}")
process_b = QProcess()
process_b.start("python", ['./test.py'])
pid_b = process_b.processId()
print(f"pid_b = {pid_b}")
os.kill(pid_a, signal.CTRL_C_EVENT)
try:
    time.sleep(1)
except KeyboardInterrupt:
    print("A KeyboardInterrupt should not be caught here.")
process_a.waitForFinished()
process_b.waitForFinished()
print(f"process_a: {process_a.readAll().data().decode('gbk')}")
print(f"process_b: {process_b.readAll().data().decode('gbk')}")
and ./test.py is simple:
import time
time.sleep(3)
print("Done")
What I'm expecting
pid_a = 19956
pid_b = 28468
process_a:
process_b: Done
What I've got
pid_a = 28040
pid_b = 23708
A KeyboardInterrupt should not be caught here.
process_a:
process_b:
Discussion
I don't know whether this is a bug or a misuse on my part. It seems that signal.CTRL_C_EVENT is sent to all processes. So, how do I kill a single QProcess instance with CTRL_C_EVENT correctly?
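For context, what I have found so far (please correct me if this is wrong): on Windows, CTRL_C_EVENT is delivered to every process attached to the same console, which would explain why both children react. The usual workaround seems to be starting the child in its own process group and sending CTRL_BREAK_EVENT to that group; the child then receives SIGBREAK rather than a KeyboardInterrupt. I could only sketch this with subprocess, since I am not sure how to set the creation flags through QProcess:

import os
import signal
import subprocess
import sys

# Start the child in its own process group (Windows-only flag), so a console
# control event can be targeted at this child alone.
child = subprocess.Popen(
    [sys.executable, "./test.py"],
    creationflags=subprocess.CREATE_NEW_PROCESS_GROUP,
)

# CTRL_BREAK_EVENT goes only to the process group identified by child.pid;
# the child sees it as SIGBREAK, not as a KeyboardInterrupt.
os.kill(child.pid, signal.CTRL_BREAK_EVENT)
child.wait()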

Streaming read from subprocess

I need to read output from a child process as it's produced -- perhaps not on every write, but well before the process completes. I've tried solutions from the Python3 docs and SO questions here and here, but I still get nothing until the child terminates.
The application is for monitoring training of a deep learning model. I need to grab the test output (about 250 bytes for each iteration, at roughly 1-minute intervals) and watch for statistical failures.
I cannot change the training engine; for instance, I cannot insert stdout.flush() in the child process code.
I can reasonably wait for a dozen lines of output to accumulate; I was hopeful of a buffer-fill solving my problem.
Code: variations are commented out.
Parent
import subprocess
import time

cmd = ["/usr/bin/python3", "zzz.py"]

# test_proc = subprocess.Popen(
test_proc = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)

out_data = ""
print(time.time(), "START")
while not "QUIT" in str(out_data):
    out_data = test_proc.stdout
    # out_data, err_data = test_proc.communicate()
    print(time.time(), "MAIN received", out_data)
Child (zzz.py)
from time import sleep
import sys

for _ in range(5):
    print(_, "sleeping", "."*1000)
    # sys.stdout.flush()
    sleep(1)
print("QUIT this exercise")
Despite the child sending lines of 1000+ bytes, filling the buffer (tested elsewhere as 2 KB; here I've gone as high as 50 KB) doesn't cause the parent to "see" the new text.
What am I missing to get this to work?
Update with regard to links, comments, and iBug's posted answer:
Popen instead of run fixed the blocking issue. Somehow I missed this in the documentation and my experiments with both.
universal_newlines=True neatly changed the bytes return to str: easier to handle on the receiving end, although with interleaved empty lines (easy to detect and discard).
Setting bufsize to something tiny (e.g. 1) didn't affect anything; the parent still has to wait for the child to fill the stdout buffer, 8k in my case.
export PYTHONUNBUFFERED=1 before execution did fix the buffering problem (a sketch of doing the same from the parent follows after this list). Thanks to wim for the link.
Unless someone comes up with a canonical, nifty solution that makes these obsolete, I'll accept iBug's answer tomorrow.
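For completeness, a small sketch of setting PYTHONUNBUFFERED from the parent instead of exporting it in the shell; passing -u to the child interpreter has the same effect. This is just how I would wire it up, not part of the accepted answer:

import os
import subprocess

cmd = ["/usr/bin/python3", "-u", "zzz.py"]  # -u forces unbuffered stdout/stderr in the child
test_proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env={**os.environ, "PYTHONUNBUFFERED": "1"},  # same effect via the environment
)
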
subprocess.run always spawns the child process and blocks the thread until it exits.
The only option for you is to use p = subprocess.Popen(...) and read lines with s = p.stdout.readline() or by iterating over p.stdout (see below).
This code works for me, if the child process flushes stdout after printing a line (see below for extended note).
cmd = ["/usr/bin/python3", "zzz.py"]
test_proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)

out_data = ""
print(time.time(), "START")
while not "QUIT" in str(out_data):
    out_data = test_proc.stdout.readline()
    print(time.time(), "MAIN received", out_data)
test_proc.communicate()  # shut it down
See my terminal log (dots removed from zzz.py):
ibug@ubuntu:~/t $ python3 p.py
1546450821.9174328 START
1546450821.9793346 MAIN received b'0 sleeping \n'
1546450822.987753 MAIN received b'1 sleeping \n'
1546450823.993136 MAIN received b'2 sleeping \n'
1546450824.997726 MAIN received b'3 sleeping \n'
1546450825.9975247 MAIN received b'4 sleeping \n'
1546450827.0094354 MAIN received b'QUIT this exercise\n'
You can also do it with a for loop:
for out_data in test_proc.stdout:
    if "QUIT" in str(out_data):
        break
    print(time.time(), "MAIN received", out_data)
If you cannot modify the child process, unbuffer (from the expect package - install it with APT or YUM) may help. This is my working parent code, without changing the child code.
test_proc = subprocess.Popen(
    ["unbuffer"] + cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)

Scheduling a task at multiple timings (with different parameters) using celery beat, but the task runs only once (with random parameters)

What I am trying to achieve
Write a scheduler that uses a database to schedule similar tasks at different timings.
For this I am using celery beat; the code snippet below should give an idea:
try:
    reader = MongoReader()
except:
    raise

try:
    tasks = reader.get_scheduled_tasks()
except:
    raise

celerybeat_schedule = dict()
for task in tasks:
    celerybeat_schedule[task["task_id"]] = dict()
    celerybeat_schedule[task["task_id"]]["task"] = task["task_name"]
    celerybeat_schedule[task["task_id"]]["args"] = (task,)
    celerybeat_schedule[task["task_id"]]["schedule"] = get_task_schedule(task)

app.conf.update(BROKER_URL=rabbit_mq_endpoint, CELERY_TASK_SERIALIZER='json',
                CELERY_ACCEPT_CONTENT=['json'], CELERYBEAT_SCHEDULE=celerybeat_schedule)
So these are three steps:
- reading all tasks from the datastore
- creating a dictionary, the celery beat schedule, populated by all tasks and their properties: task_name (the method that would run), args (the data to pass to the method), and schedule (when to run)
- updating the celery configuration with this schedule
Expected scenario
All entries run the same celery task (one that just prints), with the same schedule (run every 5 minutes) but different parameters specifying what to print. Let's say the db has:
task name      parameter   schedule
regular_print  Hi          {"minutes": 5}
regular_print  Hello       {"minutes": 5}
regular_print  Bye         {"minutes": 5}
I expect all three of these to print every 5 minutes.
What happens
Only one of Hi, Hello, Bye prints (possibly at random, certainly not in sequence).
Please help,
Thanks a lot in advance :)
I was able to resolve this using version 4 of celery. Below is a sample similar to what worked for me; it can also be found in the Celery 4 documentation.
import os
from datetime import timedelta

from celery import Celery

# taking address and user/pass from the environment (you can use direct values)
ex_host_queue = os.environ["EX_HOST_QUEUE"]
ex_port_queue = os.environ["EX_PORT_QUEUE"]
ex_user_queue = os.environ["EX_USERID_QUEUE"]
ex_pass_queue = os.environ["EX_PASSWORD_QUEUE"]
broker = "amqp://" + ex_user_queue + ":" + ex_pass_queue + "@" + ex_host_queue + ":" + ex_port_queue + "//"

# celery initialization
app = Celery(__name__, backend=broker, broker=broker)
app.conf.task_default_queue = 'scheduler_queue'
app.conf.update(
    task_serializer='json',
    accept_content=['json'],  # Ignore other content
    result_serializer='json'
)

task = {"task_id": 1, "a": 10, "b": 20}

# method to update the scheduler
def add_scheduled_task(task):
    print("scheduling task")
    del task["_id"]
    print("adding task_id")
    name = task["task_name"]
    app.add_periodic_task(timedelta(minutes=1), scheduler_task.s(task), name=task["task_id"])

@app.task(name='scheduler_task')
def scheduler_task(data):
    print(str(data["a"] + data["b"]))
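As a usage note, here is my own sketch of how the entries could be registered once Celery is configured (it assumes the MongoReader and db documents from the question, so treat it as an illustration rather than something from the docs). The important detail is giving each entry a unique name, so the entries do not replace each other:

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Register one periodic entry per task document read from the datastore.
    for task in MongoReader().get_scheduled_tasks():
        sender.add_periodic_task(
            timedelta(minutes=1),
            scheduler_task.s(task),
            name=task["task_id"],  # unique name per entry, so none overwrite each other
        )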

PySide QtCore.QThreadPool and QApplication.quit() causes hangs?

I want to use Qt's QThreadPool, but it seems to hang my application if the workers in the queue do not finish before QApplication.quit() is called. Can anyone tell me if I'm doing something wrong in the reduced test case below?
import logging
log = logging.getLogger(__name__)

import sys
from PySide import QtCore
import time

class SomeWork(QtCore.QRunnable):
    def __init__(self, sleepTime=1):
        super(SomeWork, self).__init__()
        self.sleepTime = sleepTime

    def run(self):
        time.sleep(self.sleepTime)
        print "work", QtCore.QThread.currentThreadId()

def _test(argv):
    logging.basicConfig(level=logging.NOTSET)
    app = QtCore.QCoreApplication(argv)

    pool = QtCore.QThreadPool.globalInstance()

    TASK_COUNT = int(argv[1]) if len(argv) > 1 else 1

    mainThread = QtCore.QThread.currentThreadId()
    print "Main thread: %s" % (mainThread)
    print "Max thread count: %s" % (pool.maxThreadCount())
    print "Work count: %s" % (TASK_COUNT)

    for i in range(TASK_COUNT):
        pool.start(SomeWork(1))

    def boom():
        print "boom(); calling app.quit()"
        app.quit()

    QtCore.QTimer.singleShot(2000, boom)

    #import signal
    #signal.signal(signal.SIGINT, signal.SIG_DFL)

    return app.exec_()

if __name__ == '__main__':
    sys.exit(_test(sys.argv))
To be clear, this is the output I get:
(env)root@localhost:# python test_pool.py 1
Main thread: 3074382624
Max thread count: 1
Work count: 1
work 3061717872
boom(); calling app.quit()
(env)root@workshop:/home/workshop/workshop/workshop# python test_pool.py 20
Main thread: 3074513696
Max thread count: 1
Work count: 20
work 3060783984
boom(); calling app.quit()
And it hangs forever on the second command, but not the first.
Thanks for any help you may have.
EDIT:
To be clear, I expect that if app.quit() is called while work items are still queued in the thread pool, they do not run; already running items should run to completion, and then the application should close.
This example fails on a Windows machine as well
This example works on the same Windows machine, but using PyQt4
Adding this to _test() just before the exec() fixes the issue, although all the threads run:
def waitForThreads():
    print "Waiting for thread pool"
    pool.waitForDone()
app.aboutToQuit.connect(waitForThreads)
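One further variation that might help (my own idea, untested here): keep references to the queued runnables and give them a cancelled flag, so that once quitting starts, the queued-but-not-yet-started items bail out immediately and waitForDone() returns quickly.

class CancellableWork(QtCore.QRunnable):
    def __init__(self, sleepTime=1):
        super(CancellableWork, self).__init__()
        self.sleepTime = sleepTime
        self.cancelled = False

    def run(self):
        if self.cancelled:
            return  # queued item skipped once shutdown has started
        time.sleep(self.sleepTime)
        print "work", QtCore.QThread.currentThreadId()

# inside _test(): keep the items, cancel them before waiting
work_items = [CancellableWork(1) for _ in range(TASK_COUNT)]
for item in work_items:
    item.setAutoDelete(False)  # we keep Python references to them
    pool.start(item)

def waitForThreads():
    for item in work_items:
        item.cancelled = True
    pool.waitForDone()
app.aboutToQuit.connect(waitForThreads)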
