How to use ThreadPoolExecutor inside a gunicorn process? - python-3.x

I am running FastAPI app with gunicorn with the following config:
bind = 0.0.0.0:8080
worker_class = "uvicorn.workers.UvicornWorker"
workers = 3
loglevel = ServerConfig.LOG_LEVEL.lower()
max_requests = 1500
max_requests_jitter = 300
timeout = 120
Inside this app, a job scheduler runs a task (not very long running) every 0.5 seconds and does some processing on the data.
In that job scheduler, I am calling the "perform" method (see code below):
from concurrent.futures import ThreadPoolExecutor

class BaseQueueConsumer:
    def __init__(self, threads: int):
        self._threads = threads
        self._executor = ThreadPoolExecutor(max_workers=1)

    def perform(self, param1, param2, param3) -> None:
        futures = []
        for _ in range(self._threads):
            futures.append(
                self._executor.submit(
                    BaseQueueConsumer.consume, param1, param2, param3
                )
            )
        for future in futures:
            future.done()

    @staticmethod
    def consume(param1, param2, param3) -> None:
        # Doing some work here
The problem is, whenever this app is under a high load, I am getting the following error:
cannot schedule new futures after shutdown
My guess is that the gunicorn process restarts every 1500 requests (max_requests) and the tasks that are already submitted are causing this issue.
What I am not able to understand is that any threads the gunicorn worker starts through the ThreadPoolExecutor should also end when that process is terminated, but that does not seem to be the case.
Can someone explain this behaviour and suggest how to end the gunicorn worker gracefully without these ThreadPoolExecutor tasks causing errors?
I am using Python 3.8 and gunicorn 0.15.0.
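One workaround I am considering (I am not sure it is the correct fix) is to catch the RuntimeError that submit() raises after a shutdown and lazily recreate the executor, roughly like this (untested sketch):
def perform(self, param1, param2, param3) -> None:
    futures = []
    for _ in range(self._threads):
        try:
            futures.append(
                self._executor.submit(BaseQueueConsumer.consume, param1, param2, param3)
            )
        except RuntimeError:
            # "cannot schedule new futures after shutdown" - build a fresh pool and retry once
            self._executor = ThreadPoolExecutor(max_workers=1)
            futures.append(
                self._executor.submit(BaseQueueConsumer.consume, param1, param2, param3)
            )
    for future in futures:
        future.result()  # wait for (and surface errors from) the submitted work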

Related

Getting `BrokenProcessPool` error in a `concurrent.futures` example

The example I am running is mentioned in this PyMOTW3 link. I am reproducing the code here:
from concurrent import futures
import os

def task(n):
    return (n, os.getpid())

ex = futures.ProcessPoolExecutor(max_workers=2)
results = ex.map(task, range(5, 0, -1))
for n, pid in results:
    print('ran task {} in process {}'.format(n, pid))
As per the source, I am supposed to get the following output:
ran task 5 in process 40854
ran task 4 in process 40854
ran task 3 in process 40854
ran task 2 in process 40854
ran task 1 in process 40854
Instead, I'm getting a long traceback with the following concluding line:
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I am using a Windows machine and running Python 3.9. All other examples are otherwise running fine. What is going wrong here?
I've finally been able to resolve the issue, which seems to be Windows specific. Following a related Stack Overflow post, I used the if __name__ == "__main__" idiom. The modified code is:
from concurrent import futures
import os

def task(n):
    return (n, os.getpid())

def main():
    ex = futures.ProcessPoolExecutor(max_workers=2)
    results = ex.map(task, range(5, 0, -1))
    for n, pid in results:
        print('ran task {} in process {}'.format(n, pid))

if __name__ == '__main__':
    main()
It worked, although I'm still not sure exactly why. (My understanding is that Windows has no fork, so ProcessPoolExecutor spawns worker processes that re-import the main module; without the __main__ guard, each worker re-executes the top-level code that creates the pool, which breaks it.)

Multiprocess : Persistent Pool?

I have code like the one below:
def expensive(self, c, v):
    .....

def inner_loop(self, c, collector):
    self.db.query('SELECT ...', (c,))
    for v in self.db.cursor.fetchall():
        collector.append(self.expensive(c, v))

def method(self):
    # create a Pool
    # join the Pool ??
    self.db.query('SELECT ...')
    for c in self.db.cursor.fetchall():
        collector = []
        # RUN the whole cycle in parallel in separate processes
        self.inner_loop(c, collector)
        # do stuff with the collector
    #! close the pool ?
Both the outer and the inner loop are thousands of steps.
I think I understand how to run a Pool of a couple of processes, and all the examples I found show more or less that.
But in my case I need to launch a persistent Pool and then feed it the data (the c-values). Once an inner-loop process has finished, I have to supply the next available c-value.
And keep the processes running and collect the results.
How do I do that?
A clunky idea I have is:
def method(self):
    ws = 4
    with Pool(processes=ws) as pool:
        cs = []
        for i, c in enumerate(..):
            cs.append(c)
            if i % ws == 0:
                # apply() expects an argument tuple, hence the trailing comma
                res = [pool.apply(self.inner_loop, (c,)) for c in cs]
                cs = []
                collector.append(res)
Will this keep the same pool running, i.e. not launch a new process every time?
Do I need the 'if i % ws == 0' part, or can I use imap() / map_async() so that the Pool object blocks the loop when the available workers are exhausted and continues when some are freed?
Yes, the way that multiprocessing.Pool works is:
Worker processes within a Pool typically live for the complete duration of the Pool’s work queue.
So simply submitting all your work to the pool via imap should be sufficient:
with Pool(processes=4) as pool:
    initial_results = db.fetchall("SELECT c FROM outer")
    results = [pool.imap(self.inner_loop, (c,)) for c in initial_results]
That said, if you really are doing this to fetch things from the DB, it may make more sense to move more processing down into that layer (bring the computation to the data rather than bringing the data to the computation).
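To illustrate, here is a minimal sketch of feeding every c-value to one persistent pool via imap_unordered; it assumes a helper that returns the collector (results must be picklable) and glosses over the fact that each worker process usually needs to open its own DB connection:
from multiprocessing import Pool

def inner_loop_collect(self, c):
    # variant of inner_loop that returns the collector instead of mutating a shared list,
    # so the result can be pickled back from the worker process
    collector = []
    self.inner_loop(c, collector)
    return collector

def method(self):
    self.db.query('SELECT ...')
    cs = [c for c in self.db.cursor.fetchall()]
    with Pool(processes=4) as pool:
        # one pool for the whole run; the same four workers are reused for every c,
        # and results are consumed as soon as any worker finishes
        for collector in pool.imap_unordered(self.inner_loop_collect, cs):
            pass  # do stuff with each collector as it arrives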

How to use a ros2 service to trigger an asyncio non-blocking function (without Actions)

The problem I am having is that:
To start an async function in the background I need an asyncio event loop.
This event loop usually exists in the main thread and, when started, blocks the execution of that thread (i.e. lines of code after starting the event loop aren't run until the event loop is cancelled).
However, ROS2 has its own event loop (the executor) that also usually runs in the main thread and blocks execution. This means it is difficult to have both event loops running.
My attempted solution was to start the asyncio event loop in a separate thread. This is started in the Node constructor, and stopped after the Node is destructed.
This looks like this:
class IncrementPercentDoneServiceNode(Node):
    def __create_task(self, f: Awaitable):
        self.__task = self.__loop.create_task(f)

    def __init__(self):
        super().__init__('increment_percent_done_service_node')
        self.__loop = asyncio.new_event_loop()
        self.__task: Optional[Task] = None
        self.__thread = threading.Thread(target=self.__loop.run_forever)
        self.__thread.start()
        self.done = False
        self.create_service(Trigger, 'start_incrementing',
            callback=lambda request, responce: (
                self.get_logger().info("Starting service"),
                self.__loop.call_soon_threadsafe(self.__create_task, self.__increment_percent_complete()),
                TriggerResponse(success=True, message='')
            )[-1]
        )

    def __del__(self):
        print("stopping loop")
        self.done = True
        if self.__task is not None:
            self.__task.cancel()
        self.__loop.stop()
        self.__thread.join()

    async def __increment_percent_complete(self):
        timeout_start = time.time()
        duration = 5
        while time.time() < (timeout_start + duration):
            time_since_start = time.time() - timeout_start
            percent_complete = (time_since_start / duration) * 100.0
            self.get_logger().info("Percent complete: {}%".format(percent_complete))
            await asyncio.sleep(0.5)
        self.get_logger().info("leaving async function")
        self.done = True

if __name__ == '__main__':
    rclpy.init()
    test = IncrementPercentDoneServiceNode()
    e = MultiThreadedExecutor()
    e.add_node(test)
    e.spin()
Is this a sensible way to do it? Is there a better way? How would I cancel the start_incrementing service with another service? (I know that this is what actions are for, but I cannot use them in this instance).
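For completeness, this is roughly what I imagine a cancel service could look like (untested sketch; 'stop_incrementing' is a made-up service name, and it reuses the __loop and __task members created in __init__ above):
# sketch only - registered alongside 'start_incrementing' inside __init__
self.create_service(Trigger, 'stop_incrementing',
    callback=lambda request, response: (
        self.get_logger().info("Cancelling"),
        # hop onto the asyncio thread before touching the task
        self.__loop.call_soon_threadsafe(
            lambda: self.__task.cancel() if self.__task is not None else None
        ),
        TriggerResponse(success=True, message='')
    )[-1]
)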

Python - Pass a function (callback) variable between functions running in separate threads

I am trying to develop a Python 3.6 script which uses the pika and threading modules.
I have a problem which I think is caused by A) my being very new to Python and coding in general, and B) my not understanding how to pass variables between functions when they run in separate threads and are already being passed a parameter in parentheses at the end of the receiving function name.
The reason I think this is that when I do not use threading, I can pass a variable between functions simply by calling the receiving function and supplying the variable to be passed in parentheses; a basic example is shown below:
def send_variable():
    body = "this is a text string"
    receive_variable(body)

def receive_variable(body):
    print(body)
This when run, prints:
this is a text string
A working version of the code I need to get working with threading is shown below. It uses straight functions (no threading): I use pika to receive messages from a (RabbitMQ) queue via the pika callback function, and then pass the body of the received message from the 'callback' function to the processing function:
import pika

...mq connection variables set here...

# defines username and password credentials as variables set at the top of this script
credentials = pika.PlainCredentials(mq_user_name, mq_pass_word)
# defines mq server host, port and user credentials and creates a connection
connection = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host, port=mq_port, credentials=credentials))
# creates a channel connection instance using the above settings
channel = connection.channel()
# defines the queue name to be used with the above channel connection instance
channel.queue_declare(queue=mq_queue)

def callback(ch, method, properties, body):
    # passes (body) to processing function
    body_processing(body)

# sets channel consume type, also sets queue name/message acknowledge settings based on variables set at top of script
channel.basic_consume(callback, queue=mq_queue, no_ack=mq_no_ack)
# tells the callback function to start consuming
channel.start_consuming()
# calls the callback function to start receiving messages from mq server
callback()

# above deals with pika connection and the main callback function

def body_processing(body):
    ...code to send a pika message every time a 'body' message is received...
This works fine; however, I want to translate it to run within a script that uses threading. When I do this I have to supply the parameter 'channel' to the function name that runs in its own thread, and when I then try to also include the 'body' parameter, so that the processing function looks as per the below:
def processing_function(channel, body):
I get an error saying:
[function_name] is missing 1 positional argument: 'body'
I know that more code is needed when using threading, and I have included the actual threading code I use below so that you can see what I am doing:
...imports and mq variables and pika connection details are set here...

def get_heartbeats(channel):
    channel.queue_declare(queue=queue1)
    #print (' [*] Waiting for messages. To exit press CTRL+C')

    def callback(ch, method, properties, body):
        process_body(body)
        #print (" Received %s" % (body))

    channel.basic_consume(callback, queue=queue1, no_ack=no_ack)
    channel.start_consuming()

def process_body(channel, body):
    channel.queue_declare(queue=queue2)
    #print (' [*] Waiting for Tick messages. To exit press CTRL+C')
    # sets the mq host which pika client will use to send a message to
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host))
    # create a channel connection instance
    channel = connection.channel()
    # declare a queue to be used by the channel connection instance
    channel.queue_declare(queue=order_send_queue)
    # send a message via the above channel connection settings
    channel.basic_publish(exchange='', routing_key=send_queue, body='Test Message')
    # close the channel connection instance
    connection.close()

def manager():
    # Channel 1 Connection Details =======================================================================================
    credentials = pika.PlainCredentials(mq_user_name, mq_password)
    connection1 = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host, credentials=credentials))
    channel1 = connection1.channel()
    # Channel 1 thread =====================================================================================================
    t1 = threading.Thread(target=get_heartbeats, args=(channel1,))
    t1.daemon = True
    threads.append(t1)
    # as this is thread 1, the call to start it is made in the start-threading section below

    # Channel 2 Connection Details =======================================================================================
    credentials = pika.PlainCredentials(mq_user_name, mq_password)
    connection2 = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host, credentials=credentials))
    channel2 = connection2.channel()
    # Channel 2 thread ====================================================================================================
    t2 = threading.Thread(target=process_body, args=(channel2, body))
    t2.daemon = True
    threads.append(t2)
    t2.start()  # as this is thread 2 - we need to start the thread here

    # Start threading
    t1.start()  # start the first thread - other threads will self start as they call t1.start() in their code block
    for t in threads:  # for all the threads defined
        t.join()  # join defined threads

manager()  # run the manager module which starts threads that call each module
This, when run, produces the error
process_body() missing 1 required positional argument: (body)
and I do not understand why this is or how to fix it.
Thank you for taking the time to read this question and any help or advice you can supply is much appreciated.
Please keep in mind that I am new to python and coding so may need things spelled out rather than being able to understand more cryptic replies.
Thanks!
On further looking into this and playing with the code, it seems that if I edit the lines:
def process_body(channel, body):
to read
def process_body(body):
and
t2 = threading.Thread(target=process_body, args=(channel2, body))
so that it reads:
t2 = threading.Thread(target=process_body)
then the code seems to work as needed. I also see multiple script processes in htop, so it appears that threading is working. I have left the script running for over 24 hours and did not receive any errors...
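For anyone hitting the same issue, here is a minimal sketch of how the body could be handed from the consuming thread to the processing thread with a queue.Queue instead of passing it as a thread argument (queue1, no_ack, channel1 and channel2 refer to the script above; this is illustrative, not my exact code):
import queue
import threading

body_queue = queue.Queue()

def get_heartbeats(channel):
    def callback(ch, method, properties, body):
        body_queue.put(body)  # hand the message off to the processing thread
    channel.basic_consume(callback, queue=queue1, no_ack=no_ack)
    channel.start_consuming()

def process_bodies(channel):
    while True:
        body = body_queue.get()  # blocks until the consumer thread puts a message
        # ... publish or otherwise process `body` using this thread's own channel ...

t1 = threading.Thread(target=get_heartbeats, args=(channel1,), daemon=True)
t2 = threading.Thread(target=process_bodies, args=(channel2,), daemon=True)
t1.start()
t2.start()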

No timeout for python-rq job

Is there a way to specify a "maximum" = inf timeout for a worker?
I have some long-running tasks and if something fails due to timeouts I handle it internally within the worker.
Can I specify this through the CLI?
The timeout argument specifies the maximum runtime of the task before it is considered 'lost'. It can be used with @job, Queue, enqueue and enqueue_call.
from rq.decorators import job

@job('low', connection=my_redis_conn, timeout=600)
def long_running_task(x, y):
    # Code
python-rq.org/docs
Setting Queue(default_timeout=-1) will do the trick. Here is a reference to their source code:
def create_job(self, func, args=None, kwargs=None, timeout=None,
               result_ttl=None, ttl=None, failure_ttl=None,
               description=None, depends_on=None, job_id=None,
               meta=None, status=JobStatus.QUEUED, retry=None):
    """Creates a job based on parameters given."""
    timeout = parse_timeout(timeout)
    if timeout is None:
        timeout = self._default_timeout
    elif timeout == 0:
        raise ValueError('0 timeout is not allowed. Use -1 for infinite timeout')
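For reference, a minimal sketch of what that looks like when creating the queue (assuming a local Redis connection and the long_running_task function from the answer above):
from redis import Redis
from rq import Queue

# default_timeout=-1 disables the per-job timeout for everything enqueued on this queue
q = Queue('low', connection=Redis(), default_timeout=-1)
q.enqueue(long_running_task, 1, 2)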

Resources