Is there a way to specify a "maximum" = inf timeout for a worker?
I have some long-running tasks and if something fails due to timeouts I handle it internally within the worker.
Can a specify this through the cli?
timeout argument specifies the maximum runtime of the task before it's considered 'lost'. Can be used with #job, Queue, enqueue & enqueue_call.
from rq.decorators import job
#job('low', connection=my_redis_conn, timeout=600)
def long_running_task(x, y):
# Code
python-rq.org/docs
Setting Queue(default_timeout=-1) will do the trick. Here is a reference to their source code:
def create_job(self, func, args=None, kwargs=None, timeout=None,
result_ttl=None, ttl=None, failure_ttl=None,
description=None, depends_on=None, job_id=None,
meta=None, status=JobStatus.QUEUED, retry=None):
"""Creates a job based on parameters given."""
timeout = parse_timeout(timeout)
if timeout is None:
timeout = self._default_timeout
elif timeout == 0:
raise ValueError('0 timeout is not allowed. Use -1 for infinite timeout')
Related
I am running FastAPI app with gunicorn with the following config:
bind = 0.0.0.0:8080
worker_class = "uvicorn.workers.UvicornWorker"
workers = 3
loglevel = ServerConfig.LOG_LEVEL.lower()
max_requests = 1500
max_requests_jitter = 300
timeout = 120
Inside this app, I am doing some task (not very long running) every 0.5 seconds (through a Job Scheduler) and doing some processing on the data.
In that Job scheduler, I am calling "perform" method (See code below):
class BaseQueueConsumer:
def __init__(self, threads: int):
self._threads = threads
self._executor = ThreadPoolExecutor(max_workers=1)
def perform(self, param1, param2, param3) -> None:
futures = []
for _ in range(self._threads):
futures.append(
self._executor.submit(
BaseQueueConsumer.consume, param1, param2, param3
)
)
for future in futures:
future.done()
#staticmethod
def consume(param1, param2, param3) -> None:
# Doing some work here
The problem is, whenever this app is under a high load, I am getting the following error:
cannot schedule new futures after shutdown
My guess is that the gunicorn process restarts every 1500 requests (max_requests) and the tasks that are already submitted are causing this issue.
What I am not able to understand is that whatever thread gunicorn process starts due to threadpoolexecutor should also end when the process is terminated but that is not the case.
Can someone help me explain this behaviour and a possible solution for gracefully ending the gunicorn process without these threadpoolexecutor tasks causing errors?
I am using python 3.8 and gunicorn 0.15.0
I am trying to make resources unavailable for a certain time in simpy. The issue is with timeout I find the resource is still active and serving during the time it should be unavailable. Can anyone help me with this in case you have encountered such a problem. Thanks a lot!
import numpy as np
import simpy
def interarrival():
return(np.random.exponential(10))
def servicetime():
return(np.random.exponential(20))
def servicing(env, servers_1):
i = 0
while(True):
i = i+1
yield env.timeout(interarrival())
print("Customer "+str(i)+ " arrived in the process at "+str(env.now))
state = 0
env.process(items(env, i, servers_array, state))
def items(env, customer_id, servers_array, state):
with servers_array[state].request() as request:
yield request
t_arrival = env.now
print("Customer "+str(customer_id)+ " arrived in "+str(state)+ " at "+str(t_arrival))
yield env.timeout(servicetime())
t_depart = env.now
print("Customer "+str(customer_id)+ " departed from "+str(state)+ " at "+str(t_depart))
if (state == 1):
print("Customer exists")
else:
state = 1
env.process(items(env, customer_id, servers_array, state))
def delay(env, servers_array):
while(True):
if (env.now%1440 >= 540 and env.now <= 1080):
yield(1080 - env.now%1440)
else:
print(str(env.now), "resources will be blocked")
resource_unavailability_dict = dict()
resource_unavailability_dict[0] = []
resource_unavailability_dict[1] = []
for nodes in resource_unavailability_dict:
for _ in range(servers_array[nodes].capacity):
resource_unavailability_dict[nodes].append(servers_array[nodes].request())
print(resource_unavailability_dict)
for nodes in resource_unavailability_dict:
yield env.all_of(resource_unavailability_dict[nodes])
if (env.now < 540):
yield env.timeout(540)
else:
yield env.timeout((int(env.now/1440)+1)*1440+540 - env.now)
for nodes in resource_unavailability_dict:
for request in resource_unavailability_dict[nodes]:
servers_array[nodes].release(request)
print(str(env.now), "resources are released")
env = simpy.Environment()
servers_array = []
servers_array.append(simpy.Resource(env, capacity = 5))
servers_array.append(simpy.Resource(env, capacity = 7))
env.process(servicing(env, servers_array))
env.process(delay(env,servers_array))
env.run(until=2880)
The code is given above. Actually, I have two nodes 0 and 1 where server capacities are 5 and 7 respectively. The servers are unavailable before 9AM (540 mins from midnight) and after 6 PM everyday. I am trying to create the unavailability using timeout but not working. Can you suggest how do I modify the code to incorporate it.
I am getting the error AttributeError: 'int' object has no attribute 'callbacks'which I can't figure out why ?
So the problem with simpy resources is the capacity is a read only attribute. To get around this you need something to seize and hold the resource off line. So in essence, I have two types of users, the ones that do "real work" and the ones that control the capacity. I am using a simple resource, which means that the queue at the schedule time will get processed before the capacity change occurs. Using a priority resource means the current users of a resource can finish their processes before the capacity change occurs , or you can use a pre-emptive resource to interrupt users with resources at the scheduled time. here is my code
"""
one way to change a resouce capacity on a schedule
note the the capacity of a resource is a read only atribute
Programmer: Michael R. Gibbs
"""
import simpy
import random
def schedRes(env, res):
"""
Performs maintenance at time 100 and 200
waits till all the resources have been seized
and then spend 25 time units doing maintenace
and then release
since I am using a simple resource, maintenance
will wait of all request that are already in
the queue when maintenace starts to finish
you can change this behavior with a priority resource
or pre-emptive resource
"""
# wait till first scheduled maintenance
yield env.timeout(100)
# build a list of requests for each resource
# then wait till all requests are filled
res_maint_list = []
print(env.now, "Starting maintenance")
for _ in range(res.capacity):
res_maint_list.append(res.request())
yield env.all_of(res_maint_list)
print(env.now, "All resources seized for maintenance")
# do maintenance
yield env.timeout(25)
print(env.now, "Maintenance fisish")
# release all the resources
for req in res_maint_list:
res.release(req)
print(env.now,"All resources released from maint")
# wait till next scheduled maintenance
dur_to_next_maint = 200 -env.now
if dur_to_next_maint > 0:
yield env.timeout(dur_to_next_maint)
# do it all again
res_maint_list = []
print(env.now, "Starting maintenance")
for _ in range(res.capacity):
res_maint_list.append(res.request())
yield env.all_of(res_maint_list)
print(env.now, "All resources seized for maintenance")
yield env.timeout(25)
print(env.now, "Maintenance fisish")
for req in res_maint_list:
res.release(req)
print(env.now,"All resources released from maint")
def use(env, res, dur):
"""
Simple process of a user seizing a resource
and keeping it for a little while
"""
with res.request() as req:
print(env.now, f"User is in queue of size {len(res.queue)}")
yield req
print(env.now, "User has seized a resource")
yield env.timeout(dur)
print(env.now, "User has released a resource")
def genUsers(env,res):
"""
generate users to seize resources
"""
while True:
yield env.timeout(10)
env.process(use(env,res,21))
# set up
env = simpy.Environment()
res = simpy.Resource(env,capacity=2) # may want to use a priority or preemtive resource
env.process(genUsers(env,res))
env.process(schedRes(env, res))
# start
env.run(300)
One way to do this is with preemptive resources. When it is time to make resources unavailable, issue a bunch of requests with the highest priority to seize idle resources, and to preempt resources currently in use. These requests would then release the resources when its time to make the resources available again. Note that you will need to add some logic on how the preempted processes resume once the resources become available again. If you do not need to preempt processes, you can just use priority resources instead of preemptive resources
I am writing a job scheduler where I schedule M jobs across N co-routines (N < M). As soon as one job finishes, I add a new job so that it can start immediately and run in parallel with the other jobs. Additionally, I would like to ensure that no single job takes more than a certain fixed amount of time. Any jobs that take too long should be cancelled. I have something pretty close, like this:
def update_run_set(waiting, running, max_concurrency):
number_to_add = min(len(waiting), max_concurrency - len(running))
for i in range(0, number_to_add):
next_one = waiting.pop()
running.add(next_one)
async def _run_test_invocations_asynchronously(jobs:List[MyJob], max_concurrency:int, timeout_seconds:int):
running = set() # These tasks are actively being run
waiting = set() # These tasks have not yet started
waiting = {_run_job_coroutine(job) for job in jobs}
update_run_set(waiting, running, max_concurrency)
while len(running) > 0:
done, running = await asyncio.wait(running, timeout=timeout_seconds,
return_when=asyncio.FIRST_COMPLETED)
if not done:
timeout_count = len(running)
[r.cancel() for r in running] # Start cancelling the timed out jobs
done, running = await asyncio.wait(running) # Wait for cancellation to finish
assert(len(done) == timeout_count)
assert(len(running) == 0)
else:
for d in done:
job_return_code = await d
if len(waiting) > 0:
update_run_set(waiting, running, max_concurrency)
assert(len(running) > 0)
The problem here is that say my timeout is 5 seconds, and I'm scheduling 3 jobs across 4 cores. Job A takes 2 seconds, Job B takes 6 seconds and job C takes 7 seconds.
We have something like this:
t=0 t=1 t=2 t=3 t=4 t=5 t=6 t=7
-------|-------|-------|-------|-------|-------|-------|-------|
AAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
However, at t=2 the asyncio.await() call returns because A completed. It then loops back up to the top and runs again. At this point B has already been running for 2 seconds, but since it starts the countdown over, and only has 4 seconds remaining until it completes, B will appear to be successful. So after 4 seconds we return again, B is successful, then we start the loop over and now C completes.
How do I make it so that B and C both fail? I somehow need the time to be preserved across calls to asyncio.wait().
One idea that I had is to do my own bookkeeping of how much time each job is allowed to continue running, and pass the minimum of these into asyncio.wait(). Then when something times out, I can cancel only those jobs whose time remaining was equal to the value I passed in for timeout_seconds.
This requires a lot of manual bookkeeping on my part though, and I can't help but wonder about floating point problems which cause me to decide that it's not time to cancel a job even though it really is). So I can't help but think that there's something easier. Would appreciate any ideas.
You can wrap each job into a coroutine that checks its timeout, e.g. using asyncio.wait_for. Limiting the number of parallel invocations could be done in the same coroutine using an asyncio.Semaphore. With those two combined, you only need one call to wait() or even just gather(). For example (untested):
# Run the job, limiting concurrency and time. This code could likely
# be part of _run_job_coroutine, omitted from the question.
async def _run_job_with_limits(job, sem, timeout):
async with sem:
try:
await asyncio.wait_for(_run_job_coroutine(job), timeout)
except asyncio.TimeoutError:
# timed out and canceled, decide what you want to return
pass
async def _run_test_invocations_async(jobs, max_concurrency, timeout):
sem = asyncio.Semaphore(max_concurrency)
return await asyncio.gather(
*(_run_job_with_limits(job, sem, timeout) for job in jobs)
)
I am using flask, and have a route that sends emails to people. I am using threading to send them faster. When i run it on my local machine it takes about 12 seconds to send 300 emails. But when I run it on lambda thorough API Gateway it times out.
here's my code:
import threading
def async_mail(app, msg):
with app.app_context():
mail.send(msg)
def mass_mail_sender(order, user, header):
html = render_template('emails/pickup_mail.html', bruger_order=order.ordre, produkt=order.produkt)
msg = Message(recipients=[user],
sender=('Sender', 'infor#example.com'),
html=html,
subject=header)
thread = threading.Thread(target=async_mail, args=[create_app(), msg])
thread.start()
return thread
#admin.route('/lager/<url_id>/opdater', methods=['POST'])
def update_stock(url_id):
start = time.time()
if current_user.navn != 'Admin':
abort(403)
if request.method == 'POST':
produkt = Produkt.query.filter_by(url_id=url_id)
nyt_antal = int(request.form['bestilt_hjem'])
produkt.balance = nyt_antal
produkt.bestilt_hjem = nyt_antal
db.session.commit()
orders = OrdreBog.query.filter(OrdreBog.produkt.has(func.lower(Produkt.url_id == url_id))) \
.filter(OrdreBog.produkt_status == 'Ikke klar').all()
threads = []
for order in orders:
if order.antal <= nyt_antal:
nyt_antal -= order.antal
new_thread = mass_mail_sender(order, order.ordre.bruger.email, f'Din bog {order.produkt.titel} er klar til afhentning')
threads.append(new_thread)
order.produkt_status = 'Klar til afhentning'
db.session.commit()
for thread in threads:
try:
thread.join()
except Exception:
pass
end = time.time()
print(end - start)
return 'Emails sendt'
return ''
AWS lambda functions designed to run functions within these constraints:
Memory– The amount of memory available to the function during execution. Choose an amount between 128 MB and 3,008 MB in 64-MB increments.
Lambda allocates CPU power linearly in proportion to the amount of memory configured. At 1,792 MB, a function has the equivalent of one full vCPU (one vCPU-second of credits per second).
Timeout – The amount of time that Lambda allows a function to run before stopping it. The default is 3 seconds. The maximum allowed value is 900 seconds.
To put this in your mail sending multi threaded python code. The function will terminate automatically either when your function execution completes successfully or it reaches configured timeout.
I understand you want single python function to send n number of emails "concurrently". To achieve this with lambda try the "Concurrency" setting and trigger your lambda function through a local script, S3 hosted html/js triggered by cloud watch or EC2 instance.
Concurrency – Reserve concurrency for a function to set the maximum number of simultaneous executions for a function. Provision concurrency to ensure that a function can scale without fluctuations in latency.
https://docs.aws.amazon.com/lambda/latest/dg/configuration-console.html
Important: All above settings will affect your lambda function execution cost significantly. So plan and compare before applying.
If you need any more help, let me know.
Thank you.
I'm kinda new to celery as a whole and having the problem with retry case on a for loop:
i have the following task:
#app.task(bind=True, autoretry_for=(CustomException,), retry_kwargs={'max_retries': 10,'countdown': 30})
def call_to_apis(self):
api_list = [api1, api2, api3, api4, api5,...]
for api in api_list:
try:
response = requests.get(api)
if response.status_code == 500:
raise CustomException
except CustomException:
continue
From my understanding celery will retry on my CustomException get raised.
In the case of retry, will it only retry for the the failed api or will it just run the whole process of every api in the api_list again? If so is there anyway for it to only retry the failed api ?
Expected result: only retry the failed api
EDIT:
i have split it into 2 different tasks and 1 request function as follow:
#app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
sender.add_periodic_task(300.0, call_to_apis.s())
print("setup_periodic_tasks")
def call_api(api):
response = requests.get(api)
if response.status_code == 500:
raise CustomException
elif response.status_code == 404:
raise CustomWrongLinkException
#app.task(default_retry_delay=30, max_retries=10)
def send_fail_api(api):
try:
call_api(api)
except NonceTooLowException:
try:
send_fail_api.retry()
except MaxRetriesExceededError:
print("reached max retry number")
pass
except Exception:
pass
#app.task()
def call_to_apis():
api_list = [api1, api2, api3, api4, api5,...]
for api in api_list:
try:
call_api(api)
except CustomException:
send_fail_api.delay(api)
except CustomWrongLinkException:
print("wrong link")
except Exception:
pass
it worked and the other apis get to complete, with failed api it supposed to call to another task and retry for 10 time each delay 30 seconds.
But i'm getting more than expected retries about 24 times(expected to only retry 10 times) and it also printed out reached max retry number at 10th retry but it still retry until 24 retries
What am i doing wrong ?
The whole task will be retried in case of a known exception (specified in autoretry_for decorator argument), see the documentation. Celery can't know by any means the state of the task when exception is raised, that's what you have to handle. I would suggest splitting the task into individual tasks (one per API) and call them separately, presumably creating some workflow.