I am trying to assign a priority level to both of my queues in RabbitMQ so that my workers will always consume and clear out all messages from Queue1 first before consuming from Queue2. I use a celery configuration file, called celeryconfig.py, that looks like this:
import ssl
broker_url="amqps://USR:PWD#URL//"
result_backend="db+postgresql://USR:PWD#BURL?sslmode=verify-full&sslrootcert=/usr/local/share/ca-certificates/MY_CACERT.crt"
include=["my_tasks"]
task_acks_late=True
task_default_rate_limit="150/m"
task_time_limit=300
worker_prefetch_multiplier=1
worker_max_tasks_per_child=2
timezone="UTC"
broker_use_ssl = {
    'keyfile': '/usr/local/share/private/MY_KEY.key',
    'certfile': '/usr/local/share/ca-certificates/MY_CERT.crt',
    'ca_certs': '/usr/local/share/ca-certificates/MY_CACERT.crt',
    'cert_reqs': ssl.CERT_REQUIRED,
    'ssl_version': ssl.PROTOCOL_TLSv1_2
}
Currently I only have 1 queue, and this is how I am starting the celery workers:
celery -A celery_app worker -l info --config celeryconfig --concurrency=16 -n "%h:celery-worker" -O fair
I have read the short doc here https://docs.celeryproject.org/en/v4.3.0/userguide/routing.html#routing-options-rabbitmq-priorities but it only mentions setting the max priority level and does not tell me how to set priority levels for each individual queue in RabbitMQ.
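For reference, the example in that section only shows how to cap the supported priority on a queue through queue_arguments, roughly like this when adapted to the celeryconfig.py style above (the 'tasks' queue name is just a placeholder):
from kombu import Exchange, Queue

task_queues = [
    Queue('tasks', Exchange('tasks'), routing_key='tasks',
          queue_arguments={'x-max-priority': 10}),
]
As far as I can tell, x-max-priority only sets the maximum message priority within that one queue; it is not a priority of the queue relative to other queues.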
RabbitMQ: 3.7.17
Celery: 4.3.0
Python: 3.6.7
OS: Ubuntu 18.04.3 LTS bionic
Can someone shed some light on this? Thank you
I am not familiar with celery at all, but other systems can run separate workers depending on the queue or some other filter, and each worker can have its own config for messages per second consumed, concurrency, etc.
You can create two celery configs, one with e.g. priority 10 and the other with priority 5, and run two "instances" of celery.
This will work much better; per-message priority within the same worker does not work so well.
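A minimal sketch of that approach with celery, assuming the two queues are named queue1 and queue2 (the queue names, node names and concurrency split below are made up; -Q/--queues is the standard celery worker option for pinning a worker to specific queues):
celery -A celery_app worker -l info --config celeryconfig -Q queue1 --concurrency=12 -n "%h:queue1-worker" -O fair
celery -A celery_app worker -l info --config celeryconfig -Q queue2 --concurrency=4 -n "%h:queue2-worker" -O fair
Each instance then only consumes from its own queue, so the "priority" is expressed by how much capacity you dedicate to each queue rather than by a per-queue priority value.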
Related
Currently I am using queue to specify the worker for the celery task as follows:
celery.signature("cropper.run", args=[str("/ngpointdata/NG94_118.ngt"), int(1), str("NGdata_V4_ML_Test"), True], priority="3", queue="crop").delay()
But due to the needs of the pipeline I am working on, I have multiple workers with the same queue name, so I wanted to know if it is possible to send the task to a specific worker that has the same queue name as others but different node name?
(I'm learning Cloud Run and acknowledge this is not development or code related, but I'm hoping some GCP engineer can clarify this.)
I have a Python application running - gunicorn + Flask... just a PoC for now, that's why the configuration is minimal.
gcloud run deploy has the following flags:
--max-instances 1
--concurrency 5
--memory 128Mi
--platform managed
The guniccorn_cfg.py file has the following configuration:
workers=1
worker_class="gthread"
threads=3
I'd like to know:
1) max-instances :: if I were to adjust this, does that mean a new physical server machine is provisioned whenever needed? Or does the service achieve that by pulling a container image and simply starting a new container instance (docker run ...) on the same physical server machine, effectively sharing that machine with other container instances?
2) concurrency :: does one running container instance receive multiple concurrent requests (e.g. 5 concurrent requests processed by 3 running container instances)? Or does each concurrent request trigger the start of a new container instance (docker run ...)?
3) lastly, can I effectively reach concurrency > 5 by adjusting the gunicorn thread settings? For example 5x3=15 in this case, i.e. 15 concurrent requests being served by 3 running container instances? If that's true, are there any pros/cons to adjusting threads vs adjusting Cloud Run concurrency?
additional info:
- It's an IO-intensive application (not CPU intensive); it simply grabs the HTTP request and publishes to Pub/Sub.
thanks a lot
First of all, it's not appropriate on Stack Overflow to ask "cocktail questions" where you ask 5 things at a time. Please limit yourself to 1 question at a time in the future.
You're not supposed to worry about where containers run (physical machines, VMs, ...). --max-instances limits the "number of container instances" that you allow your app to scale to. This is to prevent ending up with a huge bill if someone maliciously sends too many requests to your app.
This is documented at https://cloud.google.com/run/docs/about-concurrency. If you specify --concurrency=10, your container can be routed to have at most 10 in-flight requests at a time. So make sure your app can handle 10 requests at a time.
Yes, read the gunicorn documentation. Test locally whether your settings let gunicorn handle 5 requests at the same time; Cloud Run's --concurrency setting only ensures you don't get more than 5 requests to 1 container instance at any moment.
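For example, with the threaded worker class from the question, gunicorn needs at least as many threads as the Cloud Run concurrency it is expected to absorb; a minimal sketch of guniccorn_cfg.py under that assumption (values are illustrative):
workers = 1
worker_class = "gthread"
# at least the Cloud Run --concurrency value, so 5 in-flight requests can be served at once
threads = 5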
I also recommend reading the official docs more thoroughly before asking, and perhaps also the cloud-run-faq, which pretty much answers all of these.
I was going through the celery documentations and I ran across this
Warning: Backends use resources to store and transmit results. To ensure that resources are released, you must eventually call get() or forget() on EVERY AsyncResult instance returned after calling a task.
My celery application is indeed using the backend to store task results in the public.celery_taskmeta table in Postgres, and I am sure that this warning is relevant to me. I currently have a producer that queues up a bunch of tasks for my workers every X minutes and then moves on and performs other stuff. The producer is a long-running script that will eventually queue up a bunch of new tasks in RabbitMQ. The workers will usually take 5-20 minutes to finish executing a task because they pull data from Kafka, Postgres/MySQL, process that data, and insert it into Redshift. For example, this is what my producer is doing:
import celery_workers

for task in task_list:  # task_list can hold up to 100s of tasks
    async_result = celery_workers.delay(task)

# move on and do other stuff
Now my question is: how do I go back and release the backend resources by calling async_result.get() (as stated in the celery docs warning) without having my producer pause/wait for the workers to finish?
Using Python 3.6 and celery 4.3.0
The doc says:
With this option you can configure the maximum number of tasks a worker can execute before it’s replaced by a new process.
Under what conditions will a worker be replaced by a new process? Does this setting mean that a worker, even with multiple processes, can only process one task at a time?
It means that when one worker has executed more tasks than the limit (the "worker" here is a child process if you use the default prefork process pool), celery will restart that worker process automatically.
Say you use celery for database manipulation and you forget to close the database connection; the auto-restart mechanism will help you close all the pending connections.
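For example, assuming the option being quoted is worker_max_tasks_per_child (the same setting that appears as worker_max_tasks_per_child=2 in the first celeryconfig.py above), a celeryconfig.py-style sketch with an arbitrary limit would be:
# each pool process is recycled after executing 50 tasks
worker_max_tasks_per_child = 50
The equivalent command-line option is --max-tasks-per-child; it only controls recycling, not how many tasks the pool runs in parallel.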
I am processing 1 lakh (100,000) URLs using a Perl Gearman client and worker.
I need your help to run a single job on multiple workers (i.e. if I have 5 workers and 1 client, I want all 5 workers to do the work of that one client). Currently I am running 20 clients and 30 workers, but only 20 workers are running jobs and the remaining 10 workers are idle.
Thanks in advance
A Gearman worker grabs one job and treats it as a single execution unit. If you would like to run one job on multiple workers, you should probably divide your job into several sub-jobs.
You can create a manager which splits the work into those sub-jobs and coordinates the other workers, as sketched below.
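A rough sketch of that idea with the python-gearman client (the question uses Perl, so treat this purely as an illustration; the task name, server address and chunk size are made up):
import gearman

# hypothetical manager: split one big URL list into sub-jobs so that idle workers also get work
client = gearman.GearmanClient(['localhost:4730'])

urls = open('urls.txt').read().splitlines()
chunk_size = 1000

for i in range(0, len(urls), chunk_size):
    chunk = urls[i:i + chunk_size]
    # one background sub-job per chunk; any free worker registered for 'process_urls' can pick it up
    client.submit_job('process_urls', '\n'.join(chunk),
                      background=True, wait_until_complete=False)
With the work split this way, all 30 workers can stay busy instead of only one worker per original job.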
The approach you need is called fan-out, but Gearman can't do this. You have to use a message queue like RabbitMQ instead; it can send the same message to different workers with fanout exchanges.
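A rough illustration of a fanout exchange with the pika client (Python, pika 1.x; the exchange name, queue handling and message body are made up):
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# a fanout exchange copies every published message to all queues bound to it
channel.exchange_declare(exchange='urls', exchange_type='fanout')

# worker side: each worker declares its own exclusive queue and binds it to the exchange,
# so every worker receives a copy of every message
result = channel.queue_declare(queue='', exclusive=True)
channel.queue_bind(exchange='urls', queue=result.method.queue)

# client side: publish to the exchange rather than to a specific queue
channel.basic_publish(exchange='urls', routing_key='', body='http://example.com/page1')

connection.close()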