What does [Max tasks per child setting] exactly mean in Celery?

The documentation says:
With this option you can configure the maximum number of tasks a worker can execute before it’s replaced by a new process.
Under what conditions is a worker replaced by a new process? Does this setting mean that a worker, even one with multiple processes, can only process one task at a time?

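It means that once a worker has executed the configured number of tasks (the "worker" here is a child process if you use the default prefork pool), Celery automatically replaces it with a fresh process. The setting does not limit concurrency: each child process still runs one task at a time, and the number of child processes is controlled separately by the concurrency setting.
For example, if you use Celery for database work and forget to close database connections, this automatic restart cleans up any leaked connections, because they are released when the old process exits.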
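Celery's default prefork pool is built on billiard, a fork of Python's multiprocessing, and the standard-library multiprocessing.Pool has a comparable maxtasksperchild argument. The sketch below uses that stdlib argument (not Celery itself) to make the recycling visible: one child process, a limit of 2 tasks per child, and 6 tasks, so the results should come from three different process IDs. In Celery the corresponding option is worker_max_tasks_per_child (or --max-tasks-per-child on the command line). The names which_process and run_tasks are made up for illustration, and the "fork" start method restricts the sketch to Linux/macOS.

```python
import multiprocessing
import os


def which_process(task_id: int) -> tuple[int, int]:
    """Stand-in task: report which OS process executed it."""
    return task_id, os.getpid()


def run_tasks(n_tasks: int, max_tasks_per_child: int) -> list[tuple[int, int]]:
    # "fork" keeps the sketch runnable as a plain script (POSIX only).
    ctx = multiprocessing.get_context("fork")
    # A single child process that is replaced after `max_tasks_per_child` tasks.
    with ctx.Pool(processes=1, maxtasksperchild=max_tasks_per_child) as pool:
        # chunksize=1 so every task counts individually toward the limit.
        return pool.map(which_process, range(n_tasks), chunksize=1)


if __name__ == "__main__":
    for task_id, pid in run_tasks(n_tasks=6, max_tasks_per_child=2):
        print(f"task {task_id} ran in process {pid}")
```

Tasks 0 and 1 share one PID, tasks 2 and 3 the next, and so on: the pool keeps the same concurrency (one process) but swaps in a fresh process every two tasks.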

Related

How to manage Managed Executor Service

I'm using Managed Executor Service to implement a process manager that processes tasks in the background upon receiving a JMS message event. Normally only a small number of tasks will be running (maybe 10 at most), but what if something happens and my application starts receiving hundreds of JMS message events? How do I handle such an event?
My thought is to limit the number of threads if possible, save all the other messages to the database, and run them when a thread becomes available. Thanks in advance.

The detailed answer to this question depends on which Java EE app server you choose to run on, since they all have slightly different configuration.
Any Java EE app server will allow you to configure the thread pool size of your Managed Executor Service (MES), this is the number of worker threads for your thread pool.
Say you have 10 worker threads and you get flooded with 100 requests all at once: the MES keeps a queue of backlogged requests, and each worker thread takes work off the queue whenever it finishes its current task, until the queue is empty.
It's fine if work goes to the queue occasionally, but if your work queue grows faster than your worker threads can drain it, you will run into problems. The solution is to increase your thread pool size; otherwise the backlog will keep growing and your server will eventually run out of memory.
If the load on your server is so bursty that tasks would need to be saved to a database, the best approach would be to do one of the following:
increase the thread pool size
have the server immediately reject incoming tasks when the task backlog queue is full
have clients do a blocking wait for the server task queue to be not full (I would only advise this option if client task submission is in no way connected to user experience)
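The last two options (reject when the backlog is full, or make the client block until there is room) can be sketched in Python, since Managed Executor Service configuration itself is server-specific. concurrent.futures.ThreadPoolExecutor uses an unbounded queue, so this hypothetical BoundedExecutor wrapper caps running-plus-queued tasks with a threading.BoundedSemaphore. BoundedExecutor, TaskRejected, and the workers/backlog parameters are illustrative names, not part of any Java EE API.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class TaskRejected(Exception):
    """Raised when the backlog is full and the caller chose not to wait."""


class BoundedExecutor:
    """Hypothetical wrapper: `workers` threads plus at most `backlog` queued tasks."""

    def __init__(self, workers: int, backlog: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # One permit per task that is either running or waiting in the queue.
        self._slots = threading.BoundedSemaphore(workers + backlog)

    def submit(self, fn, *args, block: bool = False) -> Future:
        # block=False: reject immediately when full ("reject incoming tasks").
        # block=True: the caller waits for a free slot ("blocking wait").
        if not self._slots.acquire(blocking=block):
            raise TaskRejected("task backlog is full")
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        # Give the slot back once the task finishes, successfully or not.
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

With workers=10 and a modest backlog, a flood of 100 events would fill the queue and then either be rejected (so the caller can persist them elsewhere) or slow the callers down, instead of growing memory without bound.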

How can I prevent similar queues from running at the same time?

We currently process a set of tasks using queue workers in Laravel. When I run multiple php artisan queue:work processes, jobs end up running concurrently (async). We are using Beanstalkd as the queue driver.
The issue is that in the queue worker we are polling an API that only allows one concurrent session for a particular agent_id. That is, only one API call with the same agent_id can run at a time.
We thought of spinning up multiple php artisan queue:work processes, each filtering on a queue_name matching the agent_id, but we have over 500 agents, so we would need 500 processes, which is not ideal.
Is there any way to implement a lock-style feature for each agent_id, so that if a job is already running for a particular agent_id, the new job is sent back to the queue? Or are there any features of Beanstalkd that would allow for this?
The other option could also be to gracefully handle the rejection from the API when the user is already logged in (and send the job back to the queue). But this could get messy and could clutter the logs.

You could either run only a single worker that is capable of running the fetch-from-API job, or use some sort of external marshalling/lock service.
The options for that may be either an internal rate-limiting system or a shared, atomic locking system: for example, a memcached or Redis server where a worker tries to set a lock key, and only the worker that successfully sets it gets to work on the task. An advantage of this is that as soon as the API request has completed you can remove the lock, so while the first worker processes the results, a different worker can already make a new request.

Fork NodeJS clusters as working load changes

I am trying to fork cluster workers, up to a maximum of 10, and only when the workload increases. Can this be done?
I have tried strong-cluster-control's setSize, but I can't find an easy way to fork automatically (for example, fork when many requests are coming in) or to close ("suicide") forks (perhaps with a timeout when nothing is happening, like in this answer).
This is my repo's main file at GitHub
Thank you in advance!!

I assume that you already have some idea as to how you would like to spread your load so I will not include details about that and instead focus on the interprocess communication required for this.
Notifying the master
To send arbitrary data to the master, you can use process.send() from a worker. I would probably go about it along these steps:
1) The application is started.
2) The minimum number of workers is spawned.
3) Each worker sends the master a request message via process.send() every time it receives a new request.
4) The master keeps track of the request events from all workers.
5) If the rate of request events rises above a predefined threshold (e.g. > 100 requests/s), the master spawns a new worker.
6) If the rate of request events drops below a predefined threshold, the master asks one of the workers to stop accepting new requests and close itself gracefully (it should not simply kill the process, so that ongoing requests are not interrupted).
The main point is: do not focus on time, focus on rate. In an application that is supposed to handle tens to thousands of requests per second, your setTimeout() (whose job might be to kill a worker that has been idle for too long) will never fire, because Node.js distributes the load evenly across your workers. You could start with one worker, but once you reach your maximum you will never drop back to one worker under continuous load, even if there is only one request per second.
Note that it is counterproductive to spawn more workers than the number of CPU cores you have available. It can, however, be beneficial to start with a single worker and incrementally increase the number up to all cores as load increases.
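The master-side bookkeeping from steps 4 to 6 can be sketched independently of Node.js; here it is in Python. RateScaler, its thresholds (100 and 20 requests/s), and the one-second sliding window are illustrative assumptions. In Node.js, record_request would be called from the master's handler for the messages workers send with process.send(), and a +1 or -1 decision would translate into forking a worker or asking one to shut down gracefully.

```python
from collections import deque


class RateScaler:
    """Hypothetical master-side bookkeeping: scale on request *rate*, not idle time."""

    def __init__(self, min_workers: int = 1, max_workers: int = 10,
                 scale_up_rps: float = 100.0, scale_down_rps: float = 20.0,
                 window: float = 1.0) -> None:
        self.min_workers = min_workers
        self.max_workers = max_workers  # e.g. the number of CPU cores
        self.scale_up_rps = scale_up_rps
        self.scale_down_rps = scale_down_rps
        self.window = window  # seconds
        self._events: deque[float] = deque()  # timestamps of request messages

    def record_request(self, now: float) -> None:
        """Called whenever any worker reports a new request."""
        self._events.append(now)

    def rate(self, now: float) -> float:
        """Requests per second over the sliding window ending at `now`."""
        while self._events and self._events[0] <= now - self.window:
            self._events.popleft()
        return len(self._events) / self.window

    def decide(self, now: float, workers: int) -> int:
        """Return +1 to fork a worker, -1 to retire one gracefully, 0 to do nothing."""
        r = self.rate(now)
        if r > self.scale_up_rps and workers < self.max_workers:
            return 1
        if r < self.scale_down_rps and workers > self.min_workers:
            return -1
        return 0
```

Because decide() looks only at the recent request rate, the worker count can shrink under light but continuous traffic, which an idle timeout inside each worker would never detect.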

How can I suspend multiple threads so that they do not start to process any job and I can restart my Windows service?

I am new to multithreading. In a Windows service I am using the SemaphoreSlim class as a thread governor. The SemaphoreSlim constructor takes two arguments, the initial count and the maximum count of concurrent entries, and I set both to the pool size, as follows:
int poolSize = 2;
SemaphoreSlim threadGoverner = new SemaphoreSlim(poolSize, poolSize);
threadGoverner is initialized when the Windows service starts, in OnStart. These two slots are then used to process jobs. I have a requirement to change the pool size dynamically: if the pool size is changed to 3, the service should start processing 3 requests at a time.
To do that, I am trying to restart the service using a batch file. The problem is that if more than one thread is running, another thread continues to process jobs during the restart, and that causes my service to behave abnormally.
What I want is that once a change in the pool size is detected, no thread starts processing a new job, so that I can restart the service cleanly.
Can anyone help me with this?

Does a Python Gearman worker accept multiple tasks at once?

For example:
I have a task named "URLDownload"; the task's function downloads a large file from the internet.
I have one worker process running, but about 1000 files to download.
It is easy for a client process to create 1000 tasks and send them to the Gearman server.
My question is: will the worker process handle the tasks one by one, or will it accept multiple tasks at once?
If the worker process can accept multiple tasks, how can I limit the task pool size in the worker process?

Workers process one request at a time. You have a few options:
1) You can run multiple workers (this is the most common method). Workers sit in poll() when they aren't processing so this model works pretty well.
2) Write a fork() implementation around the worker. This way you can fire up a set number of worker processes, but don't have to monitor multiple processes.
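Option 2 can be sketched with Python's multiprocessing: a parent forks a fixed number of worker processes, and each one handles a single job at a time, so pool_size is the cap on concurrent downloads. A multiprocessing.Queue stands in for the Gearman job server so the sketch runs without one; with a real Gearman client library, each child would instead register the URLDownload function and enter its work loop. url_download, worker_loop, and run_workers are hypothetical names, and the "fork" start method restricts the sketch to Linux/macOS.

```python
import multiprocessing
import os


def url_download(url: str) -> str:
    """Hypothetical stand-in for the "URLDownload" task body."""
    return f"downloaded {url}"


def worker_loop(jobs, results) -> None:
    # Like a Gearman worker: take one job, finish it, then take the next.
    while True:
        url = jobs.get()
        if url is None:  # sentinel: no more work for this process
            break
        results.put((os.getpid(), url_download(url)))


def run_workers(urls: list[str], pool_size: int) -> list[tuple[int, str]]:
    """Fork `pool_size` worker processes; at most that many jobs run at once."""
    ctx = multiprocessing.get_context("fork")  # POSIX only
    jobs, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker_loop, args=(jobs, results))
             for _ in range(pool_size)]
    for p in procs:
        p.start()
    for url in urls:
        jobs.put(url)
    for _ in procs:
        jobs.put(None)  # one sentinel per worker
    # Collect all results before joining, so no child blocks on a full pipe.
    out = [results.get() for _ in urls]
    for p in procs:
        p.join()
    return out


if __name__ == "__main__":
    urls = [f"http://example.com/file{i}" for i in range(10)]
    for pid, msg in run_workers(urls, pool_size=3):
        print(pid, msg)
```

With 1000 queued URLs and pool_size=3, only three downloads are ever in progress; the rest wait in the queue, just as they would wait on the Gearman server.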
